It's amazing that AlexNet is still one of the best models of human visual processing (when looking both at layerwise correspondence and total variance explained) given how much evolution in CNNs there has been since. https://t.co/B91iQXySRe

— Grace Lindsay (@neurograce) April 10, 2021
The abstract of the linked article:
See the discussion of Yevick's Law in my post, Showdown at the AI Corral, or: What kinds of mental structures are constructable by current ML/neural-net methods? [& Miriam Yevick 1975].
Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence and CNNs’ impressive ability to fully capture lower-level visual representations of real-world objects, we show that CNNs do not fully capture higher-level visual representations of real-world objects, nor those of artificial objects, at either lower or higher levels of visual representation. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates some fundamental differences exist in how the brain and CNNs represent visual information.