V1 Saliency Hypothesis

V1SH is the only theory so far that not only endows V1 with a very important cognitive function but has also provided multiple non-trivial theoretical predictions that were subsequently confirmed experimentally.[2][3]

According to V1SH, V1 creates a saliency map from retinal inputs to guide visual attention or gaze shifts.[1]

Anatomically, V1 is the gate through which retinal visual inputs enter the neocortex, and it is also the largest cortical area devoted to vision.

In the 1960s, David Hubel and Torsten Wiesel discovered that V1 neurons are activated by tiny image patches that are large enough to depict a small bar[4] but not a discernible face.

However, progress in understanding the subsequent processing has been much harder and slower than expected (by, e.g., Hubel and Wiesel[6]).

Stepping outside the traditional views, V1SH is catalyzing a change of framework[7] that enables fresh progress in understanding vision.

A saliency map is, by definition, computed from (or caused by) the external visual input rather than internal factors such as the animal’s expectations or goals (e.g., to read a book).

For example, it guides our gaze shifts towards an insect flying in our peripheral visual field when we are reading a book.

This region is called the receptive field of the neuron, and it typically covers no more than the size of a coin held at arm’s length.

Similarly, responses from neurons activated by their preferred colours in their receptive fields are visualized by the purple dots.

These saliency values are sent to the superior colliculus,[13] a midbrain area, which executes gaze shifts to the receptive field of the most activated neuron responding to the visual input.
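The selection rule described here, with saliency at each location taken as the maximum V1 response there and gaze directed to the location of the overall maximum, can be sketched in a few lines. This is only an illustrative toy, not a model from the cited literature: the function names, the grid coordinates, and the response values are all hypothetical.

```python
def saliency_map(responses):
    """Return the saliency at each location as the maximum V1 response there.

    `responses` maps a visual location to the list of responses of the
    neurons whose receptive fields cover that location.
    """
    return {loc: max(rates) for loc, rates in responses.items()}


def gaze_target(saliency):
    """Return the location with the highest saliency value."""
    return max(saliency, key=saliency.get)


# Hypothetical responses (arbitrary units) at three locations,
# two neurons per location. These numbers are made up for illustration.
responses = {
    (0, 0): [4.0, 2.5],
    (0, 1): [3.0, 9.0],
    (1, 0): [5.0, 1.0],
}

saliency = saliency_map(responses)
target = gaze_target(saliency)
print(saliency)
print(target)
```

Only the most active neuron at each location matters in this sketch; which feature (orientation, colour, and so on) that neuron prefers plays no role in the comparison.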

This is because, at non-border texture locations, V1 neural responses to the horizontal and vertical bars (from B) are higher than those to the oblique bars (from A); these higher responses dictate and raise the saliency values at these non-border locations, making the border no longer as competitive for saliency.
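The masking argument can be checked with a toy calculation under the same maximum rule. The sketch below assumes a single row of seven locations with the texture border at index 3; the response values, and the simplification that superposing texture B leaves the responses to texture A unchanged, are hypothetical choices made only to illustrate the reasoning.

```python
def saliency_profile(*populations):
    """Saliency per location: the maximum response across all populations."""
    return [max(column) for column in zip(*populations)]


def border_advantage(profile, border):
    """How far the border's saliency exceeds the best non-border location."""
    others = [v for i, v in enumerate(profile) if i != border]
    return profile[border] - max(others)


BORDER = 3

# Hypothetical responses to the oblique bars of texture A: elevated at
# the border, lower elsewhere.
oblique = [5, 5, 5, 10, 5, 5, 5]

# Hypothetical responses to the horizontal and vertical bars of the
# uniform texture B: the same at every location, and higher than the
# non-border responses to A.
cardinal = [8, 8, 8, 8, 8, 8, 8]

before = saliency_profile(oblique)           # texture A alone
after = saliency_profile(oblique, cardinal)  # A with B superposed

print(before, border_advantage(before, BORDER))
print(after, border_advantage(after, BORDER))
```

With these made-up numbers the border's lead over its surroundings shrinks once the uniform texture is added, because the non-border saliency is now set by the responses to texture B rather than by the weaker responses to texture A.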

It was initially uninfluential because for decades it had been believed that attentional guidance is controlled mainly or exclusively by higher-level brain areas.

[8] Opinions started to change with a surprising piece of behavioral data: an item shown uniquely to one eye, called an ocular singleton, among similar-looking items shown to the other eye (using, e.g., a pair of glasses for watching 3D movies) can attract gaze or attention automatically.

[36][37] Zhaoping argues that if V1SH is correct, the ideas[38][39] about how the visual system works, and consequently the questions to ask in future vision research, should be fundamentally changed.

Primary Visual Cortex
Salience Map: represented by the map of maximum V1 neural responses to visual inputs, one maximum response per visual location
Masking of a salient border between two textures by adding a uniform texture
Gaze capture by an ocular singleton