[1] In this process, vision influences the other sensory systems, resulting in a perceived environment that is not congruent with the actual stimuli.
Therefore, when an individual is in an environment and multiple stimuli reach the brain at once, there is a hierarchy in which vision guides the other sensory cues so that they are perceived as aligning with the visual experience, regardless of their actual source.
Research has found that visual and auditory reflexive spatial orienting are controlled by a common underlying neural substrate.
The thalamus contains specific regions dedicated to each sense, so as stimuli pass through it, it is able to sort out the multiple parts of an environment that an individual experiences in a given moment.
[citation needed] The retina, at the back of the eye, detects visual stimuli and sends the resulting signals along the optic nerve and optic tract to the lateral geniculate nucleus (LGN) within the thalamus.
[citation needed] The LGN is located near the medial geniculate nucleus (MGN), which is responsible for organizing auditory stimuli after a sound is heard.
Because these two nuclei lie close to each other, research suggests that this may be where vision takes over the perception of an environment, resulting in visual capture.
This ability to attend to a specific direction allows for a faster reaction time, even though the participant does not physically shift their visual focus during the pre-stimulus indicator.
For example, Alais and Burr (2004), using the ventriloquism effect, found that vision is capable of taking over the auditory sense, specifically when the visual stimuli are well localized.
[11] Another example of visual capture comes from Ehrsson, Spence, & Passingham (2004), who used a rubber hand to show that vision is capable of determining how other senses respond.
However, even though participants were told not to attend to a certain box, they were consistently drawn to the image before the letter in all cases, resulting in longer response times in all conditions except the "same" condition.
The results suggest that vision consistently dominates the other senses and immediately draws attention away from them, even in a controlled setting.
In this situation, visual capture allows the visual system to govern the auditory stimuli, producing a congruent experience in which the sound seems to come from the puppet.
Another popular example of visual capture occurs when watching a movie in a theater, where the sound appears to come from the actors' lips.