[2] According to these models, the first stage is attention-free and registers low-level features such as brightness gradients, motion and orientation in parallel.
[3] Further experiments seemed to support this: Potter (as cited by Evans & Treisman, 2005) showed that high-order representations can be accessed rapidly from natural scenes presented at rates of up to 10 per second.
Additionally, Thorpe, Fize & Marlot (as cited by Evans & Treisman, 2005) discovered that humans and other primates can categorize natural images (i.e. of animals in everyday indoor and outdoor scenes) rapidly and accurately, even after brief exposures.
[4] This claim is based on one of their studies, which used attention-demanding tasks to examine how accurately participants could categorize images that were filtered to contain a wide range of spatial frequencies.
A recent study by Cohen, Alvarez & Nakayama (2011) calls into question the validity of evidence supporting the attention-free hypothesis.
They found that participants did display inattentional blindness while doing certain kinds of multiple-object tracking (MOT) and rapid serial visual presentation (RSVP) tasks.
In the Cohen et al. study, the MOT task involved viewing eight black moving discs presented against a changing background that consisted of randomly colored checkerboard masks.
During the 'revisiting' stage, focused attention is employed to select local objects of interest serially and then to bind their features to their representations.
This hypothesis is consistent with the results of their study in which participants were instructed to detect animal targets in RSVP sequences, and then report their identities and locations.
Ultra-rapid visual categorization is a model proposing an automatic feedforward mechanism that forms high-level object representations in parallel without focused attention.
[7] VanRullen (2006) ran simulations showing that the feedforward propagation of a single wave of spikes, generated in response to a stimulus and passing through high-level neurons, could be enough for crude recognition and categorization within 150 ms or less.
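The idea of a single feedforward wave of spikes can be illustrated with a toy sketch. This is not VanRullen's simulation. It is a minimal illustration that assumes stronger inputs fire earlier, that each processing stage adds a fixed delay, and that an output unit responds as soon as its weighted input crosses a threshold. Every name and number here (`first_spike_latencies`, `categorize`, the templates, the delays and the threshold) is hypothetical and chosen only for readability.

```python
from typing import Dict, List, Optional, Tuple


def first_spike_latencies(
    intensities: List[float], base_ms: float = 20.0, span_ms: float = 40.0
) -> List[Optional[float]]:
    """Map each input intensity in (0, 1] to a first-spike latency.

    Assumption for illustration: stronger inputs spike earlier, and
    inputs with zero (or negative) intensity never spike.
    """
    return [base_ms + (1.0 - x) * span_ms if x > 0 else None for x in intensities]


def categorize(
    intensities: List[float],
    templates: Dict[str, List[float]],
    threshold: float = 1.0,
    stage_delay_ms: float = 10.0,
    n_stages: int = 5,
) -> Optional[Tuple[str, float]]:
    """Return (category, decision time in ms) from one wave of spikes.

    Each input contributes at most one spike. Output units accumulate
    weighted spikes in order of arrival; the first unit to reach the
    threshold determines the category. No feedback or recurrence is used.
    """
    latencies = first_spike_latencies(intensities)
    # Keep only inputs that actually spike, ordered by arrival time.
    arrivals = sorted(
        (t, i) for i, t in enumerate(latencies) if t is not None
    )
    totals = {name: 0.0 for name in templates}
    for t, i in arrivals:
        for name, weights in templates.items():
            totals[name] += weights[i]
        crossed = [name for name in totals if totals[name] >= threshold]
        if crossed:
            winner = max(crossed, key=lambda name: totals[name])
            # Add a fixed (hypothetical) delay for each feedforward stage.
            return winner, t + stage_delay_ms * n_stages
    return None  # The wave ended without any unit reaching threshold.


if __name__ == "__main__":
    # Hypothetical weight templates over four input features.
    templates = {
        "animal": [0.6, 0.6, 0.0, 0.0],
        "non_animal": [0.0, 0.0, 0.6, 0.6],
    }
    print(categorize([0.9, 0.8, 0.2, 0.1], templates))
```

In this sketch the decision is reached after only the two earliest spikes arrive, before the weaker inputs have fired at all, which is the sense in which a single feedforward pass could support a crude but fast categorization. The timing values are arbitrary and are not fitted to any data.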