The earliest evidence for these statistical learning abilities comes from a study by Jenny Saffran, Richard Aslin, and Elissa Newport, in which 8-month-old infants were presented with nonsense streams of monotone speech. After roughly two minutes of exposure, the infants discriminated the stream's three-syllable "words" from "part-words" that spanned word boundaries, a distinction carried only by the transitional probabilities between syllables. That is, infants learn which syllables are always paired together and which ones occur together only relatively rarely, suggesting that they are parts of two different units.[1]
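The statistic at issue can be made concrete: for adjacent syllables X and Y, the transitional probability is TP(X→Y) = frequency(XY) / frequency(X), which is high inside a word and lower across word boundaries. A minimal sketch in Python (the stream below is generated from invented three-syllable nonsense words in the spirit of such stimuli, not the study's actual materials):

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """TP(x -> y) = freq(x followed by y) / freq(x), over adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Build a continuous, monotone-like stream by concatenating nonsense "words"
# in random order, so the only cue to word boundaries is statistical.
random.seed(0)
words = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
stream = [syl for _ in range(300) for syl in random.choice(words)]

tps = transitional_probabilities(stream)
print(tps[("bi", "da")])       # within a word: TP = 1.0
print(tps.get(("ku", "pa")))   # across a word boundary: TP near 1/3
```

A learner sensitive to these probabilities can treat low-TP transitions as candidate word boundaries, which is the words-versus-part-words contrast tested in these studies.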
Although many factors play an important role, this specific mechanism is powerful and can operate over a short time scale.[1]
To determine whether young children have these same abilities, Saffran, Aslin, and Newport exposed 8-month-old infants to an artificial grammar.[5]
This finding provides stronger evidence that infants are able to pick up transitional probabilities from the speech they hear, rather than just being aware of the frequencies of individual syllable sequences.[6]
Infants preferred to listen to words over part-words, whereas there was no significant difference in the nonsense-frame condition.
This finding suggests that even pre-linguistic infants are able to integrate the statistical cues they learn in a laboratory into their previously acquired knowledge of a language.
A related finding indicates that slightly older infants can acquire both lexical and grammatical regularities from a single set of input,[7] suggesting that they are able to use the outputs of one type of statistical learning (cues that lead to the discovery of word boundaries) as input to a second type (cues that lead to the discovery of syntactic regularities).
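This cascading use of statistics can be pictured, continuing the sketch above, as a two-stage pipeline: boundaries are placed at low-TP transitions, and the recovered words then feed a second tabulation of word-order statistics. This is a schematic illustration of the idea only, not the procedure of the cited study, and the 0.5 boundary threshold is an arbitrary assumption:

```python
from collections import Counter

def segment(syllables, tps, threshold=0.5):
    """Stage 1: place a word boundary wherever syllable-to-syllable TP dips below threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps.get((x, y), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words

def word_bigrams(words):
    """Stage 2: the segmented words become input for learning order regularities."""
    return Counter(zip(words, words[1:]))

# Usage, reusing `stream` and `tps` from the earlier sketch:
#   words = segment(stream, tps)
#   word_bigrams(words).most_common(3)
```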
Real speech, though, has many different types of cues to word boundaries, including prosodic and phonotactic information.[12]
To test this idea, Maye et al. exposed 6- and 8-month-old infants to a continuum of speech sounds that varied in the degree to which they were voiced.[1][9][17]
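The distributional logic behind this design is that the shape of the exposure histogram over the continuum, bimodal versus unimodal, signals whether two phonetic categories or one are present. A crude sketch with hypothetical exposure counts (the eight-step continuum is illustrative, not the study's stimuli):

```python
def is_bimodal(counts):
    """True if some interior step is flanked by strictly higher counts on both
    sides, i.e. the exposure histogram has (at least) two modes."""
    for i in range(1, len(counts) - 1):
        if counts[i] < max(counts[:i]) and counts[i] < max(counts[i + 1:]):
            return True
    return False

# Hypothetical exposure frequencies over an 8-step voicing continuum:
bimodal  = [5, 9, 5, 2, 2, 5, 9, 5]   # two clusters of exemplars -> two categories
unimodal = [2, 5, 8, 9, 9, 8, 5, 2]   # one central cluster -> one category
print(is_bimodal(bimodal), is_bimodal(unimodal))  # True False
```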
Early evidence for this mechanism came largely from studies of computer modeling or from analyses of natural language corpora.[18][19]
These early studies focused largely on distributional information specifically, rather than on statistical learning mechanisms more generally.
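A representative analysis of this distributional kind clusters words by the neighbors they occur with, since words of the same grammatical class tend to share contexts. The corpus below is a toy invention for illustration:

```python
from collections import Counter
from math import sqrt

def context_vectors(tokens, window=1):
    """Map each word to counts of the words appearing within `window` positions of it."""
    vecs = {}
    for i, w in enumerate(tokens):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    """Similarity of two context-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    return dot / (sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values())))

# Tiny invented corpus: "dog" and "cat" occur in the same contexts, the kind of
# regularity these distributional analyses exploited.
corpus = "the dog runs the cat runs the dog sleeps the cat sleeps".split()
vecs = context_vectors(corpus)
print(cosine(vecs["dog"], vecs["cat"]))   # high: same distributional class
print(cosine(vecs["dog"], vecs["runs"]))  # lower: different class
```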
Later studies expanded these results by looking at the actual behavior of children or adults who had been exposed to artificial grammars.
Evidence from a series of four experiments conducted by Gomez and Gerken suggests that children are able to generalize grammatical structures with less than two minutes of exposure to an artificial grammar.[9][20]
In the first experiment, 11- to 12-month-old infants were trained on an artificial grammar composed of nonsense words with a set grammatical structure.
Together, these studies suggest that infants are able to extract a substantial amount of syntactic knowledge even from limited exposure to a language.
Additionally, even when the individual words of the grammar were changed, infants were still able to discriminate between grammatical and ungrammatical strings during the test phase.[22]
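Such artificial grammars are typically small finite-state systems, and changing the vocabulary while preserving the structure amounts to relabeling the words over the same transitions. A schematic sketch (the grammar and nonsense lexicons here are invented, not Gomez and Gerken's actual materials):

```python
# Licensed transitions between word categories in a toy finite-state grammar.
GRAMMAR = {("START", "A"), ("A", "B"), ("B", "A"), ("B", "END")}

def grammatical(words, lexicon):
    """Accept a string iff every category-to-category step is a licensed transition."""
    cats = ["START"] + [lexicon[w] for w in words] + ["END"]
    return all((x, y) in GRAMMAR for x, y in zip(cats, cats[1:]))

lexicon1 = {"vot": "A", "pel": "B"}
lexicon2 = {"dak": "A", "rud": "B"}   # new words, same underlying structure
print(grammatical(["vot", "pel"], lexicon1))                # True
print(grammatical(["pel", "vot"], lexicon1))                # False: B cannot begin
print(grammatical(["dak", "rud", "dak", "rud"], lexicon2))  # True
```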
Both adults and children were able to pick out sentences that were ungrammatical at a rate greater than chance, even under an "incidental" exposure condition in which participants' primary goal was to complete a different task while hearing the language.
Similar work replicates the finding that learners are able to learn two sets of statistical representations when an additional cue is present (two different male voices in this case).
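One simple way to model this is to let the additional cue index separate statistical tables, so each voice accumulates its own transitional probabilities rather than the two languages being pooled. A sketch under that assumption:

```python
from collections import defaultdict, Counter

def tps_by_context(utterances):
    """One transitional-probability table per contextual cue (e.g., speaker voice).
    `utterances` is a list of (cue, syllable_list) pairs."""
    pair_counts = defaultdict(Counter)
    first_counts = defaultdict(Counter)
    for cue, syls in utterances:
        pair_counts[cue].update(zip(syls, syls[1:]))
        first_counts[cue].update(syls[:-1])
    return {cue: {(x, y): n / first_counts[cue][x] for (x, y), n in pairs.items()}
            for cue, pairs in pair_counts.items()}

# e.g. tps_by_context([("voice1", stream_a), ("voice2", stream_b)]) keeps the
# statistics of the two artificial languages from contaminating each other.
```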
Yim and Rudoy[26] found that monolingual and bilingual children performed statistical learning tasks equally well.
Antovich and Graf Estes[27] found that 14-month-old bilingual children are better than monolinguals at segmenting two different artificial languages using transitional probability cues.
They suggest that a bilingual environment in early childhood trains children to rely on statistical regularities to segment the speech flow and access two lexical systems.
Yu and Smith conducted a pair of studies in which adults were exposed to pictures of objects and heard nonsense words; within any single trial, nothing indicated which word named which object.
Participants were able to choose the correct item more often than would happen by chance, indicating, according to the authors, that they were using statistical learning mechanisms to track co-occurrence probabilities across training trials.[29][30]
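The co-occurrence tracking the authors describe can be sketched directly: accumulate word-object counts across individually ambiguous trials and let the most frequent pairing win. The trials and labels below are hypothetical:

```python
from collections import Counter, defaultdict

def cross_situational(trials):
    """Accumulate word-object co-occurrence counts across ambiguous trials.
    Each trial pairs several spoken words with several visible objects, with no
    indication of which goes with which."""
    counts = defaultdict(Counter)
    for words, objects in trials:
        for w in words:
            counts[w].update(objects)
    # For each word, choose the object it co-occurred with most often.
    return {w: objs.most_common(1)[0][0] for w, objs in counts.items()}

# Within any one trial the mapping is ambiguous, but only the correct referent
# co-occurs with its word on every trial.
trials = [
    (["blicket", "dax"], ["BALL", "DOG"]),
    (["blicket", "wug"], ["BALL", "CUP"]),
    (["dax", "wug"],     ["DOG", "CUP"]),
]
print(cross_situational(trials))  # {'blicket': 'BALL', 'dax': 'DOG', 'wug': 'CUP'}
```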
Medina et al. and Trueswell et al. argue that, because Yu and Smith only tracked knowledge at the end of training rather than on a trial-by-trial basis, it is impossible to know whether participants were truly updating statistical probabilities of co-occurrence (and therefore maintaining multiple hypotheses simultaneously) or were instead forming a single hypothesis and checking it on the next trial.
To distinguish between these two possibilities, Trueswell et al. conducted a series of experiments similar to those conducted by Yu and Smith except that participants were asked to indicate their choice of the word-referent mapping on each trial, and only a single object name was presented on each trial (with varying numbers of objects).
These results suggest that participants did not remember the surrounding context of individual presentations and were therefore not using statistical cues to determine the word-referent mappings.[31]
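The single-hypothesis alternative these results favor, often called "propose but verify," stores one conjectured referent per word, keeps it if it recurs, and guesses afresh when it fails. A minimal sketch of that strategy as a contrast to the co-occurrence tracker above (a schematic reading of the account, not the authors' formal model):

```python
import random

def propose_but_verify(trials, seed=0):
    """Single-hypothesis learning: remember one guessed referent per word;
    keep it only if it reappears on the word's next occurrence."""
    rng = random.Random(seed)
    hypothesis = {}
    for words, objects in trials:
        for w in words:
            if hypothesis.get(w) not in objects:  # guess absent (or no guess yet)
                hypothesis[w] = rng.choice(objects)
    return hypothesis

# Unlike the co-occurrence tracker, no counts survive a failed guess, so this
# learner cannot exploit the surrounding contexts of earlier trials.
```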
Additionally, statistical learning by itself cannot fully account even for those aspects of language acquisition in which it has been shown to play a large role.
By demonstrating a preference for the novel sequences (which violated the transitional probabilities that defined the grouping of the original stimuli), the results of the study support the likelihood of domain-general statistical learning in infancy.