Statistical language acquisition

Fundamental to the study of statistical language acquisition is the centuries-old debate between rationalism (or its modern manifestation in the psycholinguistic community, nativism) and empiricism, with researchers in this field falling strongly in support of the latter category.

The last factor encompasses the brain properties, learning principles, and computational efficiencies that enable children to pick up on language rapidly using patterns and strategies.

This paradigm has since become increasingly important in the study of infant speech perception, especially for input at levels higher than syllable chunks, though with some modifications, including using the listening times instead of the side preference as the relevant dependent measure.

Artificial languages allow researchers to isolate variables of interest and wield a greater degree of control over the input the subject will receive.

"[9] As such, artificial language experiments are typically conducted to explore what the relevant linguistic variables are, what sources of information infants are able to use and when, and how researchers can go about modeling the learning and acquisition process.

[11][12] This implies that simply hearing the sounds is not sufficient for language learning; social interaction cues the infant to take statistics.

UBC psychologist Janet Werker, since her influential series of experiments in the 1980s, has been one of the most prominent figures in the effort to understand the process by which human babies develop these phonological distinctions.

[5] Developing children have been found to be effective judges of linguistic authority, screening the input they model their language on by paying less attention to speakers who mispronounce words.

Infants were presented with two minutes of continuous speech of an artificial language from a computerized voice to remove any interference from extraneous variables such as prosody or intonation.

[18] Researchers have shown that this problem is intimately linked with the ability to parse language, and that those words that are easy to segment due to their high transitional probabilities are also easier to map to an appropriate referent.
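The notion of transitional probability used here can be made concrete with a small sketch. The Python below estimates, for each adjacent pair of syllables in a continuous stream, how often the first syllable is followed by the second, and places a word boundary wherever that probability dips. The three nonsense words, their ordering, and the threshold value are illustrative assumptions rather than the stimuli or procedure of any particular study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable has no successor, so it is left out of the denominators.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(syllables, threshold):
    """Split the stream wherever the transitional probability is below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical artificial language: three trisyllabic nonsense words (assumption).
lexicon = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["bi", "da", "ku"]}
order = "ABCACBABCBACBCA"  # made-up word order with no immediate repetitions
stream = [syllable for word in order for syllable in lexicon[word]]

tps = transitional_probabilities(stream)
recovered = segment(stream, threshold=0.9)  # threshold is an arbitrary choice
```

In this toy stream each syllable belongs to exactly one word, so within-word transitions are fully predictable while transitions across word boundaries are not, which is what allows the threshold to recover the original words.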

Further studies have shown that infants quickly develop in this capacity and by seven months are capable of learning associations between moving images and nonsense words and syllables.

Because working memory, decision making, planning, and goal setting, all vital functions of the frontal lobe, are impaired, autistic children are at a loss when it comes to socializing and communication (Ozonoff et al., 2004).

Additionally, researchers have found that the level of communicative impairment in autistic children was inversely correlated with signal increases in these same regions during exposure to artificial languages.

Based on this evidence, researchers have concluded that children with autism spectrum disorders do not have the neural architecture to identify word boundaries in continuous speech.

Recent research has investigated how infants and adults use cross-situational statistics to learn not only the meanings of words but also the constraints within a context.

Smith and Yu proposed that a way to make a distinction in such ambiguous situations is to track the word-referent pairings over multiple scenes.
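One simple way to picture tracking word-referent pairings across scenes is plain co-occurrence counting, sketched below in Python. This is an illustrative assumption about the mechanism, not necessarily the model Smith and Yu had in mind; the scenes, words, and referent labels are hypothetical. Within any single scene each word is ambiguous, but across scenes the correct referent co-occurs with its word most often.

```python
from collections import defaultdict


def cross_situational_counts(scenes):
    """Count how often each word is heard while each referent is in view."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in scenes:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return counts


def best_referent(counts, word):
    """Pick the referent that has co-occurred most often with the word."""
    return max(counts[word], key=counts[word].get)


# Hypothetical scenes: (words heard, referents in view). Upper-case = referents.
scenes = [
    (["ball", "bat"], ["BALL", "BAT"]),
    (["ball", "dog"], ["BALL", "DOG"]),
    (["bat", "dog"], ["BAT", "DOG"]),
]

counts = cross_situational_counts(scenes)
guesses = {word: best_referent(counts, word) for word in ["ball", "bat", "dog"]}
```

After the first scene alone, "ball" is equally associated with both visible objects; only the accumulated counts across all three scenes single out one referent per word.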

Models of this type allow researchers to systematically control important learning variables that are oftentimes difficult to manipulate at all in human participants.

[23] A precursor to this approach, and one of the first model types to account for the dimension of time in linguistic comprehension and production, was Elman's simple recurrent network (SRN).

By making use of a feedback network to represent the system's past states, SRNs were able in a word-prediction task to cluster input into self-organized grammatical categories based solely on statistical co-occurrence patterns.
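The feedback mechanism can be illustrated with a minimal forward pass, sketched below in plain Python. It shows only how the previous hidden state is copied back and combined with the current input; training is omitted, so the random weights do not reproduce any clustering result. The class name, layer sizes, tanh activation, weight range, and toy vocabulary are all assumptions made for illustration.

```python
import math
import random


class SimpleRecurrentNetwork:
    """Forward pass of an Elman-style network (untrained, illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)           # input -> hidden
        self.w_context = matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = matrix(n_out, n_hidden)         # hidden -> output
        self.context = [0.0] * n_hidden

    def reset(self):
        """Clear the context units, forgetting all earlier inputs."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        """Consume one input vector and return a distribution over next items."""
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(row_in, x))
                + sum(w * c for w, c in zip(row_ctx, self.context))
            )
            for row_in, row_ctx in zip(self.w_in, self.w_context)
        ]
        self.context = hidden  # copy-back: the hidden state becomes the next context
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]


vocab = ["boy", "sees", "dog"]  # hypothetical toy vocabulary


def one_hot(word):
    return [1.0 if word == v else 0.0 for v in vocab]


net = SimpleRecurrentNetwork(n_in=3, n_hidden=4, n_out=3)

net.step(one_hot("boy"))
p_after_boy = net.step(one_hot("sees"))  # prediction for "sees" preceded by "boy"

net.reset()
net.step(one_hot("dog"))
p_after_dog = net.step(one_hot("sees"))  # same word, different history
```

Because the context units carry information about earlier inputs, the same word yields different predictions depending on what preceded it, which is the property that lets a trained network pick up on sequential co-occurrence patterns.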

[23] Of particular importance in recent research has been the effort to understand the dynamic interaction of learning (e.g. language-based) and learner (e.g. speaker-based) variables in lexical organization and competition in bilinguals.

[25][26] SOMs have been helpful to researchers in identifying and investigating the constraints and variables of interest in a number of acquisition processes, and in exploring the consequences of these findings on linguistic and cognitive theories.

By identifying working memory as an important constraint both for language learners and for current computational models, researchers have been able to show that manipulation of this variable allows for syntactic bootstrapping, drawing not just categorical but actual content meaning from words' positional co-occurrence in sentences.

[27] Some recent models of language acquisition have centered on methods of Bayesian inference to account for infants' abilities to appropriately parse streams of speech and acquire word meanings.

This approach has led to important results in explaining acquisition phenomena such as mutual exclusivity, one-trial learning or fast mapping, and the use of social intentions.
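The general shape of such a model can be sketched as a posterior computation over candidate word meanings. In the Python below, each hypothesis is a set of objects the word might cover, and the posterior is proportional to the prior times the likelihood of the observed examples. The hypotheses, the uniform prior, and the likelihood (examples assumed to be drawn uniformly from the word's extension) are illustrative assumptions and are not taken from the models cited above.

```python
def posterior(hypotheses, priors, examples):
    """Return P(hypothesis | examples) using Bayes' rule."""
    unnormalized = {}
    for name, extension in hypotheses.items():
        if all(example in extension for example in examples):
            # Assumed likelihood: each example is drawn uniformly from the extension.
            likelihood = (1 / len(extension)) ** len(examples)
        else:
            likelihood = 0.0  # the hypothesis cannot explain an observed example
        unnormalized[name] = priors[name] * likelihood
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the examples")
    return {name: value / total for name, value in unnormalized.items()}


# Hypothetical candidate meanings for a novel word, from narrow to broad.
hypotheses = {
    "dalmatians": {"dalmatian1", "dalmatian2"},
    "dogs": {"dalmatian1", "dalmatian2", "poodle", "terrier"},
    "animals": {"dalmatian1", "dalmatian2", "poodle", "terrier",
                "cat", "bird", "fish", "horse"},
}
priors = {name: 1 / len(hypotheses) for name in hypotheses}  # uniform (assumption)

after_one = posterior(hypotheses, priors, ["dalmatian1"])
after_three = posterior(hypotheses, priors, ["dalmatian1", "dalmatian2", "dalmatian1"])
```

Under these assumptions, a single labeled example already favors the narrowest consistent meaning, and further consistent examples sharpen that preference quickly, which gives one way of thinking about how a learner could settle on a meaning from very little data.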