Vocal learning

[1] A rare trait, vocal learning is a critical substrate for spoken language and has been detected in only eight animal groups despite the wide array of vocalizing species: humans, bats, cetaceans, pinnipeds (seals and sea lions), elephants, and three distantly related bird groups (songbirds, parrots, and hummingbirds).

[2] Humans, on the other hand, show deeper hierarchical relationships, such as the nesting of phrases within others, and demonstrate compositional syntax, where changes in syntactic organization generate new meanings; both are beyond the capabilities of other vocal learning groups.[3] Vocal learning phenotypes also differ within groups, and closely related species may not display the same abilities.

[5] Even further complicating the original binary classification is evidence from recent studies that suggests that there is greater variability in a non-learner's ability to modify vocalizations based on experience than previously thought.

Findings in suboscine passerine birds, non-human primates, mice, and goats have led to the proposal of the vocal learning continuum hypothesis by Erich Jarvis and Gustavo Arriaga.

Hand-reared infant lesser spear-nosed bats (Phyllostomus discolor) were able to adapt their isolation calls to an external reference signal.

Two juvenile killer whales, separated from their natal pods, were observed mimicking the cries of California sea lions (Zalophus californianus) found near the region where they lived.

Each bottlenose dolphin develops a distinct signature whistle in the first few months of life, which it uses to identify itself and to distinguish itself from other individuals.

This individual distinctiveness could have been a driving force for evolution by providing higher species fitness since complex communication is largely correlated with increased intelligence.

The burst-pulsed sounds, which are more complex and varied than the whistles, are often used to convey excitement, dominance, or aggression, such as when dolphins are competing for the same piece of food.

Antagonistic vocal cries play an important role in inter-male competitions and are hypothesized to demonstrate the resource-holding potential of the emitter.

Novel vocal types expressed by dominant males spread quickly through populations of breeding elephant seals and are even imitated by juveniles in the same season.

[20] Mlaika, a ten-year-old adolescent female African elephant, has been recorded imitating truck sounds coming from the Nairobi-Mombasa highway three miles away.

[26] A cross-fostering experiment with marmosets and macaques showed convergence in pitch and other acoustic features in their supposedly innate calls,[24] demonstrating a limited capacity for vocal learning.

Mice produce long sequences of vocalizations or "songs" that are used both as isolation calls by pups when cold or removed from the nest and for courtship when males sense a female or detect pheromones in female urine.

Supporting this hypothesis is the fact that many mammalian vocal learners including humans, whales, and elephants have very few major predators.

Modern birds supposedly evolved from a common ancestor around the Cretaceous-Paleogene boundary at the time of the extinction of dinosaurs, about 66 million years ago.

Phylogenetic comparisons have suggested that vocal learning evolved independently among birds at least two, and possibly three, times: in songbirds, parrots, and hummingbirds.

An alternative hypothesis suggests evolution from a primate common ancestor capable of vocal learning, with the trait subsequently being lost at least eight times.

As current evidence suggests independent evolution of these structures, each equivalent vocal nucleus has a different name in each bird group, as shown in the table below.

Vocal nuclei are found in two separate brain pathways, which are described here for songbirds because most research has been conducted in this group; the connections are similar in parrots[35] and hummingbirds.

The RA connects to the midbrain vocal center DM (dorsal medial nucleus of the midbrain) and the brainstem (nXIIts) vocal motor neurons that control the muscles of the syrinx, a direct projection similar to the projection from LMC to the nucleus ambiguus in humans.[1][37] The HVC is considered the syntax generator, while the RA modulates the acoustic structure of syllables.

Area X then projects to the medial nucleus of the dorsolateral thalamus (DLM), which ultimately projects back to MAN in a loop.[38] The lateral part of MAN (LMAN) generates variability in song, while Area X is responsible for stereotypy, or the generation of low variability in syllable production and order after song crystallization.

Secondary pallial areas including the NCM and CM are also thought to be involved in auditory memory formation of songs used for vocal learning, but more evidence is needed to substantiate this hypothesis.

Closed-ended learners such as the zebra finch and Aphantochroa hummingbird can learn only during a limited time period and subsequently produce highly stereotyped or non-variable vocalizations consisting of a single, fixed song, which they repeat throughout their lives.

In contrast, open-ended learners, including canaries and various parrot species, display significant plasticity and continue to learn new songs throughout the course of their lives.

Songs during this period are plastic as specific syllables begin to emerge but are frequently in the wrong sequence, errors that are similar to phonological mistakes made by young children when learning a language.

Previous research has suggested that the length of the critical period may be linked to differential gene expression within song nuclei, thought to be caused by neurotransmitter binding to receptors during neural activation.

[47] One key area is the LMAN song nucleus, part of the specialized cortical-basal-ganglia-thalamo-cortical loop in the anterior forebrain pathway, which is essential for vocal plasticity.

[51] It has been hypothesized that LMAN actively maintains RA microcircuitry in a state permissive for song plasticity and that, during normal development, it regulates HVC-RA synapses.

[54] Orthologues of FOXP2 are found in a number of vertebrates including mice and songbirds, and have been implicated in modulating plasticity of neural circuits.

Hypothetical distributions of two behavioral phenotypes: vocal learning and sensory (auditory) sequence learning. We hypothesize that the behavioral phenotypes of vocal learning and auditory learning are distributed along several categories. (A) Vocal learning complexity phenotype and (B) auditory sequence learning phenotype. The left axis (blue) illustrates the hypothetical distribution of species along the behavioral phenotype dimensions. The right axis (black step functions) illustrates different types of transitions along the hypothesized vocal-learning (A) or auditory-learning (B) complexity dimensions. Whether the actual distributions are continuous functions (blue curves) will need to be tested, in relation to the alternatives that there are several categories with gradual transitions or step functions (black curves). Although auditory learning is a prerequisite for vocal learning and there can be a correlation between the two phenotypes (A–B), the two need not be interdependent. A theoretical Turing machine (Turing, 1968) is illustrated [G∗], which can outperform humans on memory for digitized auditory input but is not a vocal learner. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.
Avian phylogenetic tree and the complex-vocal learning phenotype. Shown is an avian phylogenetic tree (based on: Hackett et al., 2008). Identified in red text and ∗ are three groups of complex-vocal learning birds. Below the figure are summarized three alternative hypotheses on the evolutionary mechanisms of complex-vocal learning in birds. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.
Primate phylogenetic tree and complex-vocal learning vs. auditory sequence learning. Shown is a primate phylogenetic tree based on a combination of DNA sequence and fossil age data (Goodman et al., 1998; Page et al., 1999). Humans (Homo) are the only primates classified as “vocal learners.” However, non-human primates might be better at auditory sequence learning than their limited vocal-production learning capabilities would suggest. In blue text and (#) we highlight species for which there is some evidence of Artificial Grammar Learning capabilities for at least adjacent relationships between the elements in a sequence (tamarins: Fitch and Hauser, 2004), (macaques: Wilson et al., 2011). Presuming that the auditory capabilities of guenons and gibbons (or the symbolic learning of signs by apes) would mean that these animals are able to learn at least adjacent relationships in Artificial Grammars we can tentatively mark these species also in blue #. Note however, that for the species labeled in black text, future studies might show them to be capable of some limited-vocal learning or various levels of complexity in learning the structure of auditory sequences. Three not mutually exclusive hypotheses are illustrated for both complex-vocal learning and auditory sequence learning. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.
Vocalization subsystems in complex-vocal learners and in limited-vocal learners or vocal non-learners: Direct and indirect pathways. The different subsystems for vocalization and their interconnectivity are illustrated using different colors. (A) Schematic of a songbird brain showing some connectivity of the four major song nuclei (HVC, RA, AreaX, and LMAN). (B) Human brain schematic showing the different proposed vocal subsystems. The learned vocalization subsystem consists of a primary motor cortex pathway (blue arrow) and a cortico-striatal-thalamic loop for learning vocalizations (white). Also shown is the limbic vocal subsystem that is broadly conserved in primates for producing innate vocalizations (black), and the motoneurons that control laryngeal muscles (red). (C) Known connectivity of a brainstem vocal system (not all connections shown) showing absence of forebrain song nuclei in vocal non-learning birds. (D) Known connectivity of limited-vocal learning monkeys (based on data in squirrel monkeys and macaques) showing presence of forebrain regions for innate vocalization (ACC, OFC, and amygdala) and also of a ventral premotor area (Area 6vr) of currently poorly understood function that is indirectly connected to nucleus ambiguus. The LMC in humans is directly connected with motoneurons in the nucleus ambiguus, which orchestrate the production of learned vocalizations. Only the direct pathway through the mammalian basal ganglia (ASt, anterior striatum; GPi, globus pallidus, internal) is shown as this is the one most similar to AreaX connectivity in songbirds. Modified figure based on (Jarvis, 2004; Jarvis et al., 2005).
Abbreviations: ACC, anterior cingulate cortex; Am, nucleus ambiguus; Amyg, amygdala; AT, anterior thalamus; Av, nucleus avalanche; DLM, dorsolateral nucleus of the medial thalamus; DM, dorsal medial nucleus of the midbrain; HVC, high vocal center; LMAN, lateral magnocellular nucleus of the anterior nidopallium; LMC, laryngeal motor cortex; OFC, orbito-frontal cortex; PAG, periaqueductal gray; RA, robust nucleus of the arcopallium; RF, reticular formation; vPFC, ventral prefrontal cortex; VLT, ventro-lateral division of thalamus; XIIts, bird twelfth nerve nucleus. From Petkov, CI; Jarvis ED (2012). "Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates". Front. Evol. Neurosci. 4:12.