[8] Many sounds in everyday life, including speech and music, are broadband: their frequency components spread over a wide range, and there is no single well-defined way to represent the signal in terms of ENVp and TFSp.
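For a narrowband (band-limited) signal, by contrast, the envelope/fine-structure decomposition is well defined and is conventionally computed from the analytic signal given by the Hilbert transform. The following Python sketch illustrates this for a sinusoidally amplitude-modulated tone; the function name `analytic_signal` is illustrative, and the FFT construction is equivalent to standard library routines such as `scipy.signal.hilbert`.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (equivalent to scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

# Narrowband example: 1 kHz carrier with 10 Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 10 * t)) * np.cos(2 * np.pi * 1000 * t)

z = analytic_signal(x)
env = np.abs(z)            # temporal envelope (ENV): slow amplitude variation
tfs = np.cos(np.angle(z))  # temporal fine structure (TFS): unit-amplitude carrier

# The product env * tfs reconstructs the original narrowband signal.
```

For such a narrowband signal the recovered envelope matches the imposed 10 Hz modulator, and multiplying envelope and fine structure back together returns the original waveform; for a broadband signal, this decomposition must instead be applied band by band after filtering.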
[26][27] In secondary auditory cortical fields, responses become temporally more sluggish and spectrally broader, but are still able to phase-lock to the salient features of speech and musical sounds.
[28][29][30][31] Tuning to AM rates below about 64 Hz is also found in the human auditory cortex [32][33][34][35] as revealed by brain-imaging techniques (fMRI) and cortical recordings in epileptic patients (electrocorticography).
This is consistent with neuropsychological studies of brain-damaged patients[36] and with the notion that the central auditory system performs some form of spectral decomposition of the ENVp of incoming sounds.
[37] One unexpected aspect of phase locking in the auditory cortex has been observed in responses elicited by complex acoustic stimuli whose spectrograms exhibit relatively slow envelopes (< 20 Hz) that are nevertheless carried by fast modulations reaching several hundred hertz.
The same phenomenon is amply manifested in measurements of the spectro-temporal receptive fields of the primary auditory cortex, which display unexpectedly fine temporal accuracy and selectivity, approaching a resolution of 5–10 ms.
[46] Rainstorms, crackling fires, chirping crickets, and galloping horses produce "sound textures" - the collective result of many similar acoustic events - whose perception is mediated by ENVn statistics.
Perception of second-order AM has been interpreted as resulting from nonlinear mechanisms in the auditory pathway that produce an audible distortion component at the envelope beat frequency in the internal modulation spectrum of the sounds.
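This interpretation can be illustrated with a toy simulation: an envelope containing two first-order AM rates has no energy at their beat rate, but passing it through a compressive nonlinearity (a crude stand-in for cochlear compression; the specific square-root function below is an assumption for illustration) introduces an audible distortion component at the envelope beat frequency.

```python
import numpy as np

fs = 1000
t = np.arange(fs * 4) / fs   # 4 s of "envelope" at 0.25 Hz spectral resolution
f1, f2 = 20.0, 24.0          # two first-order AM rates; beat rate f2 - f1 = 4 Hz
m = 1 + 0.5 * (np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t))

def spectrum_level(x, freq):
    """Magnitude of the spectral component nearest to `freq`."""
    X = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return X[np.argmin(np.abs(freqs - freq))]

# Linear envelope: components only at 0, 20 and 24 Hz - nothing at the 4 Hz beat.
lin_beat = spectrum_level(m, f2 - f1)

# Compressive nonlinearity: a distortion component appears at the beat rate.
nl_beat = spectrum_level(np.sqrt(m), f2 - f1)
```

The linear envelope spectrum is empty at 4 Hz, whereas the compressed envelope contains a clear 4 Hz component, mirroring the proposed origin of second-order AM percepts in the internal modulation spectrum.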
The auditory system goes to some length to preserve this TFSn information, as evidenced by the presence of giant synapses (endbulbs of Held) in the ventral cochlear nucleus. These synapses contact bushy cells (spherical and globular) and faithfully transmit (or enhance) the temporal information present in auditory nerve fibers to higher structures in the brainstem.
It is often assumed that many perceptual capacities rely on the ability of the monaural and binaural auditory system to encode and use TFSn cues evoked by components in sounds with frequencies below about 1–4 kHz.
There is a risk that this view of auditory processing[93] is dominated by these physical/technical concepts, similarly to how cochlear frequency-to-place mapping was for a long time conceptualized in terms of the Fourier transform.
Only at that stage does it appear that parallel pathways, potentially enhancing ENVn or TFSn information (or something akin to it), may be implemented through the temporal response characteristics of different cochlear nucleus cell types.
A computational model of the peripheral auditory system[94][95] may be used to simulate auditory-nerve fiber responses to complex sounds such as speech, and quantify the transmission (i.e., internal representation) of ENVn and TFSn cues.
In two simulation studies,[96][97] the mean-rate and spike-timing information was quantified at the output of such a model to characterize, respectively, the short-term rate of neural firing (ENVn) and the level of synchronization due to phase locking (TFSn) in response to speech sounds degraded by vocoders.
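The vocoder manipulation used in such studies can be sketched in simplified form: each channel extracts the band envelope (ENV) and imposes it on a noise carrier, discarding the original fine structure (TFS). The Python sketch below implements one channel with brick-wall FFT filters; the filter shapes, cutoffs, and function names are illustrative assumptions, not the processing used in the cited studies.

```python
import numpy as np

def bandpass(x, fs, lo, hi):
    """Brick-wall band-pass filter via the FFT (illustrative, non-causal)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(X, len(x))

def vocode_channel(x, fs, lo, hi, rng):
    """One channel of a noise vocoder: band envelope imposed on band noise."""
    band = bandpass(x, fs, lo, hi)
    env = bandpass(np.abs(band), fs, 0, 50)  # envelope: rectify + low-pass (< 50 Hz)
    carrier = bandpass(rng.standard_normal(len(x)), fs, lo, hi)
    return env * carrier                     # ENVn cues preserved, TFSn replaced

fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
# Toy "speech-like" input: energy near 500 Hz with a 4 Hz syllabic envelope.
x = (1 + np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)
y = vocode_channel(x, fs, 400, 600, rng)
```

The output retains the slow 4 Hz envelope of the input band while its carrier is noise, which is precisely the degradation whose neural consequences the simulation studies quantified in terms of mean-rate (ENVn) and spike-timing (TFSn) information.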
For instance, Warren and Verbrugge demonstrated that constructed sounds of a glass bottle dropped on the floor were perceived as bouncing when high-energy regions in four different frequency bands were temporally aligned, producing amplitude peaks in the envelope.
More recent studies using vocoder simulations of cochlear implant processing demonstrated that many temporally-patterned sounds can be perceived with little original spectral information, based primarily on temporal cues.
In these studies, envelope-based acoustic measures such as number of bursts and peaks in the envelope were predictive of listeners’ abilities to identify sounds based primarily on ENVp cues.
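An envelope-based measure of this kind can be sketched as follows: rectify the waveform, smooth it into an envelope, and count threshold crossings as "bursts". The function below is a hypothetical illustration of the general approach, not the measure used in the cited studies; the window length and threshold are arbitrary assumptions.

```python
import numpy as np

def count_envelope_bursts(x, fs, win_ms=20.0, thresh=0.5):
    """Count bursts as rising threshold crossings of a smoothed envelope.

    Illustrative measure: rectify, smooth with a moving average, normalize
    to the peak, then count rising crossings of `thresh` (a fraction of
    the peak envelope level).
    """
    n = max(1, int(fs * win_ms / 1000))
    env = np.convolve(np.abs(x), np.ones(n) / n, mode="same")
    env = env / env.max()
    above = env > thresh
    rising = np.flatnonzero(~above[:-1] & above[1:])
    return len(rising)

fs = 8000
t = np.arange(fs) / fs                 # 1 s of silence...
x = np.zeros_like(t)
rng = np.random.default_rng(1)
for onset in (0.1, 0.4, 0.7):          # ...with three 100 ms noise bursts
    i = int(onset * fs)
    x[i:i + fs // 10] = rng.standard_normal(fs // 10)

n_bursts = count_envelope_bursts(x, fs)
```

Applied to the constructed signal, the measure recovers the three bursts from the envelope alone, with no reference to the spectral content of the carriers.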
[144][145] Sensory versus non-sensory factors for this long maturation are still debated,[146] but the results generally appear to be more dependent on the task or on sound complexity for infants and children than for adults.
[149] Psychophysical studies have suggested that degraded TFS processing due to age and hearing loss may underlie some suprathreshold deficits, such as speech perception;[10] however, debate remains about the underlying neural correlates.
[150][151] Following cochlear hearing loss, the strength of phase locking to the temporal fine structure of signals (TFSn) in peripheral single-neuron responses remains normal in quiet listening conditions.
[79] However, it remains unclear to what extent deficits associated with hearing loss reflect poorer TFSn processing or reduced cochlear frequency selectivity.
[182] The quality of the representation of a sound in the auditory nerve is limited by refractoriness, adaptation, saturation, and reduced synchronization (phase locking) at high frequencies, as well as by the stochastic nature of action potentials.
Nevertheless, despite these limiting factors, the properties of sounds are reasonably well represented in the population nerve response over a wide range of levels[185] and audio frequencies (see Volley Theory).
For instance, the ability of auditory-cortex neurons to discriminate voice-onset time cues for phonemes is degraded following moderate hearing loss (20-40 dB HL) induced by acoustic trauma.
[233] Indeed, a transient (15-day) hearing loss occurring during the "critical period" is sufficient to elevate AM thresholds in adult gerbils.
[234] Even non-traumatic noise exposure reduces the phase-locking ability of cortical neurons as well as the animals' behavioral capacity to discriminate between different AM sounds.
[239] Fast, easy-to-administer psychophysical tests have been developed to help clinicians screen TFS-processing abilities and diagnose the suprathreshold temporal auditory processing deficits associated with cochlear damage and ageing.
The need for such tests is corroborated by strong correlations between slow-FM or spectro-temporal modulation detection thresholds and aided speech intelligibility in competing backgrounds for hearing-impaired persons.
[261][262] A related procedure, also using envelope cross-correlations, is the short-time objective intelligibility (STOI) measure,[253] which works well for its intended application in evaluating noise suppression, but which is less accurate for nonlinear distortion.