Audio time stretching and pitch scaling

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch.

When resampling audio to a notably higher pitch, it is usually preferable to incorporate an interpolation filter, because frequencies that would surpass the Nyquist frequency (half the sampling rate of the audio reproduction software or device) otherwise create usually undesired sound distortions, a phenomenon known as aliasing.
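The folding arithmetic behind aliasing can be illustrated with a short sketch. The Python below is a minimal illustration and not a resampler: the function names `aliased_frequency` and `pitch_up_partials` are hypothetical, and the 44.1 kHz rate mentioned afterwards is only an assumed example value. It reports where a partial would land if the audio were pitched up with no filtering at all.

```python
def aliased_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Return the frequency at which a sampled sinusoid of freq_hz is actually represented."""
    nyquist = sample_rate_hz / 2.0
    # The spectrum of a sampled signal repeats every sample_rate_hz.
    folded = freq_hz % sample_rate_hz
    # Anything in the upper half of that range mirrors back below Nyquist.
    return sample_rate_hz - folded if folded > nyquist else folded


def pitch_up_partials(partials_hz, ratio, sample_rate_hz):
    """For each partial, return (original, target, resulting) frequency without filtering."""
    return [
        (f, f * ratio, aliased_frequency(f * ratio, sample_rate_hz))
        for f in partials_hz
    ]
```

With an assumed sampling rate of 44.1 kHz, a 15 kHz partial raised by one octave (ratio 2) targets 30 kHz, which lies above the 22.05 kHz Nyquist frequency and folds back to 14.1 kHz. A 5 kHz partial raised by the same ratio lands at 10 kHz and is unaffected.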

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

The phase vocoder handles sinusoidal components well, but early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse.

The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.
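The phase arithmetic that phase vocoder implementations typically perform for each frequency bin can be sketched as follows. This is a textbook-style illustration under stated assumptions, not the method of any particular implementation: the helper names, the FFT size `n_fft`, and the hop parameters are all hypothetical, and the windowing and transforms of a full vocoder are left out.

```python
import math


def principal_arg(phi: float) -> float:
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def bin_true_frequency(phase_prev, phase_curr, k, n_fft, hop):
    """Estimate the frequency (radians per sample) of the component in bin k.

    phase_prev and phase_curr are the phases of bin k in two consecutive
    analysis frames that start `hop` samples apart (assumed layout).
    """
    omega_k = 2.0 * math.pi * k / n_fft              # bin centre frequency
    expected_advance = omega_k * hop                 # phase advance of a bin-centred tone
    deviation = principal_arg(phase_curr - phase_prev - expected_advance)
    return omega_k + deviation / hop


def advance_synthesis_phase(synth_phase_prev, true_freq, synthesis_hop):
    """Accumulate the output phase using the (different) synthesis hop."""
    return synth_phase_prev + true_freq * synthesis_hop
```

The estimate is only unambiguous when the component lies close enough to the bin centre that the true phase deviation over one hop stays within half a cycle. Time stretching follows from advancing the output phase by `synthesis_hop` rather than by the analysis hop.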

In 1978, Rabiner and Schafer put forth an alternative solution that works in the time domain: find the period (or, equivalently, the fundamental frequency) of a given section of the wave using a pitch detection algorithm (commonly the peak of the signal's autocorrelation, or sometimes cepstral processing), and crossfade one period into another.

This is much more limited in scope than phase-vocoder-based processing, but it can be made much less processor-intensive, which suits real-time applications.

It provides the most coherent results[citation needed] for single-pitched sounds like voice or musically monophonic instrument recordings.
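The time-domain idea of finding the period from the autocorrelation peak and crossfading one period into another can be sketched in a few lines. The Python below is a simplified illustration, not the published algorithm: the function names, the linear crossfade, and the lag search bounds are assumptions, and it lengthens a signal by exactly one period at a caller-chosen position.

```python
import math


def estimate_period(x, min_lag, max_lag):
    """Return the lag (in samples) with the largest length-normalised autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(x) - lag
        score = sum(x[i] * x[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def crossfade(a, b):
    """Fade linearly from segment a into segment b (equal lengths assumed)."""
    n = len(a)
    return [a[i] * (1.0 - i / n) + b[i] * (i / n) for i in range(n)]


def lengthen_by_one_period(x, start, period):
    """Insert one extra period after the period that begins at `start`."""
    first = x[start:start + period]
    second = x[start + period:start + 2 * period]
    # The inserted period begins like `second` (which normally follows `first`)
    # and ends like `first` (which normally precedes `second`), so both joins
    # stay continuous.
    extra = crossfade(second, first)
    return x[:start + period] + extra + x[start + period:]
```

For a strictly periodic input the two neighbouring periods are identical and the inserted period is indistinguishable from them. For real recordings the crossfade smooths over the small differences between adjacent periods.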

High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the wavelet transform, or artificial neural network processing[citation needed], producing the highest-quality time stretching.

In order to preserve an audio signal's pitch when stretching or compressing its duration, many time-scale modification (TSM) procedures follow a frame-based approach.

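In this approach, the signal is split into short overlapping analysis frames, which are then relocated to a different spacing and superimposed to form the output. However, simply superimposing the unmodified analysis frames typically results in undesired artifacts such as phase discontinuities or amplitude fluctuations.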
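A frame-based procedure of this kind can be sketched as a plain windowed overlap-add. The Python below is a minimal illustration under stated assumptions: the names `ola_stretch`, `analysis_hop`, and `synthesis_hop`, the Hann window, and the normalisation by the summed window are choices made for the example. The windowing and normalisation address amplitude fluctuations only. Phase discontinuities remain, because the frames are not realigned before they are summed.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_stretch(x, frame_len, analysis_hop, synthesis_hop):
    """Time-stretch x by roughly synthesis_hop / analysis_hop using overlap-add.

    Assumes len(x) >= frame_len.
    """
    window = hann(frame_len)
    n_frames = 1 + (len(x) - frame_len) // analysis_hop
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        a = m * analysis_hop      # where the frame is read from the input
        s = m * synthesis_hop     # where the frame is written in the output
        for i in range(frame_len):
            out[s + i] += window[i] * x[a + i]
            norm[s + i] += window[i]
    # Divide by the summed window so overlapping frames do not change the level.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

The stretching factor is the ratio of the two hop sizes: reading frames every 25 samples and writing them every 50 samples roughly doubles the duration.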

For pitch scaling, time-domain processing works much better, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable.

Pitch-corrected audio timestretch is found in every modern web browser as part of the HTML standard for media playback.