At the time, it provided significantly better quality than existing low bit-rate algorithms, such as residual-excited linear prediction (RELP) and linear predictive coding (LPC) vocoders (e.g., FS-1015).
The CELP algorithm is based on four main ideas: using the source-filter model of speech production through linear prediction (LP); using an adaptive and a fixed codebook as the input (excitation) of the LP model; performing a closed-loop search in a perceptually weighted domain; and applying vector quantization (VQ).

The original algorithm, as simulated in 1983 by Schroeder and Atal, required 150 seconds to encode 1 second of speech when run on a Cray-1 supercomputer.
Since then, more efficient ways of implementing the codebooks and improvements in computing capabilities have made it possible to run the algorithm on embedded devices, such as mobile phones.
The fixed codebook is a vector quantization dictionary that is (implicitly or explicitly) hard-coded into the codec.
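To make this concrete, here is a minimal sketch of a fixed codebook acting as a vector quantization dictionary, with the entry and gain chosen by least-squares match to a target vector. The random codebook contents, the subframe length, and the plain squared-error criterion are illustrative assumptions; real codecs use structured (e.g., algebraic) codebooks and a perceptually weighted error, as described below.

```python
# Sketch only: a hard-coded codebook searched with an unweighted error measure.
import numpy as np

rng = np.random.default_rng(0)
SUBFRAME = 40                                     # samples per subframe (typical at 8 kHz)
CODEBOOK = rng.standard_normal((512, SUBFRAME))   # illustrative stochastic codebook

def search_fixed_codebook(target: np.ndarray) -> tuple[int, float]:
    """Return the codebook index and optimal gain that best match `target`."""
    best_index, best_gain, best_error = 0, 0.0, np.inf
    for i, code in enumerate(CODEBOOK):
        energy = code @ code
        if energy == 0.0:
            continue
        gain = (target @ code) / energy           # least-squares optimal gain
        error = np.sum((target - gain * code) ** 2)
        if error < best_error:
            best_index, best_gain, best_error = i, gain, error
    return best_index, best_gain
```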
An all-pole filter is used because it is a good representation of the human vocal tract and because it is easy to compute.
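As a sketch of what "all-pole" means in practice, the filter below computes each output sample as the current excitation sample plus a weighted sum of past output samples, with no feed-forward (zero) terms. The sign convention A(z) = 1 - sum_k a_k z^-k is an assumption; in a real codec the coefficients would come from LPC analysis of the input speech.

```python
import numpy as np

def all_pole_filter(excitation: np.ndarray, lpc: np.ndarray) -> np.ndarray:
    """Synthesis filter 1/A(z), assuming A(z) = 1 - sum_k lpc[k-1] * z**-k."""
    memory = np.zeros(len(lpc))        # past output samples, most recent first
    out = np.empty(len(excitation))
    for n, e in enumerate(excitation):
        y = e + lpc @ memory           # y[n] = e[n] + sum_k a_k * y[n-k]
        memory = np.roll(memory, 1)    # shift the filter memory
        memory[0] = y
        out[n] = y
    return out
```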
The main principle behind CELP is called analysis-by-synthesis (AbS) and means that the encoding (analysis) is performed by perceptually optimizing the decoded (synthesis) signal in a closed loop.
In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware, and the “best sounding” selection criterion implies a human listener.
In order to achieve real-time encoding using limited computing resources, the CELP search is broken down into smaller, more manageable sequential searches using a simple perceptual weighting function.
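A sketch of one such closed-loop search follows: each candidate excitation is passed through the synthesis filter and compared against the target in the weighted domain, so the error being minimized is measured on the decoded signal rather than on the excitation itself. The function name and the convention of passing in precomputed, bandwidth-expanded coefficients `weighted_a` for A(z/γ) are assumptions for illustration; filter-state handling between subframes is omitted.

```python
import numpy as np
from scipy.signal import lfilter

def closed_loop_search(weighted_target: np.ndarray,
                       codebook: np.ndarray,
                       weighted_a: np.ndarray) -> tuple[int, float]:
    """Analysis-by-synthesis search of one codebook.

    weighted_a -- denominator coefficients of the weighted synthesis filter
                  1/A(z/gamma), assumed precomputed by the caller.
    """
    best_index, best_gain, best_error = 0, 0.0, np.inf
    for i, code in enumerate(codebook):
        # Decode the candidate: run it through the weighted synthesis filter.
        synth = lfilter([1.0], weighted_a, code)
        energy = synth @ synth
        if energy == 0.0:
            continue
        gain = (weighted_target @ synth) / energy   # optimal scalar gain
        error = np.sum((weighted_target - gain * synth) ** 2)
        if error < best_error:
            best_index, best_gain, best_error = i, gain, error
    return best_index, best_gain
```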
Typically, the encoding is performed in the following order: the linear-prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs); the adaptive (pitch) codebook is searched and its contribution removed; and the fixed (innovation) codebook is searched.

Most (if not all) modern audio codecs attempt to shape the coding noise so that it appears mostly in the frequency regions where the ear cannot detect it.
For example, the ear is more tolerant of noise in parts of the spectrum where the signal is louder, and less tolerant where it is quieter.
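One common way to achieve this shaping, sketched below, is a weighting filter of the form W(z) = A(z/γ1)/A(z/γ2) derived from the LPC polynomial A(z) by "bandwidth expansion" (scaling the k-th coefficient by γ^k); minimizing the error through W(z) allows more noise under loud formants, where the ear tolerates it. The constants γ1 = 0.9 and γ2 = 0.6 are illustrative, not taken from any particular codec.

```python
import numpy as np
from scipy.signal import lfilter

def bandwidth_expand(a: np.ndarray, g: float) -> np.ndarray:
    """Return coefficients of A(z/g) given A(z) = [1, a1, ..., ap]."""
    return a * g ** np.arange(len(a))

def perceptual_weighting(signal: np.ndarray, a: np.ndarray,
                         g1: float = 0.9, g2: float = 0.6) -> np.ndarray:
    # W(z) = A(z/g1) / A(z/g2): illustrative values, 0 < g2 < g1 <= 1.
    return lfilter(bandwidth_expand(a, g1), bandwidth_expand(a, g2), signal)
```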