The combination of small size and acceptable fidelity led to a boom in the distribution of music over the Internet in the late 1990s, with MP3 serving as an enabling technology at a time when bandwidth and storage were still at a premium.
[24] In 1978, Bishnu S. Atal and Manfred R. Schroeder at Bell Labs proposed an LPC speech codec, called adaptive predictive coding, that used a psychoacoustic coding-algorithm exploiting the masking properties of the human ear.
In 1985, Atal and Schroeder presented code-excited linear prediction (CELP), an LPC-based perceptual speech-coding algorithm with auditory masking that achieved a significant data compression ratio for its time.
[23] IEEE's refereed Journal on Selected Areas in Communications reported on a wide variety of (mostly perceptual) audio compression algorithms in 1988.
[36] The "Voice Coding for Communications" edition published in February 1988 reported on a wide range of established, working audio bit compression technologies,[36] some of them using auditory masking as part of their fundamental design, and several showing real-time hardware implementations.
The genesis of the MP3 technology is fully described in a paper from Professor Hans Musmann,[37] who chaired the ISO MPEG Audio group for several years.
Another predecessor of the MP3 format and technology is the perceptual codec MUSICAM, which is based on an integer-arithmetic, 32-sub-band filter bank driven by a psychoacoustic model.
The simplicity of the corresponding decoder, together with the high audio quality of this codec, which for the first time used a 48 kHz sampling rate and a 20 bits/sample input format (the highest available sampling standard in 1991, compatible with the AES/EBU professional digital input studio standard), were the main reasons the characteristics of MUSICAM were later adopted as the basic features of an advanced digital music compression codec.
[43] MP3 is directly descended from OCF and PXFM. It represents the outcome of the collaboration between Brandenburg, then a postdoctoral researcher at AT&T-Bell Labs working with James D. Johnston ("JJ") of AT&T-Bell Labs, and the Fraunhofer Institute for Integrated Circuits, Erlangen (where he worked with Bernhard Grill and four other researchers, "The Original Six"[44]), with relatively minor contributions from the MP2 branch of psychoacoustic sub-band coders.
Brandenburg adopted the song for testing purposes, listening to it again and again each time he refined the compression algorithm, making sure it did not adversely affect the reproduction of Vega's voice.
The MUSICAM technique, proposed by Philips (Netherlands), CCETT (France), the Institute for Broadcast Technology (Germany), and Matsushita (Japan),[47] was chosen due to its simplicity and error robustness, as well as for its high level of computational efficiency.
A working group consisting of van de Kerkhof, Stoll, Leonardo Chiariglione (CSELT VP for Media), Yves-François Dehery, Karlheinz Brandenburg (Germany) and James D. Johnston (United States) took ideas from ASPEC, integrated the filter bank from Layer II, added some of their ideas such as the joint stereo coding of MUSICAM and created the MP3 format, which was designed to achieve the same quality at 128 kbit/s as MP2 at 192 kbit/s.
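To put the two bit rates in concrete terms, the following minimal sketch computes the size of a constant-bit-rate stream at each rate. The four-minute track length and the helper name `size_megabytes` are illustrative assumptions, and frame headers and metadata are ignored.

```python
def size_megabytes(bitrate_kbit_s: float, seconds: float) -> float:
    """Size in megabytes (10**6 bytes) of a constant-bit-rate stream."""
    bits = bitrate_kbit_s * 1000 * seconds
    return bits / 8 / 1_000_000


TRACK_SECONDS = 240  # assumed four-minute track

mp3_size = size_megabytes(128, TRACK_SECONDS)  # design target for MP3
mp2_size = size_megabytes(192, TRACK_SECONDS)  # MP2 rate quoted for equal quality
saving = 1 - mp3_size / mp2_size               # fraction of space saved

print(f"MP3: {mp3_size:.2f} MB, MP2: {mp2_size:.2f} MB, saving: {saving:.0%}")
```

Under these assumptions the 128 kbit/s stream needs two thirds of the space of the 192 kbit/s stream.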
[50] This song was chosen because of its nearly monophonic nature and wide spectral content, making it easier to hear imperfections in the compression format during playbacks.
This particular track has an interesting property in that the two channels are almost, but not completely, the same, leading to a case where Binaural Masking Level Depression causes spatial unmasking of noise artifacts unless the encoder properly recognizes the situation and applies corrections similar to those detailed in the MPEG-2 AAC psychoacoustic model.
were taken from the EBU V3/SQAM reference compact disc and have been used by professional sound engineers to assess the subjective quality of the MPEG Audio formats.
[54] Working in non-real time on several operating systems, it was able to demonstrate the first real-time hardware decoding (DSP based) of compressed audio.
Some other real-time implementations of MPEG Audio encoders and decoders[55] were available for digital broadcasting (radio DAB, television DVB) towards consumer receivers and set-top boxes.
A hacker named SoloH discovered the source code of the "dist10" MPEG reference implementation shortly after the release on the servers of the University of Erlangen.
Later versions (2008+) support an n.nnn quality goal which automatically selects MPEG-2 or MPEG-2.5 sampling rates as appropriate for human speech recordings that need only 5512 Hz bandwidth resolution.
The popularity of MP3s began to rise rapidly with the advent of Nullsoft's audio player Winamp, released in 1997, which still had a community of 80 million active users in 2023.
[67] In 1998, the first portable solid-state digital audio player, the MPMan, developed by SaeHan Information Systems of Seoul, South Korea, was released; the Rio PMP300 went on sale later that year, despite legal suppression efforts by the RIAA.
[70] In short, MP3 compression works by reducing the accuracy of certain components of sound that are considered (by psychoacoustic analysis) to be beyond the hearing capabilities of most humans.
Part 2 passes the sample into a 1024-point fast Fourier transform (FFT), then the psychoacoustic model is applied and another MDCT filter is performed on the output.
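The FFT analysis in this step can be illustrated with a minimal sketch, which is not an encoder. It transforms one 1024-sample block containing a pure tone and locates the spectral peak. The 44.1 kHz sample rate, the test tone, and all names are assumptions made for illustration; the psychoacoustic model and the MDCT are not implemented here.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


N = 1024             # block length named in the text
SAMPLE_RATE = 44100  # assumed sample rate in Hz
TONE_BIN = 64        # assumed test tone, placed exactly on an FFT bin

# One block of a sine wave that completes TONE_BIN cycles in N samples.
block = [math.sin(2 * math.pi * TONE_BIN * i / N) for i in range(N)]

spectrum = fft(block)
magnitudes = [abs(c) for c in spectrum[: N // 2]]  # keep non-negative frequencies
peak_bin = max(range(N // 2), key=magnitudes.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N

print(peak_bin, peak_hz)
```

A psychoacoustic model would take a spectrum like `magnitudes` as its input when estimating which components are masked.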
Part 3 quantizes and encodes each sample in a process known as noise allocation, which adjusts itself to meet the bit rate and sound masking requirements.
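The bit-rate side of this step can be sketched as a loop that coarsens the quantizer until the encoded block fits a bit budget. This is a simplified illustration under stated assumptions, not the procedure of the standard: `estimate_bits` is a crude, hypothetical stand-in for entropy coding, the coefficient values and the budget are invented, and the masking check is omitted entirely.

```python
def estimate_bits(quantized):
    """Crude cost model (assumption): 1 bit for a zero,
    otherwise the magnitude's bit length plus one sign bit."""
    return sum(1 if q == 0 else abs(q).bit_length() + 1 for q in quantized)


def fit_to_budget(coefficients, bit_budget, step=1.0, growth=2 ** 0.25):
    """Increase the quantizer step size until the block fits the budget."""
    if bit_budget < len(coefficients):
        # Even an all-zero block costs one bit per coefficient in this model.
        raise ValueError("bit budget too small for this cost model")
    while True:
        quantized = [round(c / step) for c in coefficients]
        if estimate_bits(quantized) <= bit_budget:
            return step, quantized
        step *= growth  # coarser quantization: fewer bits, more noise


# Invented spectral coefficients for one block.
coefficients = [100.0, -50.0, 25.0, -12.0, 6.0, 3.0, 1.5, 0.7]
step, quantized = fit_to_budget(coefficients, bit_budget=24)

print(step, quantized, estimate_bits(quantized))
```

A larger step size lowers the bit count at the cost of more quantization noise, which is why an encoder must also check that noise against the masking requirement.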
Samples of applause or of a triangle instrument encoded at a relatively low bit rate provide good examples of compression artifacts.
A detailed account of the techniques used to isolate the sounds deleted during MP3 compression, along with the conceptual motivation for the project, was published in the 2014 Proceedings of the International Computer Music Conference.
[citation needed] On earlier systems that only support the MPEG-1 Audio Layer III standard, MP3 files with a bit rate below 32 kbit/s might be played back sped-up and pitched-up.
For the general field of human speech reproduction, a bandwidth of 5,512 Hz is sufficient to produce excellent results (for voice), using a sampling rate of 11,025 Hz and VBR encoding from a standard 44,100 Hz WAV file.
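The relationship between these figures follows from the Nyquist limit: a sampled signal can represent frequencies up to half its sampling rate. A minimal sketch, with a helper name chosen here for illustration:

```python
def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


source_rate = 44_100            # standard WAV sampling rate from the text
speech_rate = source_rate // 4  # 11,025 Hz, a quarter of the source rate
bandwidth = nyquist_bandwidth_hz(speech_rate)

print(speech_rate, bandwidth)   # 11025 5512.5
```

The result, 5512.5 Hz, matches the bandwidth quoted above once the fraction is dropped.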
Older versions of LAME and FFmpeg only support integer arguments for the variable bit rate quality selection parameter.
[117] The court subsequently revoked the award, however, finding that one patent had not been infringed and that the other was not owned by Alcatel-Lucent; it was co-owned by AT&T and Fraunhofer, who had licensed it to Microsoft, the judge ruled.