Modified discrete cosine transform

This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries.

The MDCT was proposed by John P. Princen, A. W. Johnson and Alan B. Bradley at the University of Surrey in 1987,[13] following earlier work by Princen and Bradley (1986)[14] to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below.

(There also exists an analogous transform, the MDST, based on the discrete sine transform, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)

In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank.

The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF bank.

Similar to MP3, ATRAC uses stacked quadrature mirror filters (QMF) followed by an MDCT.

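The 2N real numbers $x_0, \ldots, x_{2N-1}$ are transformed into the N real numbers $X_0, \ldots, X_{N-1}$ according to the formula:

$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, \ldots, N-1.$$

(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments.)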
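As a minimal sketch, the definition can be evaluated directly in Python. The function name `mdct` is illustrative and not taken from any library. It assumes the unit normalization and the cosine argument (π/N)(n + 1/2 + N/2)(k + 1/2), and it costs O(N²) operations, so it is meant for checking results rather than for production use.

```python
import math

def mdct(x):
    """Direct O(N^2) MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    if len(x) % 2:
        raise ValueError("input length must be even (2N)")
    n_out = len(x) // 2  # N, the number of outputs
    return [
        sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n in range(2 * n_out))
        for k in range(n_out)
    ]
```

For N = 1 the two cosine factors are cos(π/2) and cos(π), so `mdct([x0, x1])` reduces to approximately `[-x1]`, which is a convenient hand check.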

Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible.

However, perfect invertibility is achieved by adding the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to cancel and the original data to be retrieved; this technique is known as time-domain aliasing cancellation (TDAC).
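The cancellation can be checked numerically. The sketch below assumes the IMDCT takes the same cosine form as the forward transform with a 1/N factor in front, that is, $y_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$. The helper names (`mdct`, `imdct`, `overlap_add`) and the test signal are illustrative choices.

```python
import math

def mdct(x):
    """Direct MDCT: 2N inputs -> N outputs, unit normalization."""
    n_out = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def imdct(X):
    """Direct IMDCT: N inputs -> 2N outputs, assuming the 1/N normalization."""
    n_out = len(X)
    return [sum(X[k] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for k in range(n_out)) / n_out
            for n in range(2 * n_out)]

def overlap_add(signal, n_out):
    """MDCT/IMDCT every 50%-overlapped block of 2N samples and sum the results."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_out + 1, n_out):
        block = imdct(mdct(signal[start:start + 2 * n_out]))
        for i, value in enumerate(block):
            out[start + i] += value
    return out

signal = [math.sin(0.3 * t) + 0.1 * t for t in range(16)]
restored = overlap_add(signal, 4)
# Samples 4..11 are covered by two blocks, so their aliasing cancels.
# The first and last 4 samples are covered only once and stay aliased.
interior_error = max(abs(r - s) for r, s in zip(restored[4:12], signal[4:12]))
edge_error = max(abs(r - s) for r, s in zip(restored[:4], signal[:4]))
```

Only the samples covered by two blocks are recovered. In practice the first and last half-blocks of a stream are handled separately, for example by padding.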

Although the direct application of the MDCT formula would require O(N²) operations, it is possible to compute the same thing with only O(N log N) complexity by recursively factorizing the computation, as in the fast Fourier transform (FFT).

One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(N) pre- and post-processing steps.

Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size.

In typical signal-compression applications, the transform properties are further improved by using a window function $w_n$ (n = 0, ..., 2N−1) that multiplies $x_n$ in the MDCT formula and $y_n$ in the IMDCT formula, above, in order to avoid discontinuities at the n = 0 and 2N boundaries by making the function go smoothly to zero at those points.

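A window that produces a form known as a modulated lapped transform (MLT)[15][16] is given by

$$w_n = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right]$$

and is used for MP3 and MPEG-2 AAC, and

$$w_n = \sin\left(\frac{\pi}{2}\sin^2\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right]\right)$$

for Vorbis.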
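Both windows can be generated and checked in a few lines. This sketch assumes a window of length 2N, the sine form sin[(π/2N)(n + 1/2)] for the MLT window, and the form sin((π/2) sin²[(π/2N)(n + 1/2)]) for the Vorbis window. It also checks the Princen-Bradley condition $w_n^2 + w_{n+N}^2 = 1$, which is the property a window needs for aliasing cancellation to survive windowing. All function names are illustrative.

```python
import math

def sine_window(n_out):
    """Sine (MLT) window of length 2N."""
    return [math.sin(math.pi / (2 * n_out) * (n + 0.5)) for n in range(2 * n_out)]

def vorbis_window(n_out):
    """Vorbis power-complementary window of length 2N."""
    return [math.sin(math.pi / 2 * math.sin(math.pi / (2 * n_out) * (n + 0.5)) ** 2)
            for n in range(2 * n_out)]

def princen_bradley_error(w):
    """Largest deviation of w[n]^2 + w[n+N]^2 from 1 over the first half."""
    n_out = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + n_out] ** 2 - 1.0) for n in range(n_out))
```

Both windows are symmetric and start and end close to zero, which matches the stated goal of avoiding discontinuities at the block boundaries.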

As can be seen by inspection of the definitions, for even N the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by N/2 and two N-blocks of data are transformed at once.

By examining this equivalence more carefully, important properties like TDAC can be easily derived.

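Consider an MDCT with 2N inputs and N outputs, where the inputs are divided into four blocks (a, b, c, d), each of size N/2. If we shift these to the right by N/2 (from the +N/2 term in the MDCT definition), then (b, c, d) extend past the end of the N DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above. The result is that the MDCT of the 2N inputs (a, b, c, d) is exactly equivalent to a DCT-IV of the N inputs $(-c_R - d,\; a - b_R)$, where the subscript R denotes reversal of a block.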
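The folding step can be sketched as follows, for even N. The 2N inputs are split into four blocks (a, b, c, d) of size N/2 and folded into the N values (−c_R − d, a − b_R), where R denotes reversal, before an unnormalized DCT-IV is applied. The names `fold`, `dct_iv` and `mdct_via_dct_iv` are hypothetical, and the direct `mdct` is included only as a reference for comparison.

```python
import math

def dct_iv(u):
    """Unnormalized DCT-IV of N values, computed directly."""
    n = len(u)
    return [sum(u[m] * math.cos(math.pi / n * (m + 0.5) * (k + 0.5)) for m in range(n))
            for k in range(n)]

def fold(x):
    """Fold 2N inputs (a, b, c, d) into the N values (-c_R - d, a - b_R)."""
    n_out = len(x) // 2
    half = n_out // 2
    a, b = x[:half], x[half:n_out]
    c, d = x[n_out:n_out + half], x[n_out + half:]
    first = [-cr - dv for cr, dv in zip(reversed(c), d)]
    second = [av - br for av, br in zip(a, reversed(b))]
    return first + second

def mdct_via_dct_iv(x):
    """MDCT of 2N inputs (N even) as a DCT-IV of the folded input."""
    if len(x) % 4:
        raise ValueError("input length must be a multiple of 4 (even N)")
    return dct_iv(fold(x))

def mdct(x):
    """Direct MDCT from the definition, used here as a reference."""
    n_out = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]
```

Because the fold costs only O(N) additions, any fast DCT-IV routine substituted for `dct_iv` would give a fast MDCT.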

Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2N and shifted back to the left by N/2.

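The inverse DCT-IV would simply give back its inputs $(-c_R - d,\; a - b_R)$. When this is extended via the boundary conditions and shifted, one obtains:

$$\mathrm{IMDCT}(\mathrm{MDCT}(a, b, c, d)) = (a - b_R,\; b - a_R,\; c + d_R,\; d + c_R)/2.$$

Half of the IMDCT outputs are thus redundant, as $b - a_R = -(a - b_R)_R$, and likewise for the last two terms.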

If we group the input into bigger blocks A,B of size N, where A = (a, b) and B = (c, d), we can write this result in a simpler way:

$$\mathrm{IMDCT}(\mathrm{MDCT}(A, B)) = (A - A_R,\; B + B_R)/2.$$

One can now understand how TDAC works.
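The aliasing pattern for a single block can be confirmed numerically. The sketch assumes the unit-normalized MDCT and an IMDCT with a 1/N factor, and compares IMDCT(MDCT(A, B)) against (A − A_R, B + B_R)/2, where R denotes reversal. The function names are illustrative.

```python
import math

def mdct(x):
    """Direct MDCT: 2N inputs -> N outputs, unit normalization."""
    n_out = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def imdct(X):
    """Direct IMDCT: N inputs -> 2N outputs, assuming the 1/N normalization."""
    n_out = len(X)
    return [sum(X[k] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for k in range(n_out)) / n_out
            for n in range(2 * n_out)]

def expected_aliasing(x):
    """Predicted single-block result (A - A_R, B + B_R) / 2 for x = (A, B)."""
    n_out = len(x) // 2
    block_a, block_b = x[:n_out], x[n_out:]
    first = [(v - r) / 2 for v, r in zip(block_a, reversed(block_a))]
    second = [(v + r) / 2 for v, r in zip(block_b, reversed(block_b))]
    return first + second
```

The first half of the output is antisymmetric and the second half is symmetric. This is why half of the 2N output values carry no independent information.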

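Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2N block (B, C). The IMDCT then yields, analogously, $(B - B_R,\; C + C_R)/2$. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply B, recovering the original data.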

For odd N (which are rarely used in practice), N/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV.

In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.

This is the reason for using a window function that reduces the components near the boundaries of the input sequence (a, b, c, d) towards 0.

Above, the TDAC property was proved for the ordinary MDCT, showing that adding IMDCTs of subsequent blocks in their overlapping half recovers the original data.

The derivation of this inverse property for the windowed MDCT is only slightly more complicated.

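Consider two overlapping consecutive sets of 2N inputs (A,B) and (B,C), for blocks A,B,C of size N. Recall from above that when (A,B) and (B,C) are MDCTed, IMDCTed, and added in their overlapping half, we obtain $(B + B_R)/2 + (B - B_R)/2 = B$, the original data.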

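Now we suppose that we multiply both the MDCT inputs and the IMDCT outputs by a window function of length 2N. We assume a symmetric window, which is therefore of the form $(W, W_R)$, where W is a length-N vector and R denotes reversal as before. The Princen-Bradley condition can then be written as $W^2 + W_R^2 = (1, 1, \ldots)$, with the squares and additions performed elementwise. Therefore, instead of MDCTing (A,B), we now MDCT $(WA,\; W_R B)$, with all multiplications performed elementwise.

When this is IMDCTed and multiplied again (elementwise) by the window function, the last-N half becomes:

$$W_R \cdot (W_R B + (W_R B)_R) = W_R \cdot (W_R B + W B_R) = W_R^2 B + W W_R B_R.$$

(Note that we no longer have the multiplication by 1/2, because the IMDCT normalization differs by a factor of 2 in the windowed case.)

Similarly, the windowed MDCT and IMDCT of (B,C) yields, in its first-N half:

$$W \cdot (W B - W_R B_R) = W^2 B - W W_R B_R.$$

When we add these two halves together, we obtain:

$$(W_R^2 B + W W_R B_R) + (W^2 B - W W_R B_R) = (W_R^2 + W^2)\, B = B,$$

recovering the original data.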
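The windowed case can be checked in the same way as the unwindowed one. This sketch assumes a sine window of length 2N, which satisfies the Princen-Bradley condition, applied before the MDCT and again after the IMDCT, and an IMDCT normalization of 2/N instead of 1/N. The function names and the test signal are illustrative.

```python
import math

def sine_window(n_out):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi / (2 * n_out) * (n + 0.5)) for n in range(2 * n_out)]

def windowed_mdct(x, w):
    """Window the 2N inputs elementwise, then apply the direct MDCT."""
    n_out = len(x) // 2
    xw = [xi * wi for xi, wi in zip(x, w)]
    return [sum(xw[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def windowed_imdct(X, w):
    """Direct IMDCT with the 2/N normalization, then window the 2N outputs."""
    n_out = len(X)
    return [w[n] * 2.0 / n_out *
            sum(X[k] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for k in range(n_out))
            for n in range(2 * n_out)]

def windowed_overlap_add(signal, n_out):
    """Process 50%-overlapped blocks of 2N samples and sum the outputs."""
    w = sine_window(n_out)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_out + 1, n_out):
        block = windowed_imdct(windowed_mdct(signal[start:start + 2 * n_out], w), w)
        for i, value in enumerate(block):
            out[start + i] += value
    return out

signal = [math.cos(0.7 * t) - 0.05 * t for t in range(24)]
restored = windowed_overlap_add(signal, 4)
# Samples covered by two blocks (indices 4..19) should match the input.
interior_error = max(abs(r - s) for r, s in zip(restored[4:20], signal[4:20]))
```

Replacing the sine window with one that violates the Princen-Bradley condition would leave a residual error in the overlapped region, which is a quick way to test a candidate window.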