Discrete cosine transform

The DCT, first proposed by Nasir Ahmed in 1972, is a widely used transformation technique in signal processing and data compression.

The DCT was first conceived by Nasir Ahmed, T. Natarajan and K. R. Rao while working at Kansas State University.

[9][1] Ahmed developed a practical DCT algorithm with his PhD students T. Raj Natarajan, Wills Dietrich, and Jeremy Fries, and his friend Dr. K. R. Rao at the University of Texas at Arlington in 1973.

[10] In 1977, Wen-Hsiung Chen published a paper with C. Harrison Smith and Stanley C. Fralick presenting a fast DCT algorithm.

[13] In 1975, John A. Roese and Guner S. Robinson adapted the DCT for inter-frame motion-compensated video coding.

They experimented with the DCT and the fast Fourier transform (FFT), developing inter-frame hybrid coders for both, and found that the DCT is the most efficient due to its reduced complexity, capable of compressing image data down to 0.25-bit per pixel for a videotelephone scene with image quality comparable to an intra-frame coder requiring 2-bit per pixel.

[17] This led to Chen developing a practical video compression algorithm, called motion-compensated DCT or adaptive scene coding, in 1981.

[18][19] A DCT variant, the modified discrete cosine transform (MDCT), was developed by John P. Princen, A.W.

[26] Nasir Ahmed also developed a lossless DCT algorithm with Giridhar Mandyam and Neeraj Magotra at the University of New Mexico in 1995.

[7][35] The DCT, and in particular the DCT-II, is often used in signal and image processing, especially for lossy compression, because it has a strong energy compaction property.

For strongly correlated Markov processes, the DCT can approach the compaction efficiency of the Karhunen-Loève transform (which is optimal in the decorrelation sense).

[89] Due to enhancement in the hardware, software and introduction of several fast algorithms, the necessity of using MD DCTs is rapidly increasing.

DCT-IV has gained popularity for its applications in fast implementation of real-valued polyphase filtering banks,[90] lapped orthogonal transform[91][92] and cosine-modulated wavelet bases.

[93] DCT plays an important role in digital signal processing specifically data compression.

[97][98] Like any Fourier-related transform, DCTs express a function or a signal in terms of a sum of sinusoids with different frequencies and amplitudes.

However, because DCTs operate on finite, discrete sequences, two issues arise that do not apply for the continuous cosine transform.

These different boundary conditions strongly affect the applications of the transform and lead to uniquely useful properties for the various DCT types.

This makes the DCT-II matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted input.

This makes the DCT-III matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted output.

The four additional types of discrete cosine transform[104] correspond essentially to real-even DFTs of logically odd order, which have factors of

Multidimensional variants of the various DCT types follow straightforwardly from the one-dimensional definitions: they are simply a separable product (equivalently, a composition) of DCTs along each dimension.

Owing to the rapid growth in the applications based on the 3-D DCT, several fast algorithms are developed for the computation of 3-D DCT-II.

In order to apply the VR DIF algorithm the input data is to be formulated and rearranged as follows.

The figure to the adjacent shows the four stages that are involved in calculating 3-D DCT-II using VR DIF algorithm.

The conventional method to calculate MD-DCT-II is using a Row-Column-Frame (RCF) approach which is computationally complex and less productive on most advanced recent hardware platforms.

In addition, the RCF approach involves matrix transpose and more indexing and data swapping than the new VR algorithm.

Hence, the 3-D VR presents a good choice for reducing arithmetic operations in the calculation of the 3-D DCT-II, while keeping the simple structure that characterize butterfly-style Cooley–Tukey FFT algorithms.

The main idea of this algorithm is to use the Polynomial Transform to convert the multidimensional DCT into a series of 1-D DCTs directly.

The most efficient algorithms, in principle, are usually those that are specialized directly for the DCT, as opposed to using an ordinary FFT plus

(As a practical matter, the function-call overhead in invoking a separate FFT routine might be significant for small

Illustration of the implicit even/odd extensions of DCT input data, for N =11 data points (red dots), for the four most common types of DCT (types I-IV). Note the subtle differences at the interfaces between the data and the extensions: in DCT-II and DCT-IV both the end points are replicated in the extensions but not in DCT-I or DCT-III (and a zero point is inserted at the sign reversal extension in DCT-III).

Two-dimensional DCT frequencies from the JPEG DCT

Original size, scaled 10x (nearest neighbor), scaled 10x (bilinear).

Basis functions of the discrete cosine transformation with corresponding coefficients (specific for our image).
DCT of the image = ${\begin{bmatrix}6.1917&-0.3411&1.2418&0.1492&0.1583&0.2742&-0.0724&0.0561\\0.2205&0.0214&0.4503&0.3947&-0.7846&-0.4391&0.1001&-0.2554\\1.0423&0.2214&-1.0017&-0.2720&0.0789&-0.1952&0.2801&0.4713\\-0.2340&-0.0392&-0.2617&-0.2866&0.6351&0.3501&-0.1433&0.3550\\0.2750&0.0226&0.1229&0.2183&-0.2583&-0.0742&-0.2042&-0.5906\\0.0653&0.0428&-0.4721&-0.2905&0.4745&0.2875&-0.0284&-0.1311\\0.3169&0.0541&-0.1033&-0.0225&-0.0056&0.1017&-0.1650&-0.1500\\-0.2970&-0.0627&0.1960&0.0644&-0.1136&-0.1031&0.1887&0.1444\\\end{bmatrix}}$ .

{\displaystyle {\begin{bmatrix}6.1917&-0.3411&1.2418&0.1492&0.1583&0.2742&-0.0724&0.0561\\0.2205&0.0214&0.4503&0.3947&-0.7846&-0.4391&0.1001&-0.2554\\1.0423&0.2214&-1.0017&-0.2720&0.0789&-0.1952&0.2801&0.4713\\-0.2340&-0.0392&-0.2617&-0.2866&0.6351&0.3501&-0.1433&0.3550\\0.2750&0.0226&0.1229&0.2183&-0.2583&-0.0742&-0.2042&-0.5906\\0.0653&0.0428&-0.4721&-0.2905&0.4745&0.2875&-0.0284&-0.1311\\0.3169&0.0541&-0.1033&-0.0225&-0.0056&0.1017&-0.1650&-0.1500\\-0.2970&-0.0627&0.1960&0.0644&-0.1136&-0.1031&0.1887&0.1444\\\end{bmatrix}}} — Basis functions of the discrete cosine transformation with corresponding coefficients (specific for our image).
DCT of the image = ${\begin{bmatrix}6.1917&-0.3411&1.2418&0.1492&0.1583&0.2742&-0.0724&0.0561\\0.2205&0.0214&0.4503&0.3947&-0.7846&-0.4391&0.1001&-0.2554\\1.0423&0.2214&-1.0017&-0.2720&0.0789&-0.1952&0.2801&0.4713\\-0.2340&-0.0392&-0.2617&-0.2866&0.6351&0.3501&-0.1433&0.3550\\0.2750&0.0226&0.1229&0.2183&-0.2583&-0.0742&-0.2042&-0.5906\\0.0653&0.0428&-0.4721&-0.2905&0.4745&0.2875&-0.0284&-0.1311\\0.3169&0.0541&-0.1033&-0.0225&-0.0056&0.1017&-0.1650&-0.1500\\-0.2970&-0.0627&0.1960&0.0644&-0.1136&-0.1031&0.1887&0.1444\\\end{bmatrix}}$ .

On the left is the final image. In the middle is the weighted function (multiplied by a coefficient) which is added to the final image. On the right is the current function and corresponding coefficient. Images are scaled (using bilinear interpolation) by factor 10×.