Pyramid vector quantization (PVQ) is a method used in audio and video codecs to quantize and transmit unit vectors, i.e. vectors whose magnitudes are known to the decoder but whose directions are unknown.
PVQ may also be used as part of a gain/shape quantization scheme, whereby the magnitude and direction of a vector are quantized separately from each other.
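As a minimal sketch of that gain/shape split (the helper name and the use of NumPy are illustrative assumptions, not taken from the source), the decomposition is simply a norm followed by a normalization:

    import numpy as np

    def gain_shape_split(v):
        """Split a vector into a scalar gain and a unit-length shape."""
        gain = np.linalg.norm(v)   # magnitude: sent through a scalar quantizer
        shape = v / gain           # direction: a unit vector, suitable for PVQ
        return gain, shape

    v = np.array([1.2, -3.4, 0.8])
    gain, shape = gain_shape_split(v)
    print(np.allclose(gain * shape, v))  # True: the split is exact before quantization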
PVQ was initially described in 1986 in the paper "A Pyramid Vector Quantizer" by Thomas R. Fischer.[1]
One caveat of PVQ is that it operates under the taxicab distance (L1-norm).
Conversion to and from the more familiar Euclidean distance (L2-norm) is possible via vector projection, though this results in a less uniform distribution of quantization points (the poles of the Euclidean n-sphere become denser than non-poles).[3]
As of 2010, no efficient algorithm for the ideal (i.e., uniform) vector quantization of the Euclidean n-sphere is known.[4]
This non-uniformity can be reduced by applying a deformation, such as a coordinate-wise power, before projection, reducing mean-squared quantization error by ~10%.
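A minimal sketch of this projection step, with the coordinate-wise power deformation as an option, might look like the following (the helper names are illustrative, and the exponent alpha is a tuning knob whose value is not taken from the literature):

    import numpy as np

    def l2_to_l1(v, alpha=1.0):
        """Radially project a unit vector (L2 sphere) onto the unit L1 sphere.

        alpha != 1.0 applies a coordinate-wise power deformation first;
        alpha == 1.0 is plain radial projection.
        """
        d = np.sign(v) * np.abs(v) ** alpha
        return d / np.sum(np.abs(d))

    def l1_to_l2(p):
        """Project a point on the L1 sphere back onto the Euclidean unit sphere."""
        return p / np.linalg.norm(p)

    v = np.array([0.592, -0.720, 0.362])  # the example unit vector used below
    print(l2_to_l1(v))                    # on the L1 sphere: |x| + |y| + |z| = 1
    print(l1_to_l2(l2_to_l1(v)))          # round-trips back to v when alpha = 1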
As a form of vector quantization, PVQ defines a codebook of M quantization points, each of which is assigned an integer codeword from 0 to M−1.
The goal of the encoder is to find the codeword of the closest vector, which the decoder must decode back into a vector.
The PVQ codebook consists of all N-dimensional points p with integer-only coordinates whose absolute values sum to a constant K (i.e. whose L1-norm equals K):

S(N, K) = { p ∈ ℤ^N : ‖p‖₁ = K }
As it stands, the set S tessellates the surface of an N-dimensional pyramid.
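For small N and K the codebook can be enumerated by brute force, as in the sketch below; assigning codewords in sorted order is an arbitrary illustrative choice (practical codecs index the points combinatorially rather than storing an explicit list):

    from itertools import product

    def pvq_codebook(N, K):
        """All integer points in N dimensions whose absolute values sum to K."""
        return sorted(p for p in product(range(-K, K + 1), repeat=N)
                      if sum(abs(c) for c in p) == K)

    codebook = pvq_codebook(3, 2)
    print(len(codebook))   # 18 points, so codewords 0..17
    print(codebook[:3])    # [(-2, 0, 0), (-1, -1, 0), (-1, 0, -1)]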
Increasing the parameter K results in more quantization points, and hence typically yields a more "accurate" approximation of the original unit vector
at the cost of larger integer codewords that require more bits to transmit.
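The bit cost grows only logarithmically in the codebook size, as the sketch below illustrates (pvq_size is an illustrative helper, and a fixed-length code of ceil(log2 M) bits is assumed here, whereas real codecs typically use entropy coding):

    import math
    from itertools import product

    def pvq_size(N, K):
        """Count the integer points in N dimensions whose L1-norm is K."""
        return sum(1 for p in product(range(-K, K + 1), repeat=N)
                   if sum(abs(c) for c in p) == K)

    for K in (2, 5, 20):
        M = pvq_size(3, K)
        print(f"K={K}: {M} codewords, {math.ceil(math.log2(M))} bits each")
    # K=2: 18 codewords, 5 bits each
    # K=5: 102 codewords, 7 bits each
    # K=20: 1602 codewords, 11 bits each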
Suppose we wish to quantize three-dimensional unit vectors using the parameter K=2. The resulting codebook contains 18 points, each assigned a codeword from 0 to 17.
Now, suppose we wish to transmit the unit vector <0.592, −0.720, 0.362> (rounded here to 3 decimal places, for clarity).
With K=2, the closest codebook point is <1, −1, 0>, which the decoder reconstructs as <0.707, −0.707, 0>, an error of about 0.380 units.
Increasing K improves the approximation: based on the Python code below, K=5 (codebook size: 102) yields an error of only 0.097 units, and K=20 (codebook size: 1602) yields an error of only 0.042 units.
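One way to reproduce these figures is an exhaustive nearest-neighbour search over the codebook, as in the sketch below (exhaustive search is an illustrative simplification, and the K=5 error comes out as 0.096 here rather than 0.097, presumably because the example vector is rounded to 3 decimal places):

    from itertools import product
    from math import dist, sqrt

    def pvq_codebook(N, K):
        """All integer points in N dimensions whose absolute values sum to K."""
        return [p for p in product(range(-K, K + 1), repeat=N)
                if sum(abs(c) for c in p) == K]

    def quantize(v, K):
        """Return the codebook point that decodes to the unit vector nearest v."""
        best, best_err = None, float("inf")
        for p in pvq_codebook(len(v), K):
            norm = sqrt(sum(c * c for c in p))
            u = tuple(c / norm for c in p)   # decode: scale back to the unit sphere
            err = dist(v, u)                 # Euclidean quantization error
            if err < best_err:
                best, best_err = p, err
        return best, best_err

    v = (0.592, -0.720, 0.362)
    for K in (2, 5, 20):
        point, err = quantize(v, K)
        print(f"K={K}: point={point}, error={err:.3f}")
    # K=2:  point=(1, -1, 0), error=0.380
    # K=5:  point=(2, -2, 1), error=0.096
    # K=20: point=(7, -9, 4), error=0.042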