Golomb coding

Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s.

Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code,[1] making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.

This makes Rice codes convenient for use on a computer, since multiplication and division by 2 can be implemented more efficiently in binary arithmetic.

Rice was motivated to propose this simpler subset due to the fact that geometric distributions are often varying with time, not precisely known, or both, so selecting the seemingly optimal code might not be very advantageous.

Golomb coding uses a tunable parameter M to divide an input value x into two parts: q, the result of a division by M, and r, the remainder.

The quotient is sent in unary coding, followed by the remainder in truncated binary encoding.

The example figure shows the position q and offset r for the encoding of integer x using Golomb–Rice parameter M = 3, with source probabilities following a geometric distribution with p(0) = 0.2.

The integer x treated by Golomb was the run length of a Bernoulli process, which has a geometric distribution starting at 0.

The best choice of parameter M is a function of the corresponding Bernoulli process, which is parameterized by

Golomb's scheme was designed to encode sequences of non-negative numbers.

However, it is easily extended to accept sequences containing negative numbers using an overlap and interleave scheme, in which all values are reassigned to some positive number in a unique and reversible way.

⁠), and the mth positive value is mapped to the m-th even number (⁠

This may be expressed mathematically as follows: a positive value x is mapped to (

Truly optimal codes for two-sided geometric distributions include multiple variants of the Golomb code, depending on the distribution parameters, including this one.

In this algorithm, if the M parameter is a power of 2, it becomes equivalent to the simpler Rice encoding: Decoding: Set M = 10.

For example, with a Rice–Golomb encoding using parameter M = 10, the decimal number 42 would first be split into q = 4 and r = 2, and would be encoded as qcode(q),rcode(r) = qcode(4),rcode(2) = 11110,010 (you don't need to encode the separating comma in the output stream, because the 0 at the end of the q code is enough to say when q ends and r begins; both the qcode and rcode are self-delimited).

Given an alphabet of two symbols, or a set of two events, P and Q, with probabilities p and (1 − p) respectively, where p ≥ 1/2, Golomb coding can be used to encode runs of zero or more P′s separated by single Q′s.

A later JPL researcher proposed various methods of optimizing or estimating the code parameter.

[3]) Consider using a Rice code with a binary portion having b bits to run-length encode sequences where P has a probability p. If

, the run-length coding approach results in compression ratios close to entropy.

When a probability distribution for integers is not known, the optimal parameter for a Golomb–Rice encoder cannot be determined.

A simpler variation of that approach is to assume that the PDF belongs to a parametrized family, estimate the PDF parameters from the data, and then compute the optimal Golomb–Rice parameter.

Assuming a generalized Gaussian PDF, which covers a wide range of statistics seen in data such as prediction errors or transform coefficients in multimedia codecs, the RLGR encoding algorithm can perform very well in such applications.

Numerous signal codecs use a Rice code for prediction residues.

In predictive algorithms, such residues tend to fall into a two-sided geometric distribution, with small residues being more frequent than large residues, and the Rice code closely approximates the Huffman code for such a distribution without the overhead of having to transmit the Huffman table.

One signal that does not match a geometric distribution is a sine wave, because the differential residues create a sinusoidal signal whose values are not creating a geometric distribution (the highest and lowest residue values have similar high frequency of occurrences, only the median positive and negative residues occur less often).

Rice coding is also used in the FELICS lossless image codec.

The Golomb–Rice coder is used in the entropy coding stage of Rice algorithm based lossless image codecs.

One such experiment yields the compression ratio graph shown.

The adaptive version of Golomb–Rice coding mentioned above, the RLGR encoder [2],is used for encoding screen content in virtual machines in the RemoteFX component of the Microsoft Remote Desktop Protocol.