Asymmetric numeral systems

Asymmetric numeral systems (ANS)[1][2] is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda[3] from Jagiellonian University, used in data compression since 2014[4] due to improved performance compared to previous methods.

In the tabled ANS (tANS) variant, this is achieved by constructing a finite-state machine to operate on a large alphabet without using multiplication.

The basic idea is to encode information into a single natural number x. In the standard binary numeral system, a bit s ∈ {0, 1} can be appended to the information stored in x by the rule x' = 2x + s, which is optimal when Pr(0) = Pr(1) = 1/2; ANS generalizes this so that appending a symbol of probability p sends x to a new number x' ≈ x/p, i.e. x' carries the lg(x) bits already stored plus the lg(1/p) bits of the new symbol.

For the encoding rule, the set of natural numbers is split into disjoint subsets corresponding to the different symbols – like into even and odd numbers, but with densities corresponding to the probability distribution of the symbols to encode. Then, to add the information from symbol s to the information already stored in the current number x, we go to x' = C(s, x), the x-th appearance of the subset assigned to s.

There are alternative ways to apply it in practice – direct mathematical formulas for encoding and decoding steps (uABS and rANS variants), or one can put the entire behavior into a table (tANS variant).
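As an illustrative sketch of the formula-based approach (our own illustration, not code from the cited references), the binary uABS coding step can be written directly; here p is the assumed probability of the symbol 1, and exact rational arithmetic is used so that the ceiling and floor operations behave deterministically:

# Sketch of the uABS (uniform binary ANS) coding step, assuming Pr(1) = p.
from fractions import Fraction
from math import ceil, floor

def uabs_encode(x, s, p):
    # Append bit s to state x; the state grows roughly by the factor 1/Pr(s).
    if s == 0:
        return ceil((x + 1) / (1 - p)) - 1
    return floor(x / p)

def uabs_decode(x, p):
    # Recover the most recently encoded bit s and the previous state.
    s = ceil((x + 1) * p) - ceil(x * p)      # equals 0 or 1
    if s == 0:
        return 0, x - ceil(x * p)
    return 1, ceil(x * p)

# Round trip for Pr(1) = 3/10: bits come back in reverse order of encoding.
p, x = Fraction(3, 10), 1
for bit in [0, 1, 1, 0, 1]:
    x = uabs_encode(x, bit, p)
recovered = []
for _ in range(5):
    bit, x = uabs_decode(x, p)
    recovered.append(bit)
print(recovered[::-1])                       # [0, 1, 1, 0, 1]

That decoding returns the symbols in reverse order of encoding is the stack-like behavior discussed further below.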

Suppose a sequence of 1,000 zeros and ones is to be encoded, which would take 1000 bits to store directly. However, if it is somehow known that it contains only 1 one and 999 zeros, it suffices to encode the position of the one, which requires only ⌈lg(1000)⌉ ≈ 10 bits here instead of the original 1000 bits.

Generally, length-n sequences containing pn ones and (1 − p)n zeros, for some probability p ∈ (0, 1), are called combinations. Using Stirling's approximation we get their asymptotic number: (n choose pn) ≈ 2^(n·h(p)) for large n, where h(p) = −p·lg(p) − (1 − p)·lg(1 − p) is the Shannon entropy. Hence approximately n·h(p) bits suffice to store such a sequence.
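As a quick numerical illustration of this approximation (our own check rather than a figure from the references), for n = 1000 and p = 1/4 the exact count is already close to the entropy estimate:

# Compare lg(n choose pn) with n*h(p) for n = 1000, p = 1/4.
from math import comb, log2

def h(p):
    # Binary Shannon entropy in bits per symbol.
    return -p * log2(p) - (1 - p) * log2(1 - p)

n, ones = 1000, 250
exact = log2(comb(n, ones))        # bits needed to index one such combination
estimate = n * h(ones / n)         # Shannon entropy estimate
print(round(exact, 1), round(estimate, 1))   # roughly 806.2 vs 811.3 bits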

For example, ANS could be directly used to enumerate combinations: assign a different natural number to every sequence of symbols having fixed proportions in a nearly optimal way.

In contrast to encoding combinations, this probability distribution usually varies in data compressors.

For this purpose, Shannon entropy can be seen as a weighted average: a symbol of probability p carries lg(1/p) bits of information, so the average is H = Σ_s p_s·lg(1/p_s) bits per symbol; for instance, for probabilities (1/2, 1/4, 1/4) this gives 0.5·1 + 0.25·2 + 0.25·2 = 1.5 bits/symbol. ANS views a natural number x as containing lg(x) bits of information; adding the information of a symbol of probability p increases this content to lg(x) + lg(1/p) = lg(x/p), hence the new number holding both pieces of information should be x' ≈ x/p.

We see that an equivalent method for performing the encoding is to work with a fixed denominator: consider a more general source with k letters, with rational probabilities p_s = f_s/N for positive integers f_s summing to N, and use the standard base-N numeral system, in which appending a digit d ∈ {0, …, N − 1} to the information stored in x gives x' = N·x + d.

The above procedure is optimal for the uniform (symmetric) probability distribution over the N digit values.

ANS generalizes it to make it optimal for any chosen (asymmetric) probability distribution of symbols, Pr(s) = p_s: just as the binary system splits the natural numbers into even and odd numbers, the natural numbers are split into subsets whose densities correspond to the assumed probability distribution, with f_s of every N consecutive values assigned to symbol s, and encoding s sends x to its x-th appearance in its subset.

The range variant (rANS) uses exactly this construction with N = 2^n: intuitively, it divides the set of natural numbers into size-2^n ranges, and splits each of them in an identical way into subranges of proportions given by the assumed probability distribution.
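A minimal single-step rANS sketch under these assumptions (our illustration; f[s] are the integer frequencies, summing to M = 2^n, the range size above, and cdf[s] their cumulative sums): the encoding step sends x to its x-th appearance among the numbers whose residue modulo M falls in the subrange reserved for s.

# Single-step rANS for integer frequencies f[s] summing to M (a power of two),
# with cumulative frequencies cdf[s] = f[0] + ... + f[s-1].
def rans_encode_step(x, s, f, cdf, M):
    # x-th number whose residue mod M lies in [cdf[s], cdf[s] + f[s]).
    return (x // f[s]) * M + cdf[s] + (x % f[s])

def rans_decode_step(x, f, cdf, symbol_of, M):
    # Read the last symbol from x mod M, then restore the previous state.
    r = x % M
    s = symbol_of[r]
    return s, f[s] * (x // M) + r - cdf[s]

# Example: alphabet {0, 1, 2} with probabilities 8/16, 6/16, 2/16.
f, cdf, M = [8, 6, 2], [0, 8, 14], 16
symbol_of = [0] * 8 + [1] * 6 + [2] * 2      # length-M lookup table

x = 16                                       # arbitrary starting state
for s in [2, 0, 1, 1, 0]:
    x = rans_encode_step(x, s, f, cdf, M)
decoded = []
for _ in range(5):
    s, x = rans_decode_step(x, f, cdf, symbol_of, M)
    decoded.append(s)
print(decoded[::-1])                         # [2, 0, 1, 1, 0]

Note that the state x grows by roughly a factor M/f[s] ≈ 1/p_s per symbol, which is why the stream variants below renormalize it.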

To avoid using large-number arithmetic, in practice stream variants are used, which enforce x ∈ [L, b·L) by renormalization (usually L and b are powers of 2).

The encoder sends the least significant bits of x to the bitstream whenever x would otherwise leave this range, and the decoder refills the least significant bits from the bitstream when needed. The tANS variant puts the entire behavior (including renormalization) for x ∈ [L, 2L) into a table, yielding a finite-state machine that avoids the need for multiplication.
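A sketch of such a stream variant (our illustration, building on the single-step rANS above, with byte-wise renormalization, b = 256, and hypothetical helper names): the encoder processes the symbols in reverse so that the decoder can read them back in the original order.

# Streaming rANS sketch with byte-wise renormalization, keeping the state
# in [L, 256*L). Frequencies f[s] sum to M; cdf[s] are cumulative sums.
# L must be a multiple of M.
L = 1 << 16                          # lower bound of the normalized interval

def encode_stream(symbols, f, cdf, M):
    x, out = L, []                   # 'out' collects the renormalization bytes
    for s in reversed(symbols):      # encode backwards: ANS acts as a stack
        while x >= (L // M) * 256 * f[s]:   # emit low bytes so that the next
            out.append(x & 0xFF)            # step keeps x below 256*L
            x >>= 8
        x = (x // f[s]) * M + cdf[s] + (x % f[s])
    return x, out

def decode_stream(x, out, n, f, cdf, symbol_of, M):
    res = []
    for _ in range(n):
        r = x % M
        s = symbol_of[r]
        res.append(s)
        x = f[s] * (x // M) + r - cdf[s]
        while x < L:                 # refill low bytes: back into [L, 256*L)
            x = (x << 8) | out.pop()
    return res

f, cdf, M = [8, 6, 2], [0, 8, 14], 16
symbol_of = [0] * 8 + [1] * 6 + [2] * 2
msg = [0, 1, 2, 1, 0, 0, 2]
x, out = encode_stream(msg, f, cdf, M)
print(decode_stream(x, out, len(msg), f, cdf, symbol_of, M) == msg)   # True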

If the symbols are spread over the states in consecutive blocks whose lengths are powers of 2, the obtained tANS coding is equivalent to a prefix code; for example, the a→0, b→100, c→101, d→11 prefix code would be obtained for tANS with the "aaaabcdd" symbol spread.

As with Huffman coding, modifying the probability distribution of tANS is relatively costly, hence it is mainly used in static situations, usually with some Lempel–Ziv scheme (e.g. ZSTD, LZFSE).

In contrast, rANS is usually used as a faster replacement for range coding (e.g. CRAM, LZNA, Draco[14]).

It requires multiplication, but is more memory efficient and is appropriate for dynamically adapting probability distributions.

Encoding and decoding of ANS are performed in opposite directions, making it act as a stack (last in, first out) for symbols: the last symbol encoded is the first symbol decoded.

For context dependence, such as a Markov model, the encoder needs to use the context from the perspective of the later decoding.

The final state of encoding is required to start decoding, hence it needs to be stored in the compressed file.

This cost can be compensated by storing some information in the initial state of the encoder.

The author of the novel ANS algorithm and its variants tANS and rANS specifically intended his work to be available freely in the public domain, for altruistic reasons.

In 2015, Google published a US and then worldwide patent application for "Mixed boolean-token ans coefficient coding".[24]

At the time, Google had asked Professor Duda to help it with video compression, so it was intimately aware of this domain, having the original author assisting it.

Duda subsequently filed a third-party submission[25] to the US Patent Office seeking a rejection.[26]

In June 2019, Microsoft lodged a patent application called "Features of range asymmetric number system encoding and decoding".

The application was initially rejected by the USPTO. Yet on March 2, 2021, Microsoft gave the USPTO an explanatory filing stating that "The Applicant respectfully disagrees with the rejections", seeking to have the rejection overturned.

Comparison of the concept of arithmetic coding (left) and ANS (right). Both can be seen as generalizations of standard numeral systems, optimal for a uniform probability distribution of digits, into ones optimized for some chosen probability distribution. Arithmetic or range coding corresponds to adding new information in the most significant position, while ANS generalizes adding information in the least significant position. Its coding rule is "x goes to the x-th appearance of the subset of natural numbers corresponding to the currently encoded symbol". In the presented example, the sequence (01111) is encoded into the natural number 18, which is smaller than the 47 obtained by using the standard binary system, thanks to better agreement with the symbol frequencies of the sequence to encode. The advantage of ANS is storing the information in a single natural number, in contrast to the two numbers defining a range.
Simple example of a 4-state ANS automaton for the Pr(a) = 3/4, Pr(b) = 1/4 probability distribution. The symbol b contains −lg(1/4) = 2 bits of information and so it always produces two bits. In contrast, the symbol a contains −lg(3/4) ≈ 0.415 bits of information, hence it sometimes produces one bit (from states 6 and 7) and sometimes no bits (from states 4 and 5), only increasing the state, which acts as a buffer containing a fractional number of bits: lg(x). In practice the number of states is, for example, 2048 for a 256-symbol alphabet (to directly encode bytes).
Example of generation of tANS tables for an alphabet of size m = 3 and L = 16 states, followed by their use for stream decoding. First the probabilities are approximated by fractions with denominator equal to the number of states. Then the symbols are spread over the states in a nearly uniform way; optionally, the details may depend on a cryptographic key for simultaneous encryption. Then the appearances of each symbol are enumerated, starting from a value equal to its number of appearances. Finally, the youngest (least significant) bits are refilled from the stream to return x to the assumed range (renormalization).
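The table construction described in this caption can be sketched as follows (our illustration with made-up frequencies f = {a: 7, b: 6, c: 3} summing to L = 16 and a simple step-based spread; real implementations tune the spread for better compression):

# Sketch of tANS decoding-table construction for L = 16 states and a 3-symbol
# alphabet, followed by a table-driven decoding step (state kept in [L, 2L)).
L = 16
f = {"a": 7, "b": 6, "c": 3}         # approximated probabilities f[s]/L

# Spread the symbols over the L slots in a roughly uniform way.
spread = [None] * L
pos, step = 0, 5                     # step chosen coprime with L
for s, count in f.items():
    for _ in range(count):
        spread[pos] = s
        pos = (pos + step) % L

# Build the decoding table: slot i corresponds to state x = L + i, and the
# appearances of each symbol are enumerated starting from f[s].
next_x = dict(f)
table = [None] * L                   # table[i] = (symbol, state before renormalization)
for i in range(L):
    s = spread[i]
    table[i] = (s, next_x[s])
    next_x[s] += 1
print(table[:4])                     # [('a', 7), ('c', 3), ('b', 6), ('b', 7)]

def decode(state, bits, n):
    # Decode n symbols; 'bits' is a list consumed from its end (one possible convention).
    out = []
    for _ in range(n):
        s, x = table[state - L]
        out.append(s)
        while x < L:                 # renormalization: refill youngest bits
            x = (x << 1) | bits.pop()
        state = x
    return out

A matching encoder would maintain the inverse of this table per symbol; the decode step itself needs only a table lookup, shifts and bit reads, which is what makes tANS multiplication-free.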