Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.
The scenario described by Welch's 1984 paper[1] encodes sequences of 8-bit data as fixed-length 12-bit codes.
A high-level view of the encoding algorithm is shown here: A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes if they're being used).
The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary.
In this way, successively longer strings are registered in the dictionary and available for subsequent encoding as single output values.
The algorithm works best on data with repeated patterns, so the initial parts of a message see little compression.
As the message grows, however, the compression ratio tends asymptotically to the maximum (i.e., the compression factor or ratio improves on an increasing curve, and not linearly, approaching a theoretical maximum inside a limited time period rather than over infinite time).
Of the graphics file formats that support LZW compression, TIFF uses early change, while GIF and most others don't.
The following example illustrates the LZW algorithm in action, showing the status of the output and the dictionary at every stage, both in encoding and decoding the data.
In real text data, repetition is generally less pronounced, so longer input streams are typically necessary before the compression builds up efficiency.
(Most flavors of LZW would put the stop code after the data alphabet, but nothing in the basic algorithm requires that.
Five-bit codes are needed to give sufficient combinations to encompass this set of 27 values.
Thus Z codes some ω that is χ + ?, and the decoder can determine the unknown character as follows: This situation occurs whenever the encoder encounters input of the form cScSc, where c is a single character, S is a string and cS is already in the dictionary, but cSc is not.
Next it sees cSc in the input (starting at the second c of cScSc) and emits the new code it just inserted.
In particular, long strings of a single character (which are common in the kinds of images LZW is often used to encode) repeatedly generate patterns of this sort.
A large English text file can typically be compressed via LZW to about half its original size.
LZW was used in the public-domain program compress, which became a more or less standard utility in Unix systems around 1986.
Various patents have been issued in the United States and other countries for LZW and similar algorithms.
[3] In 1993–94, and again in 1999, Unisys Corporation received widespread condemnation when it attempted to enforce licensing fees for LZW in GIF images.
Unisys's US patent on the LZW algorithm expired on June 20, 2003,[4] 20 years after it had been filed.