Variable-length quantity

A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer.

A VLQ is essentially a base-128 representation of an unsigned integer, in which the eighth (most significant) bit of each octet marks whether further octets follow.

Base-128 compression is known by many names – VB (Variable Byte), VByte, Varint, VInt, EncInt etc.

[3] It is also used in the WAP environment, where it is called variable length unsigned integer or uintvar.

The DWARF debugging format[4] defines a variant called LEB128 (or ULEB128 for unsigned numbers), where the least significant group of 7 bits is encoded in the first byte, and the most significant bits are in the last byte (so effectively it is the little-endian analog of a VLQ).
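The little-endian ordering described above can be sketched in a few lines of Python. This is a minimal illustration for unsigned values only, not code taken from the DWARF specification; the function names `uleb128_encode` and `uleb128_decode` are chosen here for the example.

```python
def uleb128_encode(n: int) -> bytes:
    """Encode a nonnegative integer, least significant 7-bit group first."""
    if n < 0:
        raise ValueError("ULEB128 is defined for nonnegative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F          # lowest 7 bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # more groups follow: set the high bit
        else:
            out.append(group)         # final group: high bit clear
            return bytes(out)


def uleb128_decode(data: bytes) -> int:
    """Decode one ULEB128 value from the start of data."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:       # high bit clear marks the last byte
            return value
        shift += 7
    raise ValueError("truncated ULEB128 value")
```

With this sketch, 137 encodes as the two bytes 0x89 0x01: the low group (0001001) comes first with its high bit set, and the remaining group (0000001) comes last.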

Google Protocol Buffers use a similar format to represent integer values compactly,[5] as do Oracle Portable Object Format (POF)[6] and the Microsoft .NET Framework's "7-bit encoded int" in the BinaryReader and BinaryWriter classes.

The general VLQ encoding is simple, but in its basic form it is defined only for unsigned integers (nonnegative, that is, positive or zero), and it is somewhat redundant, since prepending 0x80 octets corresponds to zero padding and leaves the encoded value unchanged.

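In group varint encoding (GVE), a single header byte records the byte lengths of the four integers that follow it, so a decoder does not need to test a continuation bit on every byte. This layout reduces CPU branches, making GVE faster than VLQ on modern pipelined CPUs.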

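Signed integers can be encoded so that the sign is carried in the least-significant bit. This is notably done for Google Protocol Buffers, and is known as a zigzag encoding for signed integers.

In this scheme the encoded values 0, 1, 2, 3, 4 correspond to 0, −1, 1, −2, 2, and so on: counting up alternates between nonnegative (starting at 0) and negative (since each step changes the least-significant bit, hence the sign), whence the name "zigzag encoding".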
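The mapping between signed integers and their zigzag codes can be sketched as follows. This is an illustrative version that relies on Python's arbitrary-precision integers; the names `zigzag_encode` and `zigzag_decode` are assumptions for this example, and fixed-width implementations typically use shifts and XOR on 32- or 64-bit values instead.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed integer to a nonnegative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode: the lowest bit carries the sign."""
    if z < 0:
        raise ValueError("zigzag codes are nonnegative")
    # (z >> 1) is the magnitude part; -(z & 1) is 0 for even z and -1 for odd z,
    # and XOR with -1 flips all bits, turning m into -m - 1.
    return (z >> 1) ^ -(z & 1)
```

The resulting nonnegative code can then be passed to any unsigned variable-length encoder, so small negative numbers stay as short as small positive ones.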

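Here is a worked-out example for the decimal number 137: in binary it is 10001001, which splits into the 7-bit groups 0000001 and 0001001; setting the MSB of every group except the last gives the octets 10000001 00001001 (0x81 0x09). Another way to look at this is to represent the value in base 128 and then set the MSB of all but the last base-128 digit to 1.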
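The base-128 view of the encoding translates directly into code. The following Python sketch is one possible implementation of the general big-endian VLQ for unsigned integers; the names `vlq_encode` and `vlq_decode` are chosen for this example and are not part of any standard.

```python
def vlq_encode(n: int) -> bytes:
    """Encode a nonnegative integer as a big-endian VLQ."""
    if n < 0:
        raise ValueError("VLQ is defined for nonnegative integers only")
    groups = [n & 0x7F]               # last base-128 digit: MSB stays 0
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier digits: MSB set to 1
        n >>= 7
    return bytes(reversed(groups))    # most significant digit first


def vlq_decode(data: bytes) -> int:
    """Decode one VLQ value from the start of data."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:           # MSB clear marks the final octet
            return value
    raise ValueError("truncated VLQ value")
```

For 137 this produces the octets 0x81 0x09. Because the decoder simply accumulates 7-bit digits, an input with extra leading 0x80 octets, such as 0x80 0x81 0x09, also decodes to 137.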