Q (number format)

The Q notation is a way to specify the parameters of a binary fixed point number format.

For example, in Q notation, the number format denoted by Q8.8 means that the fixed point numbers in this format have 8 bits for the integer part and 8 bits for the fraction part.

The Q notation, as defined by Texas Instruments,[1] consists of the letter Q followed by a pair of numbers m.n, where m is the number of bits used for the integer part of the value, and n is the number of fraction bits.

By default, the notation describes signed binary fixed point format, with the unscaled integer being stored in two's complement format, used in most binary processors.

The first bit always gives the sign of the value(1 = negative, 0 = non-negative), and it is not counted in the m parameter.

The m and the dot may be omitted, in which case they are inferred from the size of the variable or register where the value is stored.

Thus, Q12 means a signed integer with any number of bits, that is implicitly multiplied by 2−12.

The letter U can be prefixed to the Q to denote an unsigned binary fixed-point format.

For example, UQ1.15 describes values represented as unsigned 16-bit integers with an implicit scaling factor of 2−15, which range from 0.0 to (216−1)/215 = +1.999969482421875.

In this variant, the m number includes the sign bit.

[2][3] The resolution (difference between successive values) of a Qm.n or UQm.n format is always 2−n.

The range of representable values depends on the notation used: For example, a Q15.1 format number requires 15+1 = 16 bits, has resolution 2−1 = 0.5, and the representable values range from −214 = −16384.0 to +214 − 2−1 = +16383.5.

In hexadecimal, the negative values range from 0x8000 to 0xFFFF followed by the non-negative ones from 0x0000 to 0x7FFF.

Q numbers are a ratio of two integers: the numerator is kept in storage, the denominator

The following formulas show math operations on the general Q numbers

To maintain accuracy, the intermediate multiplication and division results must be double precision and care must be taken in rounding the intermediate result before converting back to the desired Q number.

Using C the operations are (note that here, Q refers to the fractional part's number of bits) : With saturation Unlike floating point ±Inf, saturated results are not sticky and will unsaturate on adding a negative value to a positive saturated value (0x7FFF) and vice versa in that implementation shown.

In assembly language, the Signed Overflow flag can be used to avoid the typecasts needed for that C implementation.