Find first set

The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm ⌊log2(x)⌋.

Both log base 2 and zero-based implementations of find first set generally return an undefined result for the zero word.

Many architectures include instructions to rapidly perform find first set and/or related operations, listed below.

A number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and/or related operations, which are frequently implemented in terms of the hardware instructions above: If bits are labeled starting at 1 (which is the convention used in this article), then count trailing zeros and find first set operations are related by ctz(x) = ffs(x) − 1 (except when the input is zero).

If bits are labeled starting at 0, then count trailing zeros and find first set are exactly equivalent operations.

On platforms with an efficient count leading zeros operation such as ARM and PowerPC, ffs can be computed by: Conversely, on machines without log2 or clz operators, clz can be computed using ctz, albeit inefficiently: On platforms with an efficient Hamming weight (population count) operation such as SPARC's POPC[35][36] or Blackfin's ONES,[37] there is: where ^ denotes bitwise exclusive-OR, | denotes bitwise OR and ~ denotes bitwise negation.

Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not all-zero (for ffs, ctz, clz) or not all-one (for ffz, clo, cto) is encountered.

Most CPUs dating from the late 1980s onward have bit operators for ffs or equivalent, but a few modern ones like some of the ARM-Mx series do not.

There are several approaches depending on architecture of the CPU and to a lesser extent, the programming language semantics and compiler code generation quality.

As mentioned in § Properties and relations, if the hardware has a clz operator, the most efficient approach to computing ctz is thus: A similar technique can take advantage of a population count instruction: An algorithm for 32-bit ctz uses de Bruijn sequences to construct a minimal perfect hash function that eliminates all branches.

An improvement on the previous looping approach examines eight bits at a time then uses a 256 (28) entry lookup table for the first non-zero byte.

An algorithm similar to de Bruijn multiplication for CTZ works for CLZ, but rather than isolating the most-significant bit, it rounds up to the nearest integer of the form 2n−1 using shifts and bitwise ORs:[46] For processors with deep pipelines, like Prescott and later Intel processors, it may be faster to replace branches by bitwise AND and OR operators (even though many more instructions are required) to avoid pipeline flushes for mispredicted branches (and these types of branches are inherently unpredictable): On platforms that provide hardware conversion of integers to floating point, the exponent field can be extracted and subtracted from a constant to compute the count of leading zeros.

This can in turn be used to implement Newton–Raphson division, perform integer to floating point conversion in software, and other applications.

In this context, find first set (ffs) is useful in implementing the "pop" or "pull highest priority element" operation efficiently.

[55] The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem: the disks are numbered from zero, and at move k, disk number ctz(k) is moved the minimum possible distance to the right (circling back around to the left as needed).