Fast inverse square root

The algorithm is best known for its implementation in 1999 in Quake III Arena, a first-person shooter video game heavily based on 3D graphics.

With subsequent hardware advancements, especially the x86 SSE instruction rsqrtss, this algorithm is not generally the best choice for modern computers,[1] though it remains an interesting historical example.

Treating the bits again as a floating-point number, it runs one iteration of Newton's method, yielding a more precise approximation.

Ng at Berkeley wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations.

[4] In the late 1980s, Cleve Moler at Ardent Computer learned about this technique[5] and passed it along to his coworker Greg Walsh.

Greg Walsh devised the now-famous constant and fast inverse square root algorithm.

Gary Tarolli was consulting for Kubota, the company funding Ardent at the time, and likely brought the algorithm to 3dfx Interactive circa 1994.

[6][7] Jim Blinn demonstrated a simple approximation of the inverse square root in a 1997 column for IEEE Computer Graphics and Applications.

[8] Reverse engineering of other contemporary 3D video games uncovered a variation of the algorithm in Activision's 1997 Interstate '76.

[9] Quake III Arena, a first-person shooter video game, was released in 1999 by id Software and used the algorithm.

[11] Speculation arose as to who wrote the algorithm and how the constant was derived; some guessed John Carmack.

[7] Quake III's full source code was released at QuakeCon 2005, but provided no answers.

The authorship question was resolved in 2006 when Greg Walsh, the original author, contacted Beyond3D after their speculation gained popularity on Slashdot.

[6] In 2007 the algorithm was implemented in some dedicated hardware vertex shaders using field-programmable gate arrays (FPGA).

[7] This was troublesome for 3D graphics programs before the advent of specialized hardware to handle transform and lighting.

The following C code is the fast inverse square root implementation from Quake III Arena, stripped of C preprocessor directives, but including the exact original comment text:[15] At the time, the general method to compute the inverse square root was to calculate an approximation for

, then revise that approximation via another method until it came within an acceptable error range of the actual result.

Common software methods in the early 1990s drew approximations from a lookup table.

[16] The key of the fast inverse square root was to directly compute an approximation by utilizing the structure of floating-point numbers, proving faster than table lookups.

The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division.

[18] The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit floating-point word[note 1] as an integer, then subtracting it from a "magic" constant, 0x5F3759DF.

The first steps of the algorithm are illustrated below: Interpreting as IEEE 32-bit representation: Reinterpreting this last bit pattern as a floating point number gives the approximation

According to the C standard, reinterpreting a floating point value as an integer by casting then dereferencing the pointer to it is not valid (undefined behavior).

The fast inverse square root is based on this identity, and on the fact that aliasing a float32 to an integer gives a rough approximation of its logarithm.

The single graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method), for inputs starting at 0.01, where the standard library gives 10.0 as a result, and InvSqrt() gives 9.982522, making the relative difference 0.0017478, or 0.175% of the true value, 10.

Chris Lomont developed a function to minimize approximation error by choosing the magic number

[31] Lomont said that the magic number for 64-bit IEEE754 size type double is 0x5FE6EC85E7DE30DA, but it was later shown by Matthew Robertson to be exactly 0x5FE6EB50C7B537A9.

[32] Jan Kadlec reduced the relative error by a further factor of 2.7 by adjusting the constants in the single Newton's method iteration as well,[33] arriving after an exhaustive search at A complete mathematical analysis for determining the magic number is now available for single-precision floating-point numbers.

In a 2009 benchmark on the Intel Core 2, this instruction took 0.85ns per float compared to 3.54ns for the fast inverse square root algorithm, and had less error.

However, manufacturers of these systems usually provide trigonometric and other math libraries, based on algorithms such as CORDIC.