An article and research paper describe a fast, seemingly magical way to compute the inverse square root (1/sqrt(x)), used in the game Quake.

I'm no graphics expert, but appreciate why square roots are useful. The Pythagorean theorem computes distance between points, and dividing by distance helps normalize vectors. (Normalizing is often just a fancy term for division).

3D games like Quake divide by distance zillions (yes zillions) of times each second, so "minor" performance improvements help immensely. We don't want to take the square root and divide the regular way: exponentiation and division are really, really expensive for the CPU.

Given these conditions, here's the magic formula to get 1/sqrt(x), as found in Quake (my comments inserted):

```
float InvSqrt(float x){
float xhalf = 0.5f * x;
int i = *(int*)&x; // store floating-point bits in integer
i = 0x5f3759df - (i >> 1); // initial guess for Newton's method
x = *(float*)&i; // convert new bits into float
x = x*(1.5f - xhalf*x*x); // One round of Newton's method
return x;
}
```

Yowza! Somehow, this code gets 1/sqrt(x) using only **multiplication** and **bit-shift operations**. There's no division or exponents involved -- how does it work?

**My Understanding:** This incredible hack *estimates* the inverse root using Newton's method of approximation, and starts with a great initial guess.

To make the guess, it takes floating-point number in scientific notation, and negates & halves the exponent to get something close the the inverse square root. It then runs a round of Newton's approximation method to further refine the estimate and tada, we've got something near the inverse square root.

## Newton's Method of Approximation

Newton's method can be used to find approximate roots of any function. You can keep iterating the method to get closer and closer to the root, but this function only uses 1 step! Here's a crash-course on Newton's method (it was new to me):

Let's say you have a function f(x) and you want to find its root (aka where f(x) = 0). Let's call your original guess "g". Newton's method gives you a way to get a new, better guess for the root:

You can keep repeating this process (plugging in your new guess into the formula) and get closer approximations for your root. Eventually you have a "new guess" that makes f(new guess) really, really close to zero -- it's a root! (Or close enough for government work, as they say).

In our case, we want the inverse square function. Let's say we have a number "i" (that's all we start with, right?) and want to find the inverse square root: 1/sqrt(i). If we make a guess "x" as the inverse root, the error between our original number and our guess "x" is:

This is because x is roughly 1/sqrt(i). If we square x we get "1/i", and if we take the inverse we should get something close to "i". If we subtract these two values, we can find our error.

Clearly, we want to make our error as small as possible. That means finding the "x" that makes error(x) = 0, which is the same as finding the root of the error equation. If we plug error(x) into Newton's approximation formula:

and take the proper derivatives:

we can plug them in to get the formula for a better guess:

Which is exactly the equation you see in the code above, remembering that x is our new guess (g) and "xhalf" is half of the original value (0.5 * i):

`x = x*(1.5f - xhalf*x*x);`

With this formula, we can start with a guess "g" and repeat the formula to get better guesses. Try this demo for using multiple iterations to find the inverse square:

If you plug in different initial guesses (.2, .4, .8) you can see how quickly the guesses converge to the real answer.

So my friends, the question becomes: "How can we make a good initial guess?"

## Making a Good Guess

What's a good guess for the inverse square root? It's a bit of a trick question -- our best guess for the inverse square root is the inverse square root itself!

Ok hotshot, you ask, how do we *actually get* 1/sqrt(x)?

This is where the magic kicks in. Let's say you have a number in exponent form or scientific notation:

10^6 = 1 million

Now, if you want to find the regular square root, you'd just divide the exponent by 2:

And if you want the **inverse** square root, divide the exponent by -2 to flip the sign:

So, how can we get the exponent of a number without other expensive operations?

## Floats are stored in mantissa-exponent form

Well, we're in luck. Floating-point numbers are stored by computers in mantissa-exponent form, so it's possible to extract and divide the exponent!

But instead of explicitly doing division (expensive for the CPU), the code uses another clever hack: it shifts bits. Right-shifting by one position is the same as dividing by two (you can try this for any power of 2, but it will truncate the remainder). And if you want to get a negative number, instead of multiplying by -1 (multiplications are expensive), just subtract the number from "0" (subtractions are cheap).

So, the code converts the floating-point number into an integer. It then shifts the bits by one, which means the exponent bits are divided by 2 (when we eventually turn the bits back into a float). And lastly, to negate the exponent, we subtract from the magic number 0x5f3759df. This does a few things: it preserves the mantissa (the non-exponent part, aka 5 in 5*10^6), handles odd-even exponents, shifting bits from the exponent *into* the mantissa, and all sorts of funky stuff. The paper has more details and explanation, I didn't catch all of it the first time around. As always, feel free to comment if you have a better explanation of what's happening.

The result is that we get an initial guess that is really close to the real inverse square root! We can then do a single round of Newton's method to refine the guess. More rounds are possible (at an additional computational expense), but one round is all that's needed for the precision needed.

## So, why the magic number?

The great hack is how integers and floating-point numbers are stored. Floating-point numbers like "5.4 * 10^6" store their exponent in a separate range of bits than "5.4". When you shift the entire number, you divide the exponent by 2, as well as dividing the number (5.4) by 2 as well. This is where the magic number comes in -- it does some cool corrections for this division, that I don't quite understand. However, there are several magic numbers that could be used -- this one happens to minimize the error in the mantissa.

The magic number also corrects for even/odd exponents; the paper mentions you can also find other magic numbers to use.

## Resources

There's further discussion on reddit (user pb_zeppelin) and slashdot:

For a better explanation about this, check out http://www.mceniry.net/papers/Fast%20Inverse%20Square%20Root.pdf

For a better explanation about this, check out http://www.mceniry.net/papers/Fast%20Inverse%20Square%20Root.pdf

Try asking this guy what text book he found it in. http://groups.google.com/group/comp.graphics.algorithms/msg/e314d9a118639bb9

I don’t believe you ever explained what 0x5f3759d5 is

I did this in band camp.

Kalid

I really enjoyed your explanation. Much clearer than any of the links the previous commentators have posted. I have come across this equation before and not really understood it. Now I do. Thanks.

Keep up the great work.

Alex

@Tim: Check out the section “So why the magic number?” near the end.

@Alex: Thanks for the comment!

Indeed, a few terms of the Newton-Raphson method is how most hardware does square root, so adapting that for an inverse square root function is similar. However, for general functions there are better ways. See the HANDBOOK OF MATHEMATICAL FUNCTIONS by Abramowitz and Stegun.

Bit of a correction. Normalising is not actually a ‘fancy term for division’. A vector has an exact length of 1, that is, sqrt(x^2+y^2+z^2) = 1. The normalisation process takes a scalar, like a vector but with no definition of how long it should be, and makes it a vector of the same direction.

It’s important for things like lighting in computer graphics (you’ve heard of normal maps, right?), even phong shading wouldn’t work without it.

Thanks for putting the time and effort to give a very clear explanation. I tried to read the paper at first, but your explanation provided adequate explanation for someone who wants to get a first hand understanding that piece of code. Keep up the good work.

@saurabh: Thanks! I had trouble understanding the paper at first, writing down my thoughts helped make the ideas click a bit more :).

@saurabh: Thanks! I had trouble understanding the paper at first, writing down my thoughts helped make the ideas

Good article, but one thing bugged me; no CPU that I know of can do anything “zillions of times” per second. We are limited to clock cycles in the billions, not quadrillions (and especially not “zillions”), and multiple cores don’t get us there either.

Whoever wrote that code is a true gangster.

Using “Try the Demo” and plugging in values of 3, 4, 5 for n, the result blows up.

If you shift the entire value right, why doesn’t the value stored in the left size of the value bleed into the part stored in the right side?

8-digit example in decimal:

0913|0020

Let’s say the right half is the exponent and we shift it all right by 1 column, effectively dividing it by 10 (without rounding):

0091|3002

^ Why doesn’t this happen? It’s clear that it doesn’t, but I don’t know why. I’d think we’d end up with 91 for the value with the right-most column of the value bleeding into the exponent.

Thanks Khalid,,

well explained!

I also further reffered the derivation of magic number from:

http://blog.quenta.org/2012/09/0x5f3759df.html

Come on, this blog cannot be a better explanation because

it is NOT 0x5f3759d5

IT IS 0x5f3759df

Whoops, I had the wrong constant copied in (not sure how that happened), updated now!