An article and research paper describe a fast, seemingly magical way to compute the inverse square root (1/sqrt(x)), used in the game Quake.

I'm no graphics expert, but appreciate why square roots are useful. The Pythagorean theorem computes distance between points, and dividing by distance helps normalize vectors. (Normalizing is often just a fancy term for division.)

3D games like Quake divide by distance zillions (yes zillions) of times each second, so "minor" performance improvements help immensely. We don't want to take the square root and divide the regular way: exponentiation and division are really, really expensive for the CPU.

Given these conditions, here's the magic formula to get 1/sqrt(x), as found in Quake (my comments inserted):

```
float InvSqrt(float x){
float xhalf = 0.5f * x;
int i = *(int*)&x; // store floating-point bits in integer
i = 0x5f3759df - (i >> 1); // initial guess for Newton's method
x = *(float*)&i; // convert new bits into float
x = x*(1.5f - xhalf*x*x); // One round of Newton's method
return x;
}
```

Yowza! Somehow, this code gets 1/sqrt(x) using only **multiplication** and **bit-shift operations**. There's no division or exponents involved -- how does it work?

**My Understanding:** This incredible hack *estimates* the inverse root using Newton's method of approximation, and starts with a great initial guess.

To make the guess, it takes floating-point number in scientific notation, and negates & halves the exponent to get something close the the inverse square root. It then runs a round of Newton's approximation method to further refine the estimate and tada, we've got something near the inverse square root.

Contents

## Newton's Method of Approximation

Newton's method can be used to find approximate roots of any function. You can keep iterating the method to get closer and closer to the root, but this function only uses 1 step! Here's a crash-course on Newton's method (it was new to me):

Let's say you have a function f(x) and you want to find its root (aka where f(x) = 0). Let's call your original guess "g". Newton's method gives you a way to get a new, better guess for the root:

You can keep repeating this process (plugging in your new guess into the formula) and get closer approximations for your root. Eventually you have a "new guess" that makes f(new guess) really, really close to zero -- it's a root! (Or close enough for government work, as they say).

In our case, we want the inverse square function. Let's say we have a number "i" (that's all we start with, right?) and want to find the inverse square root: 1/sqrt(i). If we make a guess "x" as the inverse root, the error between our original number and our guess "x" is:

This is because x is roughly 1/sqrt(i). If we square x we get "1/i", and if we take the inverse we should get something close to "i". If we subtract these two values, we can find our error.

Clearly, we want to make our error as small as possible. That means finding the "x" that makes error(x) = 0, which is the same as finding the root of the error equation. If we plug error(x) into Newton's approximation formula:

and take the proper derivatives:

we can plug them in to get the formula for a better guess:

Which is exactly the equation you see in the code above, remembering that x is our new guess (g) and "xhalf" is half of the original value (0.5 * i):

`x = x*(1.5f - xhalf*x*x);`

With this formula, we can start with a guess "g" and repeat the formula to get better guesses. Try this demo for using multiple iterations to find the inverse square:

If you plug in different initial guesses (.2, .4, .8) you can see how quickly the guesses converge to the real answer.

So my friends, the question becomes: "How can we make a good initial guess?"

## Making a Good Guess

What's a good guess for the inverse square root? It's a bit of a trick question -- our best guess for the inverse square root is the inverse square root itself!

Ok hotshot, you ask, how do we *actually get* 1/sqrt(x)?

This is where the magic kicks in. Let's say you have a number in exponent form or scientific notation:

Now, if you want to find the regular square root, you'd just divide the exponent by 2:

And if you want the **inverse** square root, divide the exponent by -2 to flip the sign:

So, how can we get the exponent of a number without other expensive operations?

## Floats are stored in mantissa-exponent form

Well, we're in luck. Floating-point numbers are stored by computers in mantissa-exponent form, so it's possible to extract and divide the exponent!

But instead of explicitly doing division (expensive for the CPU), the code uses another clever hack: it shifts bits. Right-shifting by one position is the same as dividing by two (you can try this for any power of 2, but it will truncate the remainder). And if you want to get a negative number, instead of multiplying by -1 (multiplications are expensive), just subtract the number from "0" (subtractions are cheap).

So, the code converts the floating-point number into an integer. It then shifts the bits by one, which means the exponent bits are divided by 2 (when we eventually turn the bits back into a float). And lastly, to negate the exponent, we subtract from the magic number 0x5f3759df. This does a few things: it preserves the mantissa (the non-exponent part, aka 5 in: 5 · 10^{6}), handles odd-even exponents, shifting bits from the exponent *into* the mantissa, and all sorts of funky stuff. The paper has more details and explanation, I didn't catch all of it the first time around. As always, feel free to comment if you have a better explanation of what's happening.

The result is that we get an initial guess that is really close to the real inverse square root! We can then do a single round of Newton's method to refine the guess. More rounds are possible (at an additional computational expense), but one round is all that's needed for the precision needed.

## So, why the magic number?

The great hack is how integers and floating-point numbers are stored. Floating-point numbers like 5.4 · 10^{6} store their exponent in a separate range of bits than "5.4". When you shift the entire number, you divide the exponent by 2, as well as dividing the number (5.4) by 2 as well. This is where the magic number comes in -- it does some cool corrections for this division, that I don't quite understand. However, there are several magic numbers that could be used -- this one happens to minimize the error in the mantissa.

The magic number also corrects for even/odd exponents; the paper mentions you can also find other magic numbers to use.

## Resources

There's further discussion on reddit (user pb_zeppelin) and slashdot:

- http://programming.reddit.com/info/t9zb/comments
- http://games.slashdot.org/article.pl?sid=06/12/01/184205 and my comment

## Leave a Reply

37 Comments on "Understanding Quake’s Fast Inverse Square Root"

For a better explanation about this, check out http://www.mceniry.net/papers/Fast%20Inverse%20Square%20Root.pdf

For a better explanation about this, check out http://www.mceniry.net/papers/Fast%20Inverse%20Square%20Root.pdf

[…] Fast inverse square root 05dec07 Beyond3D(Rys) wrote an article (almost a series!) about the history of the magic fast inverse square root found in for example the quake code. With the explantions it doesn’t seem quite that much as magic. As this pdf says, its not magic at all(page2) […]

Try asking this guy what text book he found it in. http://groups.google.com/group/comp.graphics.algorithms/msg/e314d9a118639bb9

[…] Let’s make our guess better. Archimedes discovered that adding sides made a better estimate. There are numerical methods to refine a formula again and again. For example, computers can start with a rough guess for the square root and make it better (faster than finding the closest answer from the outset). […]

[…] Quake’s fast inverse square root. […]

[…] Fast Inverse Square Root The article “fast inverse sqrt” came to my attention. It shows a small function written in C which is amazingly fast and approximates sqrt(1/x) pretty well. Appearently it was used in the Quake source code to speed up vector normalizations. But how does it work? Also, can it be improved? […]

[…] http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/ […]

[…] Understanding Quake’s Fast Inverse Square Root | BetterExplained (tags: programming algorithm algorithms math mathematics) […]

I don’t believe you ever explained what 0x5f3759d5 is

I did this in band camp.

Kalid

I really enjoyed your explanation. Much clearer than any of the links the previous commentators have posted. I have come across this equation before and not really understood it. Now I do. Thanks.

Keep up the great work.

Alex

@Tim: Check out the section “So why the magic number?” near the end.

@Alex: Thanks for the comment!

[…] [ taken with comments from Kalid Azad @ BetterExpalined.com: Understanding Quake’s Fast Inverse Square Root ] […]

Indeed, a few terms of the Newton-Raphson method is how most hardware does square root, so adapting that for an inverse square root function is similar. However, for general functions there are better ways. See the HANDBOOK OF MATHEMATICAL FUNCTIONS by Abramowitz and Stegun.

[…] [3] http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/ convert this post to pdf. Tags: [ Sem categoria ] | [ Veja este post em PDF: ] […]

Bit of a correction. Normalising is not actually a ‘fancy term for division’. A vector has an exact length of 1, that is, sqrt(x^2+y^2+z^2) = 1. The normalisation process takes a scalar, like a vector but with no definition of how long it should be, and makes it a vector of the same direction.

It’s important for things like lighting in computer graphics (you’ve heard of normal maps, right?), even phong shading wouldn’t work without it.

Thanks for putting the time and effort to give a very clear explanation. I tried to read the paper at first, but your explanation provided adequate explanation for someone who wants to get a first hand understanding that piece of code. Keep up the good work.

@saurabh: Thanks! I had trouble understanding the paper at first, writing down my thoughts helped make the ideas click a bit more :).

@saurabh: Thanks! I had trouble understanding the paper at first, writing down my thoughts helped make the ideas

[…] Reference ] Subscribe to comments Posted on 06.03.10 to Game Development by Eddie Post Tags: […]

Good article, but one thing bugged me; no CPU that I know of can do anything “zillions of times” per second. We are limited to clock cycles in the billions, not quadrillions (and especially not “zillions”), and multiple cores don’t get us there either.

Whoever wrote that code is a true gangster.

Using “Try the Demo” and plugging in values of 3, 4, 5 for n, the result blows up.

[…] is a truly awesome magic number that seems to come out of no where, 0x5f3759d5. Eco World Content From Across The Internet. Featured on EcoPressed A sand box for low […]

[…] This has been sitting in my drafts folder, waiting for me to read the article, learn about it and summarize it here. […]