
Root Cellar

This library is an experiment in generalizing the famed "fast inverse square root" hack to other exponents. For example:

  • Fast cubic root
  • Fast inverse 4th root
  • More/less precise variations
  • Fast roots optimized for mean-square error instead of worst-case error

Fast roots are useful in performance-intensive applications where a small amount of relative error is tolerable. Modern CPUs offer intrinsic support for sqrt(y), making a fast approximation of the square root itself unnecessary. The other approximations in this library, however, often outperform their standard equivalents, including 1/sqrt(y) and especially calls to pow.

The Algorithm

This library defines root approximations in terms of four parameters:

  • Root Index N (integer)
  • Magic Constant K (integer)
  • Number of Newtonian refinement steps R (unsigned integer)
  • Pseudo-Newtonian Constant M (floating-point, close to 1/N)

The Nth root of a floating-point value y is approximated using these steps:

  1. Reinterpret the bits of y as an integer i.
  2. Calculate K + i / N and reinterpret it into a floating-point x, our initial estimate.
  3. Apply R pseudo-Newtonian refinements, improving our estimate's accuracy.
    • x *= (1-M) + M * y / x^N
  4. Return x.

For example:

#include <cstdint>

float fastinvsqrt(const float y)
{
/* 1 */  union {float x; int32_t i;}; x = y;
/* 2 */  i = 0x5f32a121 - (i >> 1);
/* 3 */  x *= 1.535102f - 0.535102f * y * (x*x);
/* 4 */  return x;
}
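The same four steps can be written generically. Below is a hypothetical sketch (not the library's actual API; fast_root and its parameters are illustrative) that approximates y^(1/N) for positive y:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical generalization of the recipe above (not the library's API):
// approximates y^(1/N) from magic constant K, pseudo-Newtonian constant M
// and R refinement steps.
template<int N, int32_t K, unsigned R>
float fast_root(float y, float M)
{
    int32_t i;
    std::memcpy(&i, &y, sizeof i);       // 1. reinterpret bits as an integer
    i = K + i / N;                       // 2. integer estimate of the root
    float x;
    std::memcpy(&x, &i, sizeof x);
    for (unsigned r = 0; r < R; ++r)     // 3. pseudo-Newtonian refinements
        x *= (1.0f - M) + M * y / std::pow(x, float(N));
    return x;                            // 4. done
}
```

Fixing M at compile time, as the library's generated functions do, would let the compiler fold (1-M) and M into constants and replace std::pow with a few multiplications.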

For more information about how this works, see the References section below.

Assessing Formulas

Fast roots are conventionally optimized to minimize worst-case relative error. In some applications, however, it may be better to minimize mean error, mean-squared error, or the error when y is very close to 1.

In this library, designs are evaluated using these criteria:

  • Relative computation time
  • Root-mean-square relative error: sqrt(mean of ((approx_f(y) - f(y)) / f(y))^2)
  • Mean relative error: mean of (approx_f(y) - f(y)) / f(y)
  • Worst-case relative error: max of |approx_f(y) - f(y)| / f(y)
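As a sketch of how such criteria might be measured (a hypothetical harness, not the library's actual tooling), the three error statistics can be accumulated in one pass over logarithmically spaced inputs:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical measurement harness (not the library's actual tooling):
// surveys worst-case, RMS and mean relative error of an approximation
// against an exact reference over logarithmically spaced inputs.
struct ErrorStats { double max_err, rms_err, mean_err; };

template<class Approx, class Exact>
ErrorStats survey_error(Approx approx, Exact exact,
                        double lo, double hi, int samples)
{
    double max_e = 0, sum_e = 0, sum_e2 = 0;
    for (int i = 0; i < samples; ++i)
    {
        // logarithmic spacing from lo to hi
        double y   = lo * std::pow(hi / lo, double(i) / (samples - 1));
        double rel = (approx(y) - exact(y)) / exact(y);  // relative error
        max_e   = std::max(max_e, std::fabs(rel));
        sum_e  += rel;
        sum_e2 += rel * rel;
    }
    return { max_e, std::sqrt(sum_e2 / samples), sum_e / samples };
}
```

For floats it is feasible to go further and iterate over every bit pattern in one binade, which is how exhaustive error figures like those below can be obtained.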

Table of Constants

This table lists the best constants I've found for Nth roots with a single pseudo-Newtonian refinement. For example, N=2 corresponds to the square root while N=-2 corresponds to the inverse square root.

All error values in this table are based on the relative error (approx(y) - y^(1/N)) / y^(1/N).

32-bit floating-point roots

Error values were calculated exhaustively for these constants; I believe these are the most accurate designs for this parameterization.

(R = number of pseudo-Newtonian refinements)

 N   R   k           m          max error    RMS error    mean error
+2   0   0x1fbb4f2e  n/a        .0347475     .0190506     -.005133
+2   1   0x1fbed49a  +.510929   .000239058   .000148007   -.0000454788
+2   2   0x1fbb75ad  +.500122   1.68567e-7   4.7967e-8    -1.41369e-8
-2   0   0x5f37642f  n/a        .0342129     .0244769     +.01338
-2   1   0x5f32a121  -.535102   .000773445   .000494072   -.000025526
-2   2   0x5f3634f9  -.501326   1.40452e-6   8.95917e-7   +6.21959e-7
+3   0   0x2a510680  n/a        .0315547     .0180422     +.00321401
+3   1   0x2a543aa3  +.347252   .000430098   .000237859   -.0000977778
+3   2   0x2a4fcd03  +.333818   6.45394e-7   2.90881e-7   -2.40255e-7
-3   0   0x54a232a3  n/a        .0342405     .0195931     +.0068072
-3   1   0x549da7bf  -.364707   .00102717    .000742809   +.000277928
-3   2   0x54a1b99d  -.334677   2.18458e-6   1.04454e-6   +6.17437e-7
+4   0   0x2f9b374e  n/a        .0342323     .015625      +.00425431
+4   1   0x2f9ed7c0  +.266598   .000714053   .000444122   -.000195135
+4   2   0x2f9b8068  +.250534   9.49041e-7   5.28477e-7   -4.76837e-7
-4   0   0x4f58605b  n/a        .0312108     .020528      +.00879536
-4   1   0x4f542107  -.277446   .00110848    .000733642   +.000175304
-4   2   0x4f58020d  -.251282   2.76944e-6   1.3487e-6    +5.97779e-7

See root_cellar_generated.h for reference implementations of these functions.

64-bit floating-point roots

These designs were found using a fast approximation of the maximum error. There is room to improve these constants (with extremely marginal benefit).

(R = number of pseudo-Newtonian refinements)

 N   R   k                   m         max error
+2   0   0x1ff769e5b00cb024  n/a       .0347474
+2   1   0x1ff7da9258189b10  +.51093   .000238945
+2   2   0x1ff76e33f8e94831  +.500124  3.08405e-8
-2   0   0x5fe6ec85e7de30da  n/a       .0342128
-2   1   0x5fe65423e81eece9  -.535103  .00077328
-2   2   0x5fe6bbf0c11e182d  -.501434  1.36764e-6
+3   0   0x2a9f76253119d328  n/a       .0315546
+3   1   0x2a9fdca8d39b1833  +.347251  .000429969
+3   2   0x2a9f5317d3f76c27  +.333791  4.7027e-7
-3   0   0x553ef0ff289dd794  n/a       .0342405
-3   1   0x553e5fa2bf4bb94e  -.364707  .001027
-3   2   0x553eb1a359e5ec49  -.335169  3.77555e-6
+4   0   0x2ff366e9846f3cf9  n/a       .0342321
+4   1   0x2ff3daf850a16998  +.266598  .00071393
+4   2   0x2ff3578de1c1dc42  +.250729  1.41358e-6
-4   0   0x4feb0c0b7fa996ad  n/a       .0312107
-4   1   0x4fea8420dfe0c1b2  -.277446  .0011083
-4   2   0x4feaff5406bb3437  -.251281  2.61417e-6

Further Notes

I decided to research fast roots for applications in signal processing and graphics rendering — and as a fun distraction from more intensive research work. I got in way over my head.

It is possible to use methods like the ones presented here to estimate any fixed exponent, such as y^(3/2) or y^8. It is also possible to create an approximate pow function with a similar mechanism, and one can be found in the fastapprox library.

When using more than one Newtonian refinement, more accurate designs may be possible by varying the pseudo-Newtonian constant from one refinement to the next. I have not explored this possibility, because configurations with three or more dimensions would require more sophisticated solvers or a great deal of computation time.

References

0x5f3759df (Christian Plesner Hansen) covers the theory behind the hack, the derivation of the "magic constant", and generalization of the approximation to exponents between -1 and 1.

Understanding Quake's Fast Inverse Square Root (BetterExplained) includes an explanation of the algebra behind the Newtonian refinement used in the hack.

Inverse Square Root (University of Waterloo) describes how we can reduce worst-case error by using a modified Newtonian refinement with a baked-in multiplier.

Improving the Accuracy of the Fast Inverse Square Root Algorithm (Walczyk et al) discusses additional techniques for improving accuracy, beyond the ones presented here.

License

Copyright 2019 Evan Balster

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.