
Root Cellar

This library is an experiment in generalizing the famed "fast inverse square root" hack to other exponents. For example:

  • Fast cubic root
  • Fast inverse 4th root
  • More/less precise variations
  • Fast roots optimized for mean-square error instead of worst-case error

Fast roots are useful in performance-intensive applications where a small amount of relative error is tolerable. Modern CPUs offer intrinsic support for sqrt(y), making a fast approximation of the square root itself unnecessary. The other approximations in this library, however, often outperform their standard equivalents, including 1/sqrt(y) and especially calls to pow.

The Algorithm

This library defines root approximations in terms of four parameters:

  • Root Index N (integer)
  • Magic Constant K (integer)
  • Number of Newtonian refinement steps R (unsigned integer)
  • Pseudo-Newtonian Constant M (floating-point, close to 1/N)

The Nth root of a floating-point value y is approximated using these steps:

  1. Reinterpret the bits of y as an integer i.
  2. Calculate K + i / N and reinterpret it into a floating-point x, our initial estimate.
  3. Apply R pseudo-Newtonian refinements, improving our estimate's accuracy.
    • x *= (1-M) + M * y / x^N
  4. Return x.

For example:

#include <cstdint>

float fastinvsqrt(const float y)
{
/* 1 */  union {float x; int32_t i;}; x = y;
/* 2 */  i = 0x5f32a121 - (i >> 1);
/* 3 */  x *= 1.535102f - 0.535102f * y * (x*x);
/* 4 */  return x;
}
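The same four steps can be written generically. Below is a hypothetical sketch (not the library's actual API; fast_root and its parameters are illustrative) that approximates y^(1/N) for positive y:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical generalization of the recipe above (not the library's API):
// approximates y^(1/N) from magic constant K, pseudo-Newtonian constant M
// and R refinement steps.
template<int N, int32_t K, unsigned R>
float fast_root(float y, float M)
{
    int32_t i;
    std::memcpy(&i, &y, sizeof i);       // 1. reinterpret bits as an integer
    i = K + i / N;                       // 2. integer estimate of the root
    float x;
    std::memcpy(&x, &i, sizeof x);
    for (unsigned r = 0; r < R; ++r)     // 3. pseudo-Newtonian refinements
        x *= (1.0f - M) + M * y / std::pow(x, float(N));
    return x;                            // 4. done
}
```

Fixing M at compile time, as the library's generated functions do, would let the compiler fold (1-M) and M into constants and replace std::pow with a few multiplications.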

For more information about how this works, see the References section below.

Assessing Formulas

Fast roots are conventionally optimized to minimize worst-case relative error. In some applications, however, it may be better to minimize mean error, mean-squared error, or the error when y is very close to 1.

In this library, designs are evaluated using these criteria:

  • Relative computation time
  • Root-mean-square relative error: sqrt(mean of ((approx_f(y) - f(y)) / f(y))^2)
  • Mean relative error: mean of (approx_f(y) - f(y)) / f(y)
  • Worst-case relative error: max of |approx_f(y) - f(y)| / f(y)
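As a sketch of how such criteria might be measured (a hypothetical harness, not the library's actual tooling), the three error statistics can be accumulated in one pass over logarithmically spaced inputs:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical measurement harness (not the library's actual tooling):
// surveys worst-case, RMS and mean relative error of an approximation
// against an exact reference over logarithmically spaced inputs.
struct ErrorStats { double max_err, rms_err, mean_err; };

template<class Approx, class Exact>
ErrorStats survey_error(Approx approx, Exact exact,
                        double lo, double hi, int samples)
{
    double max_e = 0, sum_e = 0, sum_e2 = 0;
    for (int i = 0; i < samples; ++i)
    {
        // logarithmic spacing from lo to hi
        double y   = lo * std::pow(hi / lo, double(i) / (samples - 1));
        double rel = (approx(y) - exact(y)) / exact(y);  // relative error
        max_e   = std::max(max_e, std::fabs(rel));
        sum_e  += rel;
        sum_e2 += rel * rel;
    }
    return { max_e, std::sqrt(sum_e2 / samples), sum_e / samples };
}
```

For floats it is feasible to go further and iterate over every bit pattern in one binade, which is how exhaustive error figures like those below can be obtained.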

Table of Constants

This table lists the best constants I've found for Nth roots with a single pseudo-Newtonian refinement. For example, N=2 corresponds to the square root while N=-2 corresponds to the inverse square root.

All error values in this table are based on the relative error (approx(y) - y^(1/N)) / y^(1/N).

32-bit floating-point roots

Error values were calculated exhaustively for these constants; I believe these are the most accurate designs for this parameterization.

(R = number of pseudo-Newtonian refinements)

 N   R   k           m          max error    RMS error    mean error
+2   0   0x1fbb4f2e  n/a        .0347475     .0190506     -.005133
+2   1   0x1fbed49a  +.510929   .000239058   .000148007   -.0000454788
+2   2   0x1fbb75ad  +.500122   1.68567e-7   4.7967e-8    -1.41369e-8
-2   0   0x5f37642f  n/a        .0342129     .0244769     +.01338
-2   1   0x5f32a121  -.535102   .000773445   .000494072   -.000025526
-2   2   0x5f3634f9  -.501326   1.40452e-6   8.95917e-7   +6.21959e-7
+3   0   0x2a510680  n/a        .0315547     .0180422     +.00321401
+3   1   0x2a543aa3  +.347252   .000430098   .000237859   -.0000977778
+3   2   0x2a4fcd03  +.333818   6.45394e-7   2.90881e-7   -2.40255e-7
-3   0   0x54a232a3  n/a        .0342405     .0195931     +.0068072
-3   1   0x549da7bf  -.364707   .00102717    .000742809   +.000277928
-3   2   0x54a1b99d  -.334677   2.18458e-6   1.04454e-6   +6.17437e-7
+4   0   0x2f9b374e  n/a        .0342323     .015625      +.00425431
+4   1   0x2f9ed7c0  +.266598   .000714053   .000444122   -.000195135
+4   2   0x2f9b8068  +.250534   9.49041e-7   5.28477e-7   -4.76837e-7
-4   0   0x4f58605b  n/a        .0312108     .020528      +.00879536
-4   1   0x4f542107  -.277446   .00110848    .000733642   +.000175304
-4   2   0x4f58020d  -.251282   2.76944e-6   1.3487e-6    +5.97779e-7

See root_cellar_generated.h for reference implementations of these functions.

64-bit floating-point roots

These designs were found using a fast approximation of the maximum error. There is room to improve these constants (with extremely marginal benefit).

(R = number of pseudo-Newtonian refinements)

 N   R   k                   m         max error
+2   0   0x1ff769e5b00cb024  n/a       .0347474
+2   1   0x1ff7da9258189b10  +.51093   .000238945
+2   2   0x1ff76e33f8e94831  +.500124  3.08405e-8
-2   0   0x5fe6ec85e7de30da  n/a       .0342128
-2   1   0x5fe65423e81eece9  -.535103  .00077328
-2   2   0x5fe6bbf0c11e182d  -.501434  1.36764e-6
+3   0   0x2a9f76253119d328  n/a       .0315546
+3   1   0x2a9fdca8d39b1833  +.347251  .000429969
+3   2   0x2a9f5317d3f76c27  +.333791  4.7027e-7
-3   0   0x553ef0ff289dd794  n/a       .0342405
-3   1   0x553e5fa2bf4bb94e  -.364707  .001027
-3   2   0x553eb1a359e5ec49  -.335169  3.77555e-6
+4   0   0x2ff366e9846f3cf9  n/a       .0342321
+4   1   0x2ff3daf850a16998  +.266598  .00071393
+4   2   0x2ff3578de1c1dc42  +.250729  1.41358e-6
-4   0   0x4feb0c0b7fa996ad  n/a       .0312107
-4   1   0x4fea8420dfe0c1b2  -.277446  .0011083
-4   2   0x4feaff5406bb3437  -.251281  2.61417e-6

Further Notes

I decided to research fast roots for applications in signal processing and graphics rendering — and as a fun distraction from more intensive research work. I got in way over my head.

It is possible to use methods like the ones presented here to estimate any fixed exponent, such as y^(3/2) or y^8. It is also possible to create an approximate pow function with a similar mechanism, and one can be found in the fastapprox library.

When using more than one Newtonian refinement, more accurate designs may be possible by varying the pseudo-Newtonian constant from one refinement to the next. I have not explored this possibility, because configurations with three or more dimensions would require more sophisticated solvers or a great deal of computation time.

References

0x5f3759df (Christian Plesner Hansen) covers the theory behind the hack, the derivation of the "magic constant", and generalization of the approximation to exponents between -1 and 1.

Understanding Quake's Fast Inverse Square Root (BetterExplained) includes an explanation of the algebra behind the Newtonian refinement used in the hack.

Inverse Square Root (University of Waterloo) describes how we can reduce worst-case error by using a modified Newtonian refinement with a baked-in multiplier.

Improving the Accuracy of the Fast Inverse Square Root Algorithm (Walczyk et al) discusses additional techniques for improving accuracy, beyond the ones presented here.

License

Copyright 2019 Evan Balster

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.