This package enables a number of "unsafe" floating point optimizations for GHC. For example, the distributive law

x*y + x*z == x*(y+z)

does not hold for the Float or Double types: the lowest order bits may differ due to rounding errors. Therefore, GHC (like most compilers for any language) will not perform this optimization by default. Instead, most compilers provide special flags that enable these unsafe optimizations. See, for example, the -ffast-math flag in the gcc documentation and the gcc wiki page FloatingPointMath. GHC has no built-in flags for these optimizations, but that's okay: GHC's RULES pragmas are powerful enough to achieve most of the performance benefits of -ffast-math, and this package provides those RULES pragmas.
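To see the rounding problem concretely, here is a small demonstration (it uses associativity of addition, which fails for the same reason distributivity does):

```haskell
-- Rounding makes basic algebraic laws fail for Double.
main :: IO ()
main = do
  print ((0.1 + 0.2) + 0.3 :: Double)  -- 0.6000000000000001
  print (0.1 + (0.2 + 0.3) :: Double)  -- 0.6
  print (((0.1 + 0.2) + 0.3) == (0.1 + (0.2 + 0.3) :: Double))  -- False
```

The two groupings round differently in the low-order bits, so the "same" expression produces two different values.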
To enable these optimizations in your code, simply add the following line to the top of your source files:
import Numeric.FastMath
For most users, this is all you need to do. But some advanced code depends on specific properties of the IEEE 754 standard, and so will want to enable only some of the optimizations. The module structure of fast-math makes this easy: every module corresponds to a certain family of optimizations, and importing that module enables only those optimizations. For example, to enable only the optimizations that are unsafe in the presence of NaN values, we would add the line:

import Numeric.FastMath.NaN
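As an illustration of why a family of rewrites can be unsafe specifically because of NaN, consider a rewrite like x * 0 ==> 0. It looks harmless, but NaN * 0 is NaN, not 0:

```haskell
-- NaN propagates through arithmetic, so rewriting  x * 0  to  0
-- silently changes the result whenever x is NaN.
main :: IO ()
main = do
  let nan = 0 / 0 :: Double
  print (nan * 0)          -- NaN
  print (isNaN (nan * 0))  -- True
```

Code that never produces NaN values can safely import these rewrites; code that relies on NaN propagation should leave them out.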
There are still some optimizations that gcc's -ffast-math flag performs but this library does not. This is mostly due to limitations in the way RULES pragmas work. For example, constant folding cannot be implemented with RULES; instead, GHC implements this optimization as a special case in the file compiler/prelude/PrelRules.lhs.
Consider the code:
test1 :: Double -> Double
test1 d = d*10 + d*20
With the fast-math library, GHC factors out d, then folds the constants, producing the core:
test1 :: Double -> Double
test1 = \ (d :: Double) ->
case d of _ { D# x -> D# (*## x 30.0) }
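The core dumps shown in this README can be reproduced with GHC's standard dump flags (the file name here is a placeholder for a module containing the example function):

```shell
# -ddump-simpl prints the simplified Core after optimization;
# -dsuppress-all strips module qualifiers and type annotations
# for readability. Test1.hs is an illustrative file name.
ghc -O2 -ddump-simpl -dsuppress-all Test1.hs
```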
But if we make the code just a little more complicated:
test2 d1 d2 = d1*10 + d1*d2 + d1*20
Then GHC distributes successfully, but can't figure out how to fold the constants. It produces the core:
test2 :: Double -> Double -> Double
test2 = \ (d1 :: Double) (d2 :: Double) ->
case d1 of _ { D# x ->
case d2 of _ { D# y ->
D# (*## x (+## (+## 10.0 y) 20.0))
}
}
If we change the code so that the constants appear next to each other:
test3 d1 d2 = d1*d2 + d1*10 + d1*20
then GHC successfully combines the constants, producing the core:
test3 :: Double -> Double -> Double
test3 = \ (d1 :: Double) (d2 :: Double) ->
case d1 of _ { D# x ->
case d2 of _ { D# y ->
D# (*## x (+## y 30.0))
}
}
We could fix this problem if RULES pragmas could identify which terms are constants and which are variables. This would let us commute/associate the constants to the left of all computations, and then GHC's standard constant folding mechanism would work successfully.
The best way to check which optimizations are actually supported is to look at the source code; RULES pragmas are surprisingly readable.
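For instance, a rule of the general shape this package provides looks like the following. This is an illustrative sketch, not copied from the package source; the rule name is made up, and the real rules are more careful about when they fire:

```haskell
module Sketch where

-- An illustrative rewrite rule in the spirit of fast-math:
-- factor a common multiplicand out of a sum of products.
-- This changes the low-order bits of results, which is exactly
-- why the optimization is "unsafe" and must be opted into.
{-# RULES
"fast-math sketch/distribute Double" forall (x :: Double) y z.
    (x * y) + (x * z) = x * (y + z)
  #-}
```

Whether a given rule actually fired on your code can be checked by compiling with GHC's -ddump-rule-firings flag.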
The LLVM backend can also perform a number of these optimizations for us if we pass it the right flags. It does not perform all of them, however. (Possibly GHC's earlier optimization passes remove the opportunity?) In any event, executables produced by both the built-in code generator and the LLVM backend will see speed improvements from this package.
Currently, there is no support for GHC 7.8's SIMD instructions. This will hopefully appear in a future release.
This package is available on hackage, and can be easily installed with:
cabal update
cabal install fast-math