Would it make sense to enable ffast-math for simd types?
As discussed here:
-ffast-math can be very useful for speeding up floating point operations, particularly by allowing easier vectorization. I'm seeing a ~30% runtime reduction for matrix multiplication in clang from using -ffast-math in this benchmark:
https://github.com/pedrocr/rustc-math-bench
As mentioned in the rust issue, the intrinsics already allow a part of this, and a wrapper type for f32/f64 can already be implemented. Since SIMD types are already aimed at vectorization and the cost of wrapping/unwrapping is already there, would it make sense to enable -ffast-math for them anyway? Alternatively, if there are cases where that doesn't make sense, would it be useful to duplicate slow and fast versions of all the types for convenience?
This won't be much of a response, but I personally don't know. I don't know anything about -ffast-math or what problems it's solving. I don't know what it means to "enable -ffast-math for them." Duplicating every single vector type seems rather extreme.
I think I'd need to see a lot more detail on this before I'd personally be comfortable doing anything.
I'm far from an expert on the topic, but my understanding is that the normal IEEE 754 precision guarantees don't allow certain arithmetic reorderings. This limits LLVM in generating more efficient code even with normal floating point instructions and can severely limit its ability to auto-vectorize. Enabling -ffast-math for the vector types would in essence mean using things like fadd_fast instead of the normal floating point add, to give the compiler the freedom to rearrange the math. I suspect this is the right tradeoff for most SIMD applications but maybe not all. Having that be a feature in the simd crate instead of a different type might be a cleaner option.
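To make the reordering point concrete, here is a minimal sketch of why reassociating a floating point sum is not allowed under strict IEEE 754 semantics. The function names and the input values are chosen purely for illustration.

```rust
/// Sum exactly as written: (a + b) + c.
fn sum_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Same operands, reassociated: a + (b + c).
fn sum_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

fn main() {
    let (a, b, c) = (1e20_f64, -1e20_f64, 1.0_f64);

    // a and b cancel exactly first, so c survives: the result is 1.0.
    println!("{}", sum_left(a, b, c));

    // c is far below the precision of b, so b + c rounds back to b,
    // and a then cancels it: the result is 0.0.
    println!("{}", sum_right(a, b, c));
}
```

Because the two groupings give different answers, a compiler that preserves IEEE 754 results cannot swap one for the other; fast-math style flags give up that guarantee in exchange for the freedom to reorder.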
fadd_fast is a compiler intrinsic, which is basically permanently unstable. I'd rather not add a dependency on such things in this crate since there is no path to stability.
I feel like there is a much larger design space. For example, instead of duplicating all of the floating point vector types, we could just expose "fast" arithmetic operations as normal functions, kind of like how we have wrapping_add and saturating_add on the number types today.
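As a rough sketch of what the method-style option might look like: the trait `FastArith` and the methods `fast_add` and `fast_mul` below are hypothetical names, not an existing API. The bodies are placeholders that use ordinary IEEE arithmetic, since the fast-math intrinsics are unstable.

```rust
/// Hypothetical extension trait; the names are illustrative only.
trait FastArith {
    fn fast_add(self, rhs: Self) -> Self;
    fn fast_mul(self, rhs: Self) -> Self;
}

impl FastArith for f32 {
    // Placeholder bodies: a real version would lower to fast-math
    // operations. Here they are plain IEEE add and multiply.
    fn fast_add(self, rhs: f32) -> f32 {
        self + rhs
    }
    fn fast_mul(self, rhs: f32) -> f32 {
        self * rhs
    }
}

fn main() {
    // The existing precedent on integer types.
    assert_eq!(250u8.wrapping_add(10), 4);
    assert_eq!(250u8.saturating_add(10), 255);

    // a * b + c written in method style.
    let (a, b, c) = (1.5_f32, 2.0_f32, 0.25_f32);
    let y = a.fast_mul(b).fast_add(c);
    assert_eq!(y, 3.25);
}
```

The call site makes every relaxed operation explicit, at the price of giving up operator syntax.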
Another design point is to add a FFast<T> type, kind of like our std::num::Wrapping<T> type for wrapping arithmetic.
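A minimal sketch of the wrapper option, shown next to std::num::Wrapping for comparison. The `FFast<T>` type here is hypothetical and only implemented for f32; its operator bodies are placeholders using ordinary IEEE arithmetic where a real implementation would presumably use fast-math operations.

```rust
use std::num::Wrapping;
use std::ops::{Add, Mul};

/// Hypothetical wrapper type, modeled on std::num::Wrapping<T>.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FFast<T>(T);

impl Add for FFast<f32> {
    type Output = Self;
    // Placeholder: plain IEEE add standing in for a fast-math add.
    fn add(self, rhs: Self) -> Self {
        FFast(self.0 + rhs.0)
    }
}

impl Mul for FFast<f32> {
    type Output = Self;
    // Placeholder: plain IEEE multiply standing in for a fast-math multiply.
    fn mul(self, rhs: Self) -> Self {
        FFast(self.0 * rhs.0)
    }
}

fn main() {
    // The existing precedent: the type selects the arithmetic semantics.
    let w = Wrapping(250u8) + Wrapping(10u8);
    assert_eq!(w.0, 4);

    // The same shape for floats: normal operators, different semantics.
    let y = FFast(1.5_f32) * FFast(2.0_f32) + FFast(0.25_f32);
    assert_eq!(y, FFast(3.25_f32));
}
```

Compared with method-style functions, the wrapper keeps the usual operators, but values have to be wrapped on the way in and unwrapped on the way out.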
Finally, I'm not exactly sure why -ffast-math belongs with SIMD. They seem like orthogonal concerns to me? For example, it seems like you'd want to be able to do fast math on normal f32/f64 types and not just vector types. Could you please elaborate on this point?
@pedrocr My intuition here is that someone will need to champion this and propose an addition to the standard library that gives you access to -ffast-math. That means writing an RFC and thoroughly exploring the design space.
> I feel like there is a much larger design space. For example, instead of duplicating all of the floating point vector types, we could just expose "fast" arithmetic operations as normal functions, kind of like how we have wrapping_add and saturating_add on the number types today.
This would work but makes for really ugly code.
> Another design point is to add a FFast<T> type, kind of like our std::num::Wrapping<T> type for wrapping arithmetic.
This would be a much better solution indeed from the code clarity standpoint.
> Finally, I'm not exactly sure why -ffast-math belongs with SIMD. They seem like orthogonal concerns to me? For example, it seems like you'd want to be able to do fast math on normal f32/f64 types and not just vector types. Could you please elaborate on this point?
Doing it for normal types is indeed also useful. I see two reasons this connects with SIMD though. The first (and circumstantial) is that the vector API is already a wrapper around the underlying types, so it's naturally easier to implement these things there than with f32, which is a primitive type. The wrapper solution works much more poorly for primitive types because it introduces the wrapping/unwrapping steps, whereas f32x4 usage already implies that anyway, so code churn is minimal. The second is that SIMD auto-vectorization can work much better if -ffast-math is enabled, so it should be simple to enable it for something like f32x4 independently of whether it's easy to use it for f32. Use of f32x4 implies the user is trying to go fast, whereas f32 can be performance-insensitive.
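As a sketch of why reassociation matters for vectorization, consider a sum reduction. Written strictly, every add depends on the previous one, and preserving that order rules out splitting the work across lanes. The second function below does by hand the kind of regrouping that fast-math would permit the compiler to do; both function names are made up for this example.

```rust
/// Strict left-to-right sum: each add depends on the previous result.
fn sum_strict(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Sum with four independent accumulators, i.e. a manually reassociated
/// reduction. The result can differ from `sum_strict` in the last bits
/// for general inputs, which is why strict IEEE semantics forbid the
/// compiler from doing this transformation on its own.
fn sum_4_lanes(xs: &[f32]) -> f32 {
    let mut acc = [0.0_f32; 4];
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    // Elements that do not fill a whole group of four.
    let tail: f32 = chunks.remainder().iter().sum();
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}

fn main() {
    // Small integers are exactly representable, so both orders agree here.
    let data: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    assert_eq!(sum_strict(&data), 55.0);
    assert_eq!(sum_4_lanes(&data), 55.0);
}
```

The four accumulators map naturally onto the lanes of something like f32x4, which is the connection between fast-math and SIMD being argued here.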
> My intuition here is that someone will need to champion this and propose an addition to the standard library that gives you access to -ffast-math. That means writing an RFC and thoroughly exploring the design space.
I've been following this issue:
I don't really understand the rust design process. Are you suggesting that the next step should be to take that discussion and try and do an RFC? I don't think I know enough about the rust conventions and this problem to write an RFC but I can try and start a pre-RFC discussion in internals to get the ball going.
> The wrapper solution works much more poorly for primitive types because it introduces the wrapping/unwrapping steps
Could you expand on this? let x = FFast(some_float) and x.0 should be zero cost.
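A small sketch of the zero-cost claim, assuming a hypothetical single-field `FFast` newtype over f32. With `#[repr(transparent)]` the wrapper is guaranteed to have the same layout as the float it holds, so wrapping and reading `.0` add no data of their own.

```rust
use std::mem::size_of;

/// Hypothetical newtype; `repr(transparent)` guarantees the same layout as f32.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct FFast(f32);

fn main() {
    let some_float = 1.5_f32;

    // Wrapping is just a tuple-struct constructor call.
    let x = FFast(some_float);

    // The wrapper occupies exactly as much space as the bare float.
    assert_eq!(size_of::<FFast>(), size_of::<f32>());

    // Unwrapping is a field access.
    assert_eq!(x.0, some_float);
}
```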
It still seems to me like -ffast-math is orthogonal to SIMD, but that SIMD vector types might participate in it.
> I don't really understand the rust design process. Are you suggesting that the next step should be to take that discussion and try and do an RFC? I don't think I know enough about the rust conventions and this problem to write an RFC but I can try and start a pre-RFC discussion in internals to get the ball going.
A pre-RFC would be good. I should have suggested that first. :-) I'd encourage you to give lots of examples.
> Could you expand on this? let x = FFast(some_float) and x.0 should be zero cost.
It's zero cost in execution but quite costly in programming time, and it makes for quite ugly code. Here's what getting rid of OrderedFloat bought me in code simplification:
> A pre-RFC would be good. I should have suggested that first. :-) I'd encourage you to give lots of examples.
Yeah, I think I'll do that. I have 2 or 3 options of how -ffast-math could work in rust and having that discussion would be nice.
Created a pre-RFC here:
https://internals.rust-lang.org/t/pre-rfc-whats-the-best-way-to-implement-ffast-math/5740