Interesting idea, but is the optimization applicable to .NET?
am11 opened this issue · 3 comments
The video in the README has some awesome insights for C programs. I was testing the .NET 5 JIT codegen, and based on:
https://sharplab.io/#v2:C4Lghgzgtg...
https://www.diffchecker.com/TCqd9V7K
- The unsafe variant produces the worst codegen
- The safe variant (using BitConverter, like yours) is slightly better
- 1/MathF.Sqrt(y) is the best
It seems that the .NET 5 JIT optimizes the plain inverse square root much better than the Quake III implementation. Is there a way to optimize it further? (cc @EgorBo) OpenTK apparently has graphics-accelerated support for inverse square root (https://docs.microsoft.com/en-us/dotnet/api/opentk.functions.inversesqrtfast?view=xamarin-ios-sdk-12), but I haven't gathered its codegen.
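For readers who cannot open the links, here is a minimal sketch of two of the variants being compared. It is not the exact code behind the sharplab link; the class and method names (InvSqrt, Quake, Direct) are made up for illustration, and the pointer-based unsafe variant is left out because it needs the AllowUnsafeBlocks compiler option.

```csharp
using System;

static class InvSqrt
{
    // Quake III style approximation, using BitConverter for the
    // float <-> int reinterpretation instead of pointer casts.
    public static float Quake(float y)
    {
        float halfY = 0.5f * y;
        int i = BitConverter.SingleToInt32Bits(y);
        i = 0x5f3759df - (i >> 1);              // magic-constant initial guess
        float r = BitConverter.Int32BitsToSingle(i);
        return r * (1.5f - halfY * r * r);      // one Newton-Raphson step
    }

    // Plain version: let the JIT emit a hardware sqrt followed by a divide.
    public static float Direct(float y) => 1f / MathF.Sqrt(y);
}
```

Quake returns only an approximation, while Direct is as accurate as the hardware square root and division allow, so the two are not interchangeable when precision matters.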
@am11 isn't it faster to just use SSE's rsqrt?
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static float HW_InverseSqrt(float x)
{
    return Sse.ReciprocalSqrtScalar(Vector128.CreateScalarUnsafe(x)).ToScalar();
}
codegen:
; Method P:HW_InverseSqrt(float):float
G_M47871_IG01: ;; offset=0000H
C5F877 vzeroupper
;; bbWeight=1 PerfScore 1.00
G_M47871_IG02: ;; offset=0003H
C5FA52C0 vrsqrtss xmm0, xmm0, xmm0
;; bbWeight=1 PerfScore 11.00
G_M47871_IG03: ;; offset=0007H
C3 ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code: 8
Indeed, the intrinsic is much better!
(supported on .NET Core 3.0 onwards)
For code portability, do you see any benefit in adding a fallback chain Avx2->Avx->Sse2->Sse like this: https://sharplab.io/#v2:EYLgxg9gTgp.. (followed by a software fallback)?
Those are the same method, but you might want to cover arm64 via AdvSimd.
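A rough sketch of what such a portable helper could look like, assuming .NET 5 or later (where the Arm intrinsics are available). The class and method names (PortableInvSqrt, Estimate) are hypothetical. Because the Avx2, Avx and Sse2 classes inherit ReciprocalSqrtScalar from Sse, a single Sse check covers the x86 side.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

static class PortableInvSqrt
{
    public static float Estimate(float x)
    {
        // x86/x64: rsqrtss hardware estimate.
        if (Sse.IsSupported)
            return Sse.ReciprocalSqrtScalar(Vector128.CreateScalarUnsafe(x)).ToScalar();

        // arm64: frsqrte hardware estimate on a 64-bit vector.
        if (AdvSimd.IsSupported)
            return AdvSimd.ReciprocalSquareRootEstimate(Vector64.CreateScalar(x)).ToScalar();

        // Software fallback for everything else.
        return 1f / MathF.Sqrt(x);
    }
}
```

The IsSupported checks are resolved by the JIT, so only one branch should remain in the generated code. Both hardware paths return estimates whose precision differs between x86 and Arm, so callers that need consistent results across platforms may prefer the software path or an extra refinement step.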