GregEakin/FastSqurt

Interesting idea, but is the optimization applicable to .NET?

am11 opened this issue · 3 comments

am11 commented

The video in the README has some awesome insights for C programs. I was testing the .NET 5 JIT codegen, and based on:
https://sharplab.io/#v2:C4Lghgzgtg...
https://www.diffchecker.com/TCqd9V7K

  • The unsafe variant produces the worst codegen
  • The safe variant (using BitConverter, like yours) is slightly better
  • 1/MathF.Sqrt(y) is the best

it seems that the .NET 5 JIT optimizes a plain inverse square root much better than the Quake III implementation. Is there a way to optimize it further? (cc @EgorBo) OpenTK apparently has graphics-accelerated support for inverse square root (https://docs.microsoft.com/en-us/dotnet/api/opentk.functions.inversesqrtfast?view=xamarin-ios-sdk-12), but I haven't gathered its codegen.
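
For readers who cannot open the truncated sharplab link, here is a rough sketch of what the safe and plain variants compared above might look like. The class and method names are hypothetical, and the code behind the link may differ. The unsafe variant is left out because it needs AllowUnsafeBlocks; it would do the same bit reinterpretation through pointer casts instead of BitConverter.

```csharp
using System;

// Hypothetical names, for illustration only; not the code behind the link.
static class InverseSqrtSketch
{
    // "Safe" Quake III style approximation: reinterpret the float's bits
    // with BitConverter, apply the magic constant, then run one
    // Newton-Raphson step to refine the estimate.
    public static float Safe(float x)
    {
        float half = 0.5f * x;
        int bits = BitConverter.SingleToInt32Bits(x);
        bits = 0x5f3759df - (bits >> 1);
        float y = BitConverter.Int32BitsToSingle(bits);
        return y * (1.5f - half * y * y);
    }

    // Plain variant: let the JIT emit a hardware square root and a divide.
    public static float Plain(float x) => 1f / MathF.Sqrt(x);
}
```

With a single Newton-Raphson step, Safe should land within roughly 0.2% of Plain for positive, normal inputs.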

@am11 isn't it faster to just use SSE's rsqrt?

static float HW_InverseSqrt(float x)
{
    return Sse.ReciprocalSqrtScalar(Vector128.CreateScalarUnsafe(x)).ToScalar();
}

codegen:

; Method P:HW_InverseSqrt(float):float
G_M47871_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; bbWeight=1    PerfScore 1.00

G_M47871_IG02:              ;; offset=0003H
       C5FA52C0             vrsqrtss xmm0, xmm0, xmm0
						;; bbWeight=1    PerfScore 11.00

G_M47871_IG03:              ;; offset=0007H
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 8
am11 commented

Indeed, the intrinsic is much better! 👍
(supported on .NET Core 3.0 onwards)

For code portability, do you see any benefit in adding a fallback chain Avx2->Avx->Sse2->Sse like this: https://sharplab.io/#v2:EYLgxg9gTgp.. (followed by a software fallback)?

Those are the same method 🙂 but you might want to cover arm64 via AdvSimd
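
A minimal sketch of what such a portable wrapper could look like, assuming .NET 5 or later. The x86 intrinsic classes inherit from one another (Avx2 derives from Avx, and so on down to Sse), so ReciprocalSqrtScalar called through any of them resolves to the Sse method and a single Sse.IsSupported check covers the whole chain. The class and method names here are hypothetical, and the Arm path uses the vector overload AdvSimd.ReciprocalSquareRootEstimate on a broadcast value, which is one possible choice rather than the only one.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical wrapper, for illustration only.
static class PortableInverseSqrt
{
    public static float Estimate(float x)
    {
        // x86/x64: one check is enough, since Avx2/Avx/Sse2 inherit this method from Sse.
        if (Sse.IsSupported)
            return Sse.ReciprocalSqrtScalar(Vector128.CreateScalarUnsafe(x)).ToScalar();

        // Arm: broadcast x into a 64-bit vector and read lane 0 of the estimate.
        if (AdvSimd.IsSupported)
            return AdvSimd.ReciprocalSquareRootEstimate(Vector64.Create(x)).ToScalar();

        // Software fallback: full-precision result.
        return 1f / MathF.Sqrt(x);
    }
}
```

Note that the three paths do not return identical values: the hardware instructions produce estimates whose precision differs between x86 and Arm, while the software fallback is fully precise, so callers that need consistent results across platforms may want to add a refinement step or use the fallback everywhere.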