Interesting idea, but is the optimization applicable to .NET?

Question

am11 opened this issue 4 years ago · 3 comments

The video in the README has some awesome insights for C programs. I was testing the .NET 5 JIT codegen, and based on:
https://sharplab.io/#v2:C4Lghgzgtg...
https://www.diffchecker.com/TCqd9V7K

The unsafe variant produces the worst codegen
The safe variant (using BitConverter, like yours) is slightly better
1/MathF.Sqrt(y) is the best

It seems that the .NET 5 JIT optimizes a plain inverse square root much better than the Quake III implementation. Is there a way to optimize it further? (cc @EgorBo) OpenTK apparently has graphics-accelerated support for inverse square root (https://docs.microsoft.com/en-us/dotnet/api/opentk.functions.inversesqrtfast?view=xamarin-ios-sdk-12), but I haven't gathered its codegen.
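For readers who cannot open the links, the three variants compared above might look roughly like the sketch below. This is an assumption, not the code behind the SharpLab link: the class and method names are hypothetical, and `Unsafe.As` stands in for whatever pointer cast the unsafe variant actually used.

```csharp
using System;
using System.Runtime.CompilerServices;

static class InverseSqrtVariants
{
    // Magic constant from the Quake III fast inverse square root.
    const int Magic = 0x5f3759df;

    // "Unsafe" variant: reinterprets the float's bits in place.
    // Assumption: Unsafe.As is used here instead of a raw pointer cast
    // so the sketch compiles without AllowUnsafeBlocks.
    public static float UnsafeVariant(float y)
    {
        int i = Unsafe.As<float, int>(ref y);   // copy of the raw bits
        i = Magic - (i >> 1);                   // initial estimate
        float r = Unsafe.As<int, float>(ref i);
        return r * (1.5f - 0.5f * y * r * r);   // one Newton-Raphson step
    }

    // "Safe" variant: same algorithm, bit casts done through BitConverter.
    public static float SafeVariant(float y)
    {
        int i = BitConverter.SingleToInt32Bits(y);
        i = Magic - (i >> 1);
        float r = BitConverter.Int32BitsToSingle(i);
        return r * (1.5f - 0.5f * y * r * r);
    }

    // Plain variant: lets the JIT emit a hardware square root and a divide.
    public static float SqrtVariant(float y) => 1f / MathF.Sqrt(y);
}
```

The first two return an approximation, while `SqrtVariant` is as accurate as single-precision `MathF.Sqrt` followed by a division allows, so the comparison is about codegen and speed rather than identical results.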

Answer 1 · 2021-03-09T15:10:51.000Z

@am11 isn't it faster to just use SSE's rsqrt?

static float HW_InverseSqrt(float x)
{
    return Sse.ReciprocalSqrtScalar(Vector128.CreateScalarUnsafe(x)).ToScalar();
}

codegen:

; Method P:HW_InverseSqrt(float):float
G_M47871_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; bbWeight=1    PerfScore 1.00

G_M47871_IG02:              ;; offset=0003H
       C5FA52C0             vrsqrtss xmm0, xmm0, xmm0
						;; bbWeight=1    PerfScore 11.00

G_M47871_IG03:              ;; offset=0007H
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 8

Answer 2 · 2021-03-09T15:36:44.000Z

Indeed, the intrinsic is much better! 👍
(supported on .NET Core 3.0 onwards)

For code portability, do you see any benefit in adding a fallback chain Avx2->Avx->Sse2->Sse like this: https://sharplab.io/#v2:EYLgxg9gTgp.. (followed by a software fallback)?

Answer 3 · 2021-03-09T15:38:12.000Z

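Those are the same method 🙂 (the Avx2, Avx, and Sse2 classes inherit ReciprocalSqrtScalar from Sse, so an Sse check alone covers x86/x64), but you might want to cover arm64 via AdvSimd.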
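Putting the suggestions from this thread together, a portable helper could be sketched as follows. The class and method names are hypothetical, and the choice of `AdvSimd.Arm64.ReciprocalSquareRootEstimateScalar` for the arm64 path is an assumption about which AdvSimd API was meant; the software fallback simply reuses 1/MathF.Sqrt.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

static class PortableInverseSqrt
{
    public static float InverseSqrt(float x)
    {
        // x86/x64: a single Sse check is enough, since the AVX classes
        // expose the same inherited method.
        if (Sse.IsSupported)
        {
            return Sse.ReciprocalSqrtScalar(Vector128.CreateScalarUnsafe(x)).ToScalar();
        }

        // arm64: hardware reciprocal square root estimate (assumed mapping).
        if (AdvSimd.Arm64.IsSupported)
        {
            return AdvSimd.Arm64
                .ReciprocalSquareRootEstimateScalar(Vector64.CreateScalarUnsafe(x))
                .ToScalar();
        }

        // Software fallback for any other platform.
        return 1f / MathF.Sqrt(x);
    }
}
```

The `IsSupported` checks are treated as constants by the JIT, so only one branch should remain in the generated code. Both hardware paths return estimates whose precision differs between instructions and from the software fallback, so callers that need consistent results across platforms may want to add a Newton-Raphson refinement step or use the fallback everywhere.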