sebas77/Svelto.ECS

Performance improvement for `NB<T>`

Closed this issue · 5 comments

After seeing this benchmark I got curious if there is any room for performance improvement to Svelto.

I did some benchmark with NB<T> and got some leads for code below:

public unsafe ref T this[int index]
{
  [MethodImpl(MethodImplOptions.AggressiveInlining)] get
  {
    using (this._threadSentinel.TestThreadSafety())
      return ref Unsafe.AsRef<T>((void*) (this._ptr + index * NB<T>.SIZE));
  }
}
  • Using cached size NB<T>.SIZE for stride is actually slower than using Unsafe.SizeOf<T> directly. As the latter will be replaced as constant after JIT compile, while former requires static constructor check.
  • _threadSentinel is readonly but Sentinel is not. Calling TestThreadSafety() can cause defensive value copy.
  • using block essentially converts to try-finally block which makes inlining inefficient, even though sentinel does nothing on release build.

In my own benchmark (.NET 6.0) solving these will improve the getter about 2x (125ms to 60ms) faster.

Here is the benchmark source I've tested

Results:

BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.3 (21E230) [Darwin 21.4.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.101
  [Host]     : .NET 6.0.2 (6.0.222.6406), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 6.0.2 (6.0.222.6406), Arm64 RyuJIT AdvSIMD

Method Mean Error StdDev Ratio Allocated Alloc Ratio
PointerAccess 57.97 ms 0.772 ms 0.722 ms 0.46 154 B 0.23
NBFixed 59.77 ms 0.258 ms 0.215 ms 0.47 75 B 0.11
NBWithSize 240.70 ms 0.737 ms 0.689 ms 1.90 893 B 1.33
NBWithSentinel 127.20 ms 0.262 ms 0.233 ms 1.00 168 B 0.25
NBWithSentinelAndSize 126.93 ms 0.148 ms 0.123 ms 1.00 670 B 1.00

Weirdly, removing using block but not replacing SIZE makes it even slower. I think it is related to static constructor for SIZE while the code got inlined.

I've added benchmark result for .NET 7 and Mono to the link, and it seems accessing with pointer is much performant for those. I think using T* instead of Unsafe APIs should be considered alongside.

I am pretty sure (confirmed with sharplab.io) that in release compilation the using will not generate a try catch if it's empty, in any case atm that line can be commented out. However the static field killed me, I have to think a lot about it as I use this everywhere in the code. Why is your benchmarking alocating at all?

that in release compilation the using will not generate a try catch if it's empty

Main issue would be that using prevents inlining. I've tested with this SharpLab, and if you see the assembly it prevents inlining to CallWithUsing while method WithUsing does not contain try-finally.

Why is your benchmarking alocating at all?

It only happens if I put a very big number for Count for some reason. It happens on irrelevant benchmarks as well. I think it's MemoryDiagnoser issue.

according my profiing Unsafe.As is as fast as using T*