Performance improvement for `NB<T>`

Question

Performance improvement for `NB<T>`

Closed this issue 2 years ago · 5 comments

After seeing this benchmark I got curious if there is any room for performance improvement to Svelto.

I did some benchmark with NB<T> and got some leads for code below:

public unsafe ref T this[int index]
{
  [MethodImpl(MethodImplOptions.AggressiveInlining)] get
  {
    using (this._threadSentinel.TestThreadSafety())
      return ref Unsafe.AsRef<T>((void*) (this._ptr + index * NB<T>.SIZE));
  }
}

Using cached size NB<T>.SIZE for stride is actually slower than using Unsafe.SizeOf<T> directly. As the latter will be replaced as constant after JIT compile, while former requires static constructor check.
_threadSentinel is readonly but Sentinel is not. Calling TestThreadSafety() can cause defensive value copy.
using block essentially converts to try-finally block which makes inlining inefficient, even though sentinel does nothing on release build.

In my own benchmark (.NET 6.0) solving these will improve the getter about 2x (125ms to 60ms) faster.

Answer 1 · 2022-12-26T23:51:15.000Z

Here is the benchmark source I've tested

Results:

BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.3 (21E230) [Darwin 21.4.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.101
  [Host]     : .NET 6.0.2 (6.0.222.6406), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 6.0.2 (6.0.222.6406), Arm64 RyuJIT AdvSIMD

Method	Mean	Error	StdDev	Ratio	Allocated	Alloc Ratio
PointerAccess	57.97 ms	0.772 ms	0.722 ms	0.46	154 B	0.23
NBFixed	59.77 ms	0.258 ms	0.215 ms	0.47	75 B	0.11
NBWithSize	240.70 ms	0.737 ms	0.689 ms	1.90	893 B	1.33
NBWithSentinel	127.20 ms	0.262 ms	0.233 ms	1.00	168 B	0.25
NBWithSentinelAndSize	126.93 ms	0.148 ms	0.123 ms	1.00	670 B	1.00

Weirdly, removing using block but not replacing SIZE makes it even slower. I think it is related to static constructor for SIZE while the code got inlined.

Answer 2 · 2022-12-27T01:02:58.000Z

I've added benchmark result for .NET 7 and Mono to the link, and it seems accessing with pointer is much performant for those. I think using T* instead of Unsafe APIs should be considered alongside.

Answer 3 · 2022-12-27T15:44:03.000Z

I am pretty sure (confirmed with sharplab.io) that in release compilation the using will not generate a try catch if it's empty, in any case atm that line can be commented out. However the static field killed me, I have to think a lot about it as I use this everywhere in the code. Why is your benchmarking alocating at all?

Answer 4 · 2022-12-27T17:37:57.000Z

that in release compilation the using will not generate a try catch if it's empty

Main issue would be that using prevents inlining. I've tested with this SharpLab, and if you see the assembly it prevents inlining to CallWithUsing while method WithUsing does not contain try-finally.

Why is your benchmarking alocating at all?

It only happens if I put a very big number for Count for some reason. It happens on irrelevant benchmarks as well. I think it's MemoryDiagnoser issue.

Answer 5 · 2022-12-28T19:53:12.000Z

according my profiing Unsafe.As is as fast as using T*