Performance improvement for `NB<T>`
Closed this issue · 5 comments
After seeing this benchmark I got curious if there is any room for performance improvement to Svelto.
I did some benchmark with NB<T>
and got some leads for code below:
public unsafe ref T this[int index]
{
[MethodImpl(MethodImplOptions.AggressiveInlining)] get
{
using (this._threadSentinel.TestThreadSafety())
return ref Unsafe.AsRef<T>((void*) (this._ptr + index * NB<T>.SIZE));
}
}
- Using cached size
NB<T>.SIZE
for stride is actually slower than usingUnsafe.SizeOf<T>
directly. As the latter will be replaced as constant after JIT compile, while former requires static constructor check. _threadSentinel
isreadonly
butSentinel
is not. CallingTestThreadSafety()
can cause defensive value copy.using
block essentially converts totry-finally
block which makes inlining inefficient, even though sentinel does nothing on release build.
In my own benchmark (.NET 6.0) solving these will improve the getter about 2x (125ms to 60ms) faster.
Here is the benchmark source I've tested
Results:
BenchmarkDotNet=v0.13.2, OS=macOS Monterey 12.3 (21E230) [Darwin 21.4.0]
Apple M1 Pro, 1 CPU, 10 logical and 10 physical cores
.NET SDK=7.0.101
[Host] : .NET 6.0.2 (6.0.222.6406), Arm64 RyuJIT AdvSIMD
DefaultJob : .NET 6.0.2 (6.0.222.6406), Arm64 RyuJIT AdvSIMD
Method | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|
PointerAccess | 57.97 ms | 0.772 ms | 0.722 ms | 0.46 | 154 B | 0.23 |
NBFixed | 59.77 ms | 0.258 ms | 0.215 ms | 0.47 | 75 B | 0.11 |
NBWithSize | 240.70 ms | 0.737 ms | 0.689 ms | 1.90 | 893 B | 1.33 |
NBWithSentinel | 127.20 ms | 0.262 ms | 0.233 ms | 1.00 | 168 B | 0.25 |
NBWithSentinelAndSize | 126.93 ms | 0.148 ms | 0.123 ms | 1.00 | 670 B | 1.00 |
Weirdly, removing using
block but not replacing SIZE
makes it even slower. I think it is related to static constructor for SIZE
while the code got inlined.
I've added benchmark result for .NET 7 and Mono to the link, and it seems accessing with pointer is much performant for those. I think using T*
instead of Unsafe
APIs should be considered alongside.
I am pretty sure (confirmed with sharplab.io) that in release compilation the using will not generate a try catch if it's empty, in any case atm that line can be commented out. However the static field killed me, I have to think a lot about it as I use this everywhere in the code. Why is your benchmarking alocating at all?
that in release compilation the using will not generate a try catch if it's empty
Main issue would be that using
prevents inlining. I've tested with this SharpLab, and if you see the assembly it prevents inlining to CallWithUsing
while method WithUsing
does not contain try-finally.
Why is your benchmarking alocating at all?
It only happens if I put a very big number for Count
for some reason. It happens on irrelevant benchmarks as well. I think it's MemoryDiagnoser
issue.
according my profiing Unsafe.As is as fast as using T*