risk-of-thunder/R2API

The reflection extension method using delegate is slower than regular reflection

DemoJameson opened this issue · 16 comments

I think the Delegate are defined in Reflection.cs should be made public so that user can cached the delegate instances to avoid the time consuming for finding it
you can check the code: https://github.com/DemoJameson/R2API/blob/benchmark/R2API.Benchmark/Benchmark.cs#L96-L100

!!!! Benchmark for Field Access
||Initialization|| ||
|Init info:                          |     85 ms|

||Executing for 2000000 iterations|| ||
|Direct set:                         |      7 ms|
|Direct get:                         |      2 ms|
|dynamic set:                        |     17 ms|
|dynamic get:                        |     20 ms|
|R2API Delegate set:                 |    359 ms|
|R2API Delegate get:                 |    339 ms|
|R2API Cached Delegate set:          |     10 ms|
|R2API Cached Delegate get:          |      7 ms|
|Reflection set:                     |    148 ms|
|Reflection get:                     |    113 ms|
|Cached Reflection set:              |     91 ms|
|Cached Reflection get:              |     59 ms|

image

This benchmark is flawed, you're including the time to actually generate the delegates in the times for R2API Delegate Get and R2API Delegate Set

 
 ​                {"​R2API Delegate set​", () ​=>​ ​TargetPerson​.SetFieldValue​("​name​", ​"​John​")}, 
 ​                {"​R2API Delegate get​", () ​=>​ ​TargetPerson​.GetFieldValue​<string​>("​name​")},

FieldInfo.GetFieldValue and FieldInfo.SetFieldValue are already cached by R2API, the extra time you're seeing is because you don't allow it to generate the delegate beforehand like you do for your "Cached R2API Delegate" versions.

To see what I mean, add this above your actionMap declaration

             TargetPerson.GetFieldValue<string>("name");
             TargetPerson.SetFieldValue("name", "John");
 ​            ​var​ ​actionMap​ ​=​ ​new​ ​Dictionary​<​string​, ​Action​> {

Then rerun your benchmark

https://github.com/DemoJameson/R2API/blob/benchmark/R2API.Benchmark/Benchmark.cs#L78-L79
here is the new result, not much different

||Executing for 2000000 iterations|| ||
|Direct set:                         |      6 ms|
|Direct get:                         |      3 ms|
|dynamic set:                        |     19 ms|
|dynamic get:                        |     21 ms|
|R2API Delegate set:                 |    336 ms|
|R2API Delegate get:                 |    334 ms|
|R2API Cached Delegate set:          |     10 ms|
|R2API Cached Delegate get:          |      7 ms|
|Reflection set:                     |    148 ms|
|Reflection get:                     |    118 ms|
|Cached Reflection set:              |     94 ms|
|Cached Reflection get:              |     58 ms|

after using Dictionary instead of ConcurrentDictionary

||Executing for 2000000 iterations|| ||
|Direct set:                         |      7 ms|
|Direct get:                         |      3 ms|
|dynamic set:                        |     19 ms|
|dynamic get:                        |     22 ms|
|R2API Delegate set:                 |    347 ms|
|R2API Delegate get:                 |    342 ms|
|R2API Cached Delegate set:          |     10 ms|
|R2API Cached Delegate get:          |      8 ms|
|Reflection set:                     |    147 ms|
|Reflection get:                     |    113 ms|
|Cached Reflection set:              |     93 ms|
|Cached Reflection get:              |     60 ms|

Hi! Reworked your benchmarks to use Benchmark.NET and got some interesting results for sure, of which we're discussing in the discord. Added a test of only using the delegate cache and it seems to still add a good bit of overhead. We will be exposing the delegates for direct use. However, R2API reflection still seems to be a lot faster. What runtime did you test this on?

|                   Method |          Mean |      Error |     StdDev |        Median |
|------------------------- |--------------:|-----------:|-----------:|--------------:|
| R2APIGetFieldGetDelegate |    24.1583 ns |  0.0626 ns |  0.0555 ns |    24.1416 ns |
| R2APIGetFieldSetDelegate |    24.2390 ns |  0.0348 ns |  0.0308 ns |    24.2321 ns |
|         R2APIDelegateSet |   112.5912 ns |  0.0632 ns |  0.0528 ns |   112.6049 ns |
|         R2APIDelegateGet |   111.1697 ns |  0.0877 ns |  0.0777 ns |   111.1654 ns |
|                DirectSet |     1.0402 ns |  0.0013 ns |  0.0011 ns |     1.0401 ns |
|                DirectGet |     0.0001 ns |  0.0002 ns |  0.0002 ns |     0.0000 ns |
|               DynamicSet |     6.3105 ns |  0.0086 ns |  0.0077 ns |     6.3063 ns |
|               DynamicGet |     6.7956 ns |  0.0099 ns |  0.0088 ns |     6.7940 ns |
|           R2APICachedGet |     1.4677 ns |  0.0053 ns |  0.0050 ns |     1.4648 ns |
|           R2APICachedSet |     2.0947 ns |  0.0007 ns |  0.0006 ns |     2.0946 ns |
|            ReflectionSet | 1,182.4428 ns |  2.1463 ns |  2.0076 ns | 1,182.2863 ns |
|            ReflectionGet | 1,151.3366 ns |  1.3423 ns |  1.1209 ns | 1,151.0935 ns |
|      CachedReflectionSet | 1,205.0010 ns | 23.1741 ns | 29.3079 ns | 1,206.7095 ns |
|      CachedReflectionGet | 1,167.1119 ns | 22.9962 ns | 50.4773 ns | 1,166.4891 ns |

oh, Benchmark.NET is nice, so I rewrote the test https://github.com/DemoJameson/R2API/blob/benchmark/R2API.BenchmarkDotNet/Program.cs
your cached refletion is slower than the reflection, that's weird

here are my result:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1620 (21H2)
Intel Core i7-4790 CPU 3.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4470.0), X64 RyuJIT
  Mono x64 : Mono 6.12.0 (Visual Studio), X64 

Job=Mono x64  Runtime=Mono x64  

|                                     Method |        Mean |      Error |     StdDev |  Gen 0 | Allocated |
|------------------------------------------- |------------:|-----------:|-----------:|-------:|----------:|
|                                  DirectSet |   0.5943 ns |  0.0281 ns |  0.0234 ns |      - |         - |
|                              ReflectionSet | 757.8906 ns | 15.0225 ns | 25.9130 ns | 0.0095 |         - |
|                        ReflectionCachedSet | 280.3415 ns |  5.6316 ns | 11.3761 ns |      - |         - |
|                         R2APISetFieldValue | 182.7628 ns |  2.4260 ns |  2.1506 ns | 0.0391 |         - |
| R2APISetFieldValueWithoutGetOrAddExtension | 146.5069 ns |  2.9395 ns |  6.3902 ns |      - |         - |
|                     R2APICachedSetDelegate |   2.5395 ns |  0.0594 ns |  0.0496 ns |      - |         - |
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1620 (21H2)
Intel Core i7-4790 CPU 3.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT


|                                     Method |       Mean |     Error |    StdDev |     Median |  Gen 0 | Allocated |
|------------------------------------------- |-----------:|----------:|----------:|-----------:|-------:|----------:|
|                                  DirectSet |   1.298 ns | 0.0450 ns | 0.0421 ns |   1.287 ns |      - |         - |
|                              ReflectionSet |  70.990 ns | 1.4369 ns | 2.3204 ns |  70.357 ns |      - |         - |
|                        ReflectionCachedSet |  50.997 ns | 1.1305 ns | 3.0174 ns |  49.936 ns |      - |         - |
|                         R2APISetFieldValue | 121.565 ns | 2.2159 ns | 4.5266 ns | 121.230 ns | 0.0229 |      96 B |
| R2APISetFieldValueWithoutGetOrAddExtension |  88.078 ns | 1.7678 ns | 2.2358 ns |  87.923 ns |      - |         - |
|                     R2APICachedSetDelegate |   2.761 ns | 0.0759 ns | 0.0673 ns |   2.773 ns |      - |         - |

I'll need to profile to figure out where our bottlenecks are and if it truly is just dictionary overhead, in which case it'd be recommended to just cache the reflection via the delegates like you're suggesting.

Also your mono test seems to suggest the R2API stuff is faster, no?

Also your mono test seems to suggest the R2API stuff is faster, no?

yeah it's 1.5x faster than cached reflection on mono, and can be improved with stop using GetOrAdd extension mehtod
and perhaps only run ThrowIfTCannotBeAssignedToField once when generating the set delegate

after add more benchmarks, I belive stop using GetOrAdd() extension method, and skip GetFieldCached() when getting cached delegate can increase speed by 1.0x to 0.6x

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1620 (21H2)
Intel Core i7-4790 CPU 3.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
  [Host]             : .NET Framework 4.8 (4.8.4470.0), X64 RyuJIT
  .NET 6.0           : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  .NET Core 3.1      : .NET Core 3.1.23 (CoreCLR 4.700.22.11601, CoreFX 4.700.22.12208), X64 RyuJIT
  .NET Framework 4.8 : .NET Framework 4.8 (4.8.4470.0), X64 RyuJIT
  Mono x64           : Mono 6.12.0 (Visual Studio), X64

|                                                Method |            Runtime |        Mean |      Error |     StdDev |      Median | Ratio | RatioSD |  Gen 0 | Allocated |
|------------------------------------------------------ |------------------- |------------:|-----------:|-----------:|------------:|------:|--------:|-------:|----------:|
|                                             DirectSet |           .NET 6.0 |   1.2450 ns |  0.0405 ns |  0.0359 ns |   1.2440 ns |  0.03 |    0.00 |      - |         - |
|                                         ReflectionSet |           .NET 6.0 |  67.0370 ns |  0.3557 ns |  0.3153 ns |  67.1203 ns |  1.42 |    0.04 |      - |         - |
|                                   ReflectionCachedSet |           .NET 6.0 |  46.9951 ns |  0.7648 ns |  1.0721 ns |  46.5217 ns |  1.00 |    0.00 |      - |         - |
|                                   R2API.SetFieldValue |           .NET 6.0 | 115.5609 ns |  1.1190 ns |  0.9344 ns | 115.3770 ns |  2.45 |    0.08 | 0.0229 |      96 B |
|                    R2API.SetFieldValueWithoutGetOrAdd |           .NET 6.0 |  85.2468 ns |  1.5939 ns |  1.7716 ns |  84.3717 ns |  1.81 |    0.05 |      - |         - |
|              R2API.SetFieldValueWithoutGetFieldCached |           .NET 6.0 |  63.9350 ns |  1.2532 ns |  1.1722 ns |  63.9267 ns |  1.36 |    0.04 |      - |         - |
| R2API.SetFieldValueWithoutGetFieldCachedButConcurrent |           .NET 6.0 |  63.7870 ns |  1.1529 ns |  0.9627 ns |  63.7227 ns |  1.35 |    0.05 |      - |         - |
|                               R2API.CachedSetDelegate |           .NET 6.0 |   2.6506 ns |  0.0523 ns |  0.0489 ns |   2.6475 ns |  0.06 |    0.00 |      - |         - |
|                                                       |                    |             |            |            |             |       |         |        |           |
|                                             DirectSet |      .NET Core 3.1 |   1.0286 ns |  0.0267 ns |  0.0223 ns |   1.0252 ns |  0.02 |    0.00 |      - |         - |
|                                         ReflectionSet |      .NET Core 3.1 |  89.4465 ns |  1.6952 ns |  3.0133 ns |  89.3504 ns |  1.78 |    0.10 |      - |         - |
|                                   ReflectionCachedSet |      .NET Core 3.1 |  50.4212 ns |  1.0243 ns |  1.3674 ns |  50.0660 ns |  1.00 |    0.00 |      - |         - |
|                                   R2API.SetFieldValue |      .NET Core 3.1 | 126.7100 ns |  2.3798 ns |  4.5277 ns | 125.7053 ns |  2.55 |    0.10 | 0.0229 |      96 B |
|                    R2API.SetFieldValueWithoutGetOrAdd |      .NET Core 3.1 |  93.3482 ns |  1.3521 ns |  1.1986 ns |  93.1215 ns |  1.84 |    0.05 |      - |         - |
|              R2API.SetFieldValueWithoutGetFieldCached |      .NET Core 3.1 |  84.1694 ns |  1.2950 ns |  1.1480 ns |  84.2868 ns |  1.66 |    0.04 |      - |         - |
| R2API.SetFieldValueWithoutGetFieldCachedButConcurrent |      .NET Core 3.1 |  86.8174 ns |  1.7241 ns |  2.5806 ns |  86.2713 ns |  1.73 |    0.07 |      - |         - |
|                               R2API.CachedSetDelegate |      .NET Core 3.1 |   2.9183 ns |  0.0617 ns |  0.0577 ns |   2.9227 ns |  0.06 |    0.00 |      - |         - |
|                                                       |                    |             |            |            |             |       |         |        |           |
|                                             DirectSet | .NET Framework 4.8 |   1.9718 ns |  0.0657 ns |  0.0583 ns |   1.9672 ns |  0.02 |    0.00 |      - |         - |
|                                         ReflectionSet | .NET Framework 4.8 | 149.3214 ns |  3.0096 ns |  7.6057 ns | 150.5088 ns |  1.59 |    0.11 |      - |         - |
|                                   ReflectionCachedSet | .NET Framework 4.8 |  93.9267 ns |  1.8452 ns |  4.5264 ns |  92.1088 ns |  1.00 |    0.00 |      - |         - |
|                                   R2API.SetFieldValue | .NET Framework 4.8 | 190.1930 ns |  3.8100 ns |  3.9126 ns | 190.3603 ns |  1.92 |    0.09 | 0.0229 |      96 B |
|                    R2API.SetFieldValueWithoutGetOrAdd | .NET Framework 4.8 | 129.7427 ns |  2.5831 ns |  2.4163 ns | 130.2974 ns |  1.30 |    0.06 |      - |         - |
|              R2API.SetFieldValueWithoutGetFieldCached | .NET Framework 4.8 | 113.2549 ns |  2.2371 ns |  2.5762 ns | 113.1042 ns |  1.15 |    0.05 |      - |         - |
| R2API.SetFieldValueWithoutGetFieldCachedButConcurrent | .NET Framework 4.8 | 120.0857 ns |  2.1488 ns |  2.0100 ns | 119.9020 ns |  1.21 |    0.05 |      - |         - |
|                               R2API.CachedSetDelegate | .NET Framework 4.8 |   3.7129 ns |  0.1038 ns |  0.1897 ns |   3.6402 ns |  0.04 |    0.00 |      - |         - |
|                                                       |                    |             |            |            |             |       |         |        |           |
|                                             DirectSet |           Mono x64 |   0.6534 ns |  0.0483 ns |  0.0807 ns |   0.6332 ns | 0.002 |    0.00 |      - |         - |
|                                         ReflectionSet |           Mono x64 | 781.4498 ns | 15.6646 ns | 27.8437 ns | 771.0183 ns | 2.808 |    0.09 | 0.0095 |         - |
|                                   ReflectionCachedSet |           Mono x64 | 280.1056 ns |  5.2672 ns |  5.1731 ns | 279.2684 ns | 1.000 |    0.00 |      - |         - |
|                                   R2API.SetFieldValue |           Mono x64 | 194.0034 ns |  3.7658 ns |  8.9498 ns | 190.0401 ns | 0.701 |    0.04 | 0.0391 |         - |
|                    R2API.SetFieldValueWithoutGetOrAdd |           Mono x64 | 138.3906 ns |  1.6438 ns |  1.2834 ns | 138.6294 ns | 0.494 |    0.01 |      - |         - |
|              R2API.SetFieldValueWithoutGetFieldCached |           Mono x64 |  95.7536 ns |  0.6677 ns |  0.5576 ns |  95.6929 ns | 0.341 |    0.01 |      - |         - |
| R2API.SetFieldValueWithoutGetFieldCachedButConcurrent |           Mono x64 |  97.0466 ns |  1.8803 ns |  4.2824 ns |  96.0399 ns | 0.352 |    0.01 |      - |         - |
|                               R2API.CachedSetDelegate |           Mono x64 |   2.8152 ns |  0.0636 ns |  0.0595 ns |   2.8153 ns | 0.010 |    0.00 |      - |         - |

Would you be interesting in submitting a PR for those changes? Thanks for investigating this, good stuff.

Of course providing the delegates directly will be optimal, but improving what we have currently could be a nice improvement for anyone using these utils already

sure, I will summit a PR

is .CastDelegate<GetDelegate<TReturn>>(); necessary? can we just direct cast

Yes we can get arid of that, forgot to mention I removed that in local testing.