[API Proposal]: new `TypeLocal` class to get/set arbitrary values based on a supplied Generic Type
jasonswearingen opened this issue · 17 comments
Background and motivation
You can store and retrieve Type specific values efficiently using static classes, such as:
public static class TypeStore<T>{
public static int Value;
}
thus, for a given type supplied to a Generic method you can set/get a custom value without resorting to dictionary lookups based on the Type.
This is excellent, but there is no way to do this on a per-instance basis. Doing so can greatly improve efficiency of code needing per-type values by removing a dictionary lookup each time a type is supplied.
This is a feature request for performance optimization reasons. A real-world example situation where this is helpful is in a game engine "Entity Component System" where the user (the game developer) arbitrarily wants to pick a component of type TComponent
, and the game engine needs to somehow look this up in a very efficient manner, due to frame time constraints plus that this kind of lookup may be done many times during a game frame.
API Proposal
/// <summary>
/// similar to ThreadLocal, but provides a value per type.
/// </summary>
/// <typeparam name="TValue"></typeparam>
public interface TypeLocal<TValue>
{
public TValue Get<TType>();
public bool TryGet<TType>(out TValue value);
public void Set<TType>(TValue value);
public void Remove<TType>();
public bool HasValue<TType>();
public IEnumerator<(Type type, TValue value)> GetEnumerator();
}
API Usage
Note this is an updated example because my original example was not clear:
public class MyLookupHelper //helper to get an array index
{
private int indexCounter;
private TypeLocal<int> typeLocal = new();
public int GetIndex<T>()
{
if (!typeLocal.TryGet<T>(out var index){
index = indexCounter++;
typeLocal.Set<T>(index);
}
return index;
}
}
Overall the general usage pattern would be very similar to a normal Dictionary<Type,Value>
Risks
???
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
I see the following risks:
- The API is not thread-safe. There is no
GetOrUpdate
,TryGet
etc. - Potential memory leak if
TType
is reference type or the type contains fields of reference type. The stored value of typeTType
will have the same lifetime astypeof(TValue)
.
DependentHandle or ConditionalWeakTable can be used to achieve the same effect without introducing a new API.
* The API is not thread-safe. There is no `GetOrUpdate`, `TryGet` etc.
I did mention TryGet
in my toy api proposal, but think if this addition is seriously considered, yes having thread safe options would be nice, but not required if it makes the implementation more difficult. (Okay to leave it up to the use to synchronize)
* Potential memory leak if `TType` is reference type or the type contains fields of reference type. The stored value of type `TType` will have the same lifetime as `typeof(TValue)`.
I don't understand why this would cause a memory leak. Or maybe you are calling out why the "API Proposal" did not include a .Dispose()
or .Clear()
function? I was trying to keep the example api as clear to the proposed idea as possible. I agree more helper functions/plumbing is needed for a real implementation
DependentHandle or ConditionalWeakTable can be used to achieve the same effect without introducing a new API.
Can you please elaborate? I don't see how using either DependentHandle
or ConditionalWeakTable
solves this problem.
I don't understand why this would cause a memory leak.
Because the stored value (of reference type) will live forever until Remove
is called.
Can you please elaborate?
Sure:
public static class TypeStore<T>
{
private static readonly ConditionalWeakTable<Type, object> _storage = new();
public bool TryGet<TValue>(out TValue result) => _storage.TryGetValue(typeof(TValue), out result);
public void Set<TValue>(TValue value) => _storage.AddOrUpdate(typeof(TValue), value);
}
Now the problem with a memory leak becomes more clear. According to ConditionalWeakTable docs, the table keeps weak reference to the key. When the key has no strong references, the value is reclaimable by GC. However, typeof(TValue) returns the same Type
instance for the same actual TValue
instantiation. Now let's take a look at Type docs:
- A
Type
object that represents a type is unique; that is, twoType
object references refer to the same object if and only if they represent the same type. - In multithreading scenarios, do not lock
Type
objects in order to synchronize access to static data.
It means that the object returned by typeof(TValue)
operator will live forever. As a result, the table will never drop that key unless Remove
is called explicitly.
Generally speaking, there is no lifetime concept applicable to constructed generic type. Probably, the best choice here is using Dictionary<Type, WeakReference>
instead of ConditionalWeakTable
. But you need to keep the strong reference somewhere else outside of the dictionary. Otherwise, if there are no strong reference, the stored valuable can be collected by GC.
Now let's return back to API proposal:
var obj = new object();
TypeLocal<string>.Set(obj);
What the lifetime of obj
do you expect? When obj
will be available for GC?
I think you are misunderstanding the intent behind this proposal.
The intent is to have an instance based storage of values per type, that does not require a dictionary lookup.
The example you provide using ConditionalWeakTable
, I assume is very similar to an example using a normal Dictionary
, which requires a key hash+lookup.
I am working under the assumption that the static based workflow (repeated below) does not require a dictionary lookup:
public static class TypeStore<T>{
public static int Value;
}
This is an excellent high-performance way of storing/retrieving data on a per-type basis, but it is only available as a static global. For high performance scenarios it would be very beneficial to have the same benefit on a per-instance basis.
What the lifetime of obj do you expect? When obj will be available for GC?
It would be used very similar to a dictionary, not a global reference. I think my explanation of the existing global static workflow is confusing your interpretation of my proposal. The usage would be the same as a Dictionary<TType,TItem>
So it would go out of scope and collected like normal objects.
To be clear, this proposal would require modifications to the dotnet runtime. The skills required to speculate the extent of the modifications are sadly beyond my reach.
Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.
Issue Details
Background and motivation
You can store and retrieve Type specific values efficiently using static classes, such as:
public static class TypeStore<T>{
public static int Value;
}
thus, for a given type supplied to a Generic method you can set/get a custom value without resorting to dictionary lookups based on the Type.
This is excellent, but there is no way to do this on a per-instance basis. Doing so can greatly improve efficiency of code needing per-type values by removing a dictionary lookup each time a type is supplied.
This is a feature request for performance optimization reasons. A real-world example situation where this is helpful is in a game engine "Entity Component System" where the user (the game developer) arbitrarily wants to pick a component of type TComponent
, and the game engine needs to somehow look this up in a very efficient manner, due to frame time constraints plus that this kind of lookup may be done many times during a game frame.
API Proposal
/// <summary>
/// similar to ThreadLocal, but provides a value per type.
/// </summary>
/// <typeparam name="TValue"></typeparam>
public interface TypeLocal<TValue>
{
public TValue Get<TType>();
public bool TryGet<TType>(out TValue value);
public void Set<TType>(TValue value);
public void Remove<TType>();
public bool HasValue<TType>();
public IEnumerator<(Type type, TValue value)> GetEnumerator();
}
API Usage
public class SomeClass
{
private int indexCounter;
public int GetTypeIndex<T>()
{
var typeLocal = new TypeLocal<int>();
if (!typeLocal.TryGet<T>(out var index){
index = indexCounter++;
typeLocal.Set<T>(index);
}
return index;
}
}
Overall the general usage pattern would be very similar to a normal Dictionary<Type,Value>
Risks
???
Author: | jasonswearingen |
---|---|
Assignees: | - |
Labels: |
|
Milestone: | - |
@jasonswearingen , thanks for clarification! If I understand correctly, TypeLocal
has no specific rules related to its lifetime. If so, I'm not clearly understand how your example should work.
public int GetTypeIndex<T>()
{
var typeLocal = new TypeLocal<int>();
if (!typeLocal.TryGet<T>(out var index){
index = indexCounter++;
typeLocal.Set<T>(index);
}
return index;
}
TypeLocal
is created on every call of GetTypeIndex
method. Therefore, TryGet<T>
will always return false because a newly created TypeLocal
is empty. If you want to keep the value between method calls for the same actual T
then what's the reason to instantiate TypeLocal
for each call?
you are right, I'm sorry I rushed through the example unthoughtfully. here is a better example. Please let me know if it needs further explanation.
public class MyLookupHelper //helper to get an array index
{
private int indexCounter;
private TypeLocal<int> typeLocal = new();
public int GetIndex<T>()
{
if (!typeLocal.TryGet<T>(out var index){
index = indexCounter++;
typeLocal.Set<T>(index);
}
return index;
}
}
PS: will update the original example with this too.
The intent is to have an instance based storage of values per type, that does not require a dictionary lookup.
Could you elaborate on what you mean here? Even with public static class TypeStore<T>{ public static int Value; }
there is potentially some lookup that occurs, its just abstracted away and managed by the runtime.
RyuJIT in particular will currently ensure that this is "specialized" for value type T
, but for reference type T
this will end up using System.__Canon
. It's also not always "free" since this involves running a static initializer among other things on first access
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACY8pJ0hgFQE8AHGAZQzQYAHg4A+AN40Gspo2asAlgDsMDAGrYANgFcYAbhoBfGjXpsGAYQbTqc+UxZMUDALIAKAJQy5dhw7MAJwe3HyCwiKqGOIAdFp6MF5G9gGywaG8AkKwIgBm2hDYMfE6+sm+aRlh2ZEQwABWMGAlCeUpaenkITURucwADHFtSR2d1Vl9olbDZaOVDKbUxkA=== shows an example where the disassembly of:
Console.WriteLine(TypeStore<int>.Value);
Console.WriteLine(TypeStore<float>.Value);
Console.WriteLine(TypeStore<object>.Value);
Console.WriteLine(TypeStore<string>.Value);
Console.WriteLine(TypeStore<C>.Value);
is:
L0000: sub rsp, 0x28
L0004: mov rcx, 0x7ff9ee6fcb90
L000e: xor edx, edx
L0010: call 0x00007ffa49cc8b60
L0015: mov ecx, [rax+8]
L0018: call System.Console.WriteLine(Int32)
L001d: mov rcx, 0x7ff9ee6fcb90
L0027: mov edx, 1
L002c: call 0x00007ffa49cc8b60
L0031: mov ecx, [rax+8]
L0034: call System.Console.WriteLine(Int32)
L0039: mov rcx, 0x7ff9ee6fcb90
L0043: mov edx, 2
L0048: call 0x00007ffa49cc8b60
L004d: mov ecx, [rax+8]
L0050: call System.Console.WriteLine(Int32)
L0055: mov rcx, 0x7ff9ee6fcb90
L005f: mov edx, 3
L0064: call 0x00007ffa49cc8b60
L0069: mov ecx, [rax+8]
L006c: call System.Console.WriteLine(Int32)
L0071: mov rcx, 0x7ff9ee6fcb90
L007b: mov edx, 4
L0080: call 0x00007ffa49cc8b60
L0085: mov ecx, [rax+8]
L0088: add rsp, 0x28
L008c: jmp System.Console.WriteLine(Int32)
I think the best explanation would be a benchmark comparison. Here is one showing the performance discrepancy:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19042.1237 (20H2/October2020Update)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100-rc.1.21458.32
[Host] : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT
DefaultJob : .NET 6.0.0 (6.0.21.45113), X64 RyuJIT
| Method | Mean | Error | StdDev |
|------------------- |------------:|----------:|----------:|
| GlobalLookupTest | 18.68 us | 0.226 us | 0.189 us |
| InstanceLookupTest | 1,271.44 us | 24.573 us | 26.293 us |
Here is the code associated with the benchmark, run via BenchmarkDotNet ***(Click to expand)***
using BenchmarkDotNet.Attributes;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace lowlevel_benchmark.Benchmarks;
public abstract class GlobalLookup
{
protected static int indexCounter;
public static int Get<T>()
{
return GlobalLookup<T>.Get();
}
}
public class GlobalLookup<T> : GlobalLookup
{
private static int index = Interlocked.Increment(ref indexCounter);
public static int Get()
{
return index;
}
}
public class InstanceLookup
{
private int indexCounter;
private Dictionary<Type, int> storage = new();
public int Get<T>()
{
if (!storage.TryGetValue(typeof(T), out var index))
{
index = indexCounter++;
storage.Add(typeof(T), index);
}
return index;
}
}
public class TypeBased_Lookups
{
private int LOOPCOUNT = 10000;
private InstanceLookup instanceLookup = new();
[Benchmark]
public int GlobalLookupTest()
{
var toReturn = 0;
for (var i = 0; i < LOOPCOUNT; i++)
{
//toReturn += GlobalLookup.Get<int>();
//toReturn += GlobalLookup.Get<float>();
//toReturn += GlobalLookup.Get<bool>();
//toReturn += GlobalLookup.Get<long>();
//toReturn += GlobalLookup.Get<byte>();
//toReturn += GlobalLookup.Get<short>();
//toReturn += GlobalLookup.Get<double>();
//toReturn += GlobalLookup.Get<string>();
//toReturn += GlobalLookup.Get<object>();
toReturn += GlobalLookup.Get<GlobalLookup<int>>();
toReturn += GlobalLookup.Get<GlobalLookup<string>>();
toReturn += GlobalLookup.Get<GlobalLookup<object>>();
toReturn += GlobalLookup.Get<GlobalLookup<TypeBased_Lookups>>();
toReturn += GlobalLookup.Get<InstanceLookup>();
toReturn += GlobalLookup.Get<TypeBased_Lookups>();
toReturn += GlobalLookup.Get<System.Threading.Thread>();
toReturn += GlobalLookup.Get<System.Collections.ArrayList>();
}
return toReturn;
}
[Benchmark]
public int InstanceLookupTest()
{
var toReturn = 0;
for (var i = 0; i < LOOPCOUNT; i++)
{
//toReturn += instanceLookup.Get<int>();
//toReturn += instanceLookup.Get<float>();
//toReturn += instanceLookup.Get<bool>();
//toReturn += instanceLookup.Get<long>();
//toReturn += instanceLookup.Get<byte>();
//toReturn += instanceLookup.Get<short>();
//toReturn += instanceLookup.Get<double>();
//toReturn += instanceLookup.Get<string>();
//toReturn += instanceLookup.Get<object>();
toReturn += instanceLookup.Get<GlobalLookup<int>>();
toReturn += instanceLookup.Get<GlobalLookup<string>>();
toReturn += instanceLookup.Get<GlobalLookup<object>>();
toReturn += instanceLookup.Get<GlobalLookup<TypeBased_Lookups>>();
toReturn += instanceLookup.Get<InstanceLookup>();
toReturn += instanceLookup.Get<TypeBased_Lookups>();
toReturn += instanceLookup.Get<System.Threading.Thread>();
toReturn += instanceLookup.Get<System.Collections.ArrayList>();
}
return toReturn;
}
}
They do the same thing, but one is a static, the other is an instance.
public int Get<T>()
{
if(!storage.TryGetValue(typeof(T), out var index))
{
index = indexCounter++;
storage.Add(typeof(T), index);
}
return index;
}
You can optimize this code. Currently, it requires two lookups: TryGetValue
and Add
. Try to replace it with a single call to CollectionsMarshal.GetValueRefOrAddDefault (available in .NET 6).
RyuJIT in particular will currently ensure that this is "specialized" for value type
T
, but for reference typeT
this will end up usingSystem.__Canon
. It's also not always "free" since this involves running a static initializer among other things on first access
I updated my benchmark (prior post) to only use reference types. (prior it was mostly the special value types). The results still hold.
You can optimize this code. Currently, it requires two lookups: TryGetValue and Add.
I believe that is true for only the first loop through the first benchmark run, all further runs will just do the single lookup.
But, in interest of trying, I did what you suggest:
***(Click to expand)***
public class InstanceLookup
{
private int indexCounter;
private Dictionary<Type, int> storage = new();
public int Get<T>()
{
ref var toReturn = ref System.Runtime.InteropServices.CollectionsMarshal.GetValueRefOrAddDefault(storage, typeof(T), out var exists);
if (!exists)
{
toReturn = indexCounter++;
}
return toReturn;
//if (!storage.TryGetValue(typeof(T), out var index))
//{
// index = indexCounter++;
// storage.Add(typeof(T), index);
//}
//return index;
}
}
and get slightly better results:
| Method | Mean | Error | StdDev |
|------------------- |------------:|----------:|----------:|
| GlobalLookupTest | 18.64 us | 0.132 us | 0.110 us |
| InstanceLookupTest | 1,021.10 us | 20.000 us | 28.684 us |
@jasonswearingen , I found the solution for you. As far as I understand, you trying to emulate the following data type:
public struct TypeLocal<TValue>
{
private TValue valueForType1;
private TValue valueForType2;
private TValue valueForType3;
}
But the number of fields is not known because actual T
can be represented by any type. Fortunately, we have a perfect data structure to emulate such behavior - arrays. All we need is to associate particular array index with actual generic instantiation.
Here is the solution and the benchmark:
[SimpleJob(runStrategy: RunStrategy.Throughput, launchCount: 1)]
[Orderer(SummaryOrderPolicy.FastestToSlowest)]
public class LookupBenchmark
{
public static class StaticTypeLocal<TValue>
{
private static class TypeStore<T>
{
internal static TValue Value;
}
public static TValue Get<T>() => TypeStore<T>.Value;
public static void Set<T>(TValue value) => TypeStore<T>.Value = value;
}
public readonly struct InstanceTypeLocalDictionary<TValue>
{
private readonly Dictionary<Type, TValue> _storage = new(10);
public TValue Get<T>() => _storage[typeof(T)];
public void Set<T>(TValue value) => _storage[typeof(T)] = value;
}
public struct InstanceTypeLocalArray<TValue>
{
private static volatile int TypeIndex = -1;
private static class TypeSlot<T>
{
internal static readonly int Index = Interlocked.Increment(ref TypeIndex);
}
private TValue[] storage;
public InstanceTypeLocalArray()
{
storage = new TValue[Math.Max(1, TypeIndex + 1)];
}
private TValue[] EnsureStorageCapacity<T>()
{
if (TypeSlot<T>.Index >= storage.Length)
Array.Resize(ref storage, TypeSlot<T>.Index + 1);
return storage;
}
public void Set<T>(TValue value)
{
Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(EnsureStorageCapacity<T>()), TypeSlot<T>.Index) = value;
}
public TValue Get<T>()
{
return Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(EnsureStorageCapacity<T>()), TypeSlot<T>.Index);
}
}
private InstanceTypeLocalDictionary<int> typeLocalDictionary = new();
private InstanceTypeLocalArray<int> typeLocalArray = new();
[Benchmark]
public int UsingStaticClass()
{
StaticTypeLocal<int>.Set<string>(10);
return StaticTypeLocal<int>.Get<string>();
}
[Benchmark]
public int UsingDictionary()
{
typeLocalDictionary.Set<string>(10);
return typeLocalDictionary.Get<string>();
}
[Benchmark]
public int UsingArray()
{
typeLocalArray.Set<string>(10);
return typeLocalArray.Get<string>();
}
}
My benchmark results:
Method | Mean | Error | StdDev |
---|---|---|---|
UsingStaticClass | 0.0426 ns | 0.0166 ns | 0.0155 ns |
UsingArray | 1.5295 ns | 0.0081 ns | 0.0075 ns |
UsingDictionary | 34.4530 ns | 0.5816 ns | 0.5440 ns |
Additionally, the solution can be optimized for memory consumption. Actually, we can rent the array from the pool, implement IDisposable
for our struct and release the rented array in Dispose
method.
There is one drawback of the solution: if you have a large number of T
generic instantiations then TypeLocal
must maintain the large array. However, I'm expecting that the actual number of T
variations is of tens.
Also, I'm not pretty sure that that guys from .NET Team will approve that API because its use case is very specific. If this will not happen, I can suggest to move the implementation of this data structure to .NEXT library which is a part of the .NET Foundation.
P.S.: struct in my example is just a way to remove the indirection. Probably, the best choice for TypeLocal
is class.
Thanks for your help @sakno , I implemented a similar workaround, but yours is better, in that you were able to figure out how to encapsulate it into a TypeLocal
class. So I will use your solution.
I do think that a runtime native solution would be greatly beneficial: as, besides the space inefficiencies you mentioned, your benchmark shows that in some circumstances it's still a 35x performance increase using the Static vs the Array solution.
I do think that a runtime native solution would be greatly beneficial
There is no magic and CLR/C++ level. The solution still requires array-like data structure, no matter in which language it is implemented. IMO, ~ 1.6 ns is a very good result. Get<T>
will be translated by JIT to a few assembly instructions. In other words, its overhead is equal to the array access without bounds check. I'm not sure that there is a room for further improvements.
Here is another way. :)
public static class StaticGenericDictionary
{
private static class TypeStore<TTypeKey, TTypeValue> where TTypeValue : unmanaged // prevents reference type using this, also checks the fields. reduces usages to specialized structs and integrals.
{
internal static TTypeValue Value = default!;
}
public static TTypeValue Get<TTypeKey, TTypeValue>() => TypeStore<TTypeKey, TTypeValue>.Value;
public static void Set<TTypeKey, TTypeValue>(TTypeValue value) => TypeStore<TTypeKey, TTypeValue>.Value = value;
}
@jasonswearingen I know this post is kind of old, but DotNext provides a couple of different type map implementations and their documentation links here. Might be worth you giving it a look.