
Threading

A few implementations of lock pools.

NuGet: rm.Threading

SemaphoreSlimPool

MemoryCache has an issue with reentrancy (see this). To tackle it for async code, a lock pool of SemaphoreSlims can be used; pooling also sidesteps the hassle of SemaphoreSlim being IDisposable (no per-key instances to create and dispose).

# key-specific lock with async
memoryCache = new
lockPool = new semaphoreSlimPool(1, 1)
task<T> getAsync(key, valueFactoryAsync):
    if memoryCache.tryGetValue(key, out value):
        return value
    lockForKey = lockPool.get(key)
    await lockForKey.waitAsync()
    try:
        if memoryCache.tryGetValue(key, out value):
            return value
        value = await valueFactoryAsync()
        memoryCache.set(key, value, ttl)
        return value
    finally:
        lockForKey.release()
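
The same pattern as a minimal, runnable C# sketch. The wrapper class and ttl are illustrative; SemaphoreSlimPool's (1, 1) ctor and Get(key) mirror the pseudocode above, so check the library source for the exact API.

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class KeyLockedAsyncCache<T>
{
    private readonly IMemoryCache memoryCache = new MemoryCache(new MemoryCacheOptions());
    private readonly SemaphoreSlimPool lockPool = new SemaphoreSlimPool(1, 1);
    private readonly TimeSpan ttl = TimeSpan.FromSeconds(30); // illustrative ttl

    public async Task<T> GetAsync(string key, Func<Task<T>> valueFactoryAsync)
    {
        // fast path: no lock when the value is already cached
        if (memoryCache.TryGetValue(key, out T value))
        {
            return value;
        }
        var lockForKey = lockPool.Get(key);
        await lockForKey.WaitAsync().ConfigureAwait(false);
        try
        {
            // double-check inside the critical section
            if (memoryCache.TryGetValue(key, out value))
            {
                return value;
            }
            value = await valueFactoryAsync().ConfigureAwait(false);
            memoryCache.Set(key, value, ttl);
            return value;
        }
        finally
        {
            lockForKey.Release();
        }
    }
}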

LockPool

A pool of plain Objects to lock on, for sync code.

# key-specific lock with sync (instead, can also use a concurrent map)
memoryCache = new
lockPool = new lockPool()
T get(key, valueFactory):
    if memoryCache.tryGetValue(key, out value):
        return value
    lockForKey = lockPool.get(key)
    lock lockForKey:
        if memoryCache.tryGetValue(key, out value):
            return value
        value = valueFactory()
        memoryCache.set(key, value, ttl)
        return value
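
Likewise for the sync flavor, a minimal sketch under the same assumptions, with LockPool.Get(key) handing back the object to lock on:

using System;
using Microsoft.Extensions.Caching.Memory;

public class KeyLockedSyncCache<T>
{
    private readonly IMemoryCache memoryCache = new MemoryCache(new MemoryCacheOptions());
    private readonly LockPool lockPool = new LockPool();
    private readonly TimeSpan ttl = TimeSpan.FromSeconds(30); // illustrative ttl

    public T Get(string key, Func<T> valueFactory)
    {
        if (memoryCache.TryGetValue(key, out T value))
        {
            return value;
        }
        // same object for the same key, so concurrent calls for a key serialize here
        var lockForKey = lockPool.Get(key);
        lock (lockForKey)
        {
            if (memoryCache.TryGetValue(key, out value))
            {
                return value;
            }
            value = valueFactory();
            memoryCache.Set(key, value, ttl);
            return value;
        }
    }
}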

Cache Impls

Below are different cache impls to compare against.

  • CacheWithBlockingThread, CacheWithLock (uber lock), and CacheWithNoLock are not really good impls. CacheWithBlockingThread uses the sync-over-async anti-pattern; CacheWithLock uses an uber lock, thereby blocking factory operations for other keys while performing one for a key; CacheWithNoLock has no critical section and blasts the downstream dependency with multiple calls, potentially causing a cache stampede.
  • CacheWithTextbookPattern is how the current guidance suggests doing it, but it's reentrant.
  • CacheWithLazyCache uses LazyCache, which solves the reentrancy issue using Lazy<T>, AsyncLazy<T>, and a lock pool.
  • CacheWithFusionCache uses FusionCache. It has many different concurrency implementations.
  • CacheWithTask uses a technique to cache the task itself instead of the value; the task is then awaited to get the value. The technique is quite clever, but tricky to follow (see the sketch after this list).
  • CacheWithLockPool uses a lock pool where a key always maps to the same lock via hashing, and multiple keys can map to the same lock. It loses some parallelism when the valueFactory[Async] calls for a key error out, as they get synchronized. poolSize is 1024 (processorCount of 16 * factor of 64), set a bit high so the pool doesn't become a bottleneck.
  • CacheWithTaskAndLockPool reaps the best of both impls: cache the task itself instead of the value, and also use a lock pool to create the task inside a critical section. The task is awaited outside of the lock (monitor) to get the value.
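
To make the task-caching technique concrete, here's a minimal C# sketch (not an excerpt of the actual impl) using MemoryCache's GetOrCreate. Concurrent callers that hit the cache await the same Task<T>, so the downstream factory isn't stampeded.

// cache the task, not the value; callers await the shared task
public Task<T> GetTaskCachedAsync(string key, Func<Task<T>> valueFactoryAsync)
{
    return memoryCache.GetOrCreate(key, cacheEntry =>
    {
        cacheEntry.AbsoluteExpirationRelativeToNow = ttl;
        // no lock here, so two racing callers can still both invoke the
        // factory (the "slightly reentrant" issue noted in the tables below)
        return valueFactoryAsync();
    });
}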

Perf

Verify_Perf for 200 batches where each batch consists of Environment.ProcessorCount tasks (total 16,000 tasks), 1,000 keys (cache items), 25ms cache ttl, and 10ms of valueFactory delay. The result shows that when the cache is write-heavy, CacheWithLockPool is slower than CacheWithLazyCache, CacheWithTask, and CacheWithTaskAndLockPool.

Impl (net6.0)              Time (secs)  Fault %  Tx/s  Issues
CacheWithBlockingThread    50.4         0        317   reentrant, slow, sync-over-async
CacheWithLock              50.4         0        317   slow
CacheWithNoLock            3.3          0        4848  highly reentrant
CacheWithTextbookPattern   3.4          0        4705  highly reentrant

CacheWithLazyCache         3.4          0        4705
CacheWithFusionCache       3.4          0        4705
CacheWithTask              3.3          0        4848  slightly reentrant
CacheWithLockPool          3.7          0        4324  slightly slow
CacheWithTaskAndLockPool   3.3          0        4848

Verify_Perf for 1,000 batches where each batch consists of Environment.ProcessorCount tasks (total 16,000 tasks), 1,000 keys (cache items), high cache ttl, and 10ms of valueFactory delay. In contrast to the test above, this result shows that all impls perform the same when the cache is read-heavy, which is the typical case.

Impl (net6.0)              Time (secs)  Fault %  Tx/s  Issues
CacheWithLazyCache         3.5          0        4571
CacheWithFusionCache       3.5          0        4571
CacheWithTask              3.5          0        4571  slightly reentrant
CacheWithLockPool          3.5          0        4571  slightly slow
CacheWithTaskAndLockPool   3.5          0        4571

Verify_Perf for 1,000 batches where each batch consists of Environment.ProcessorCount tasks (total 16,000 tasks), 1,000 keys (cache items), high cache ttl, 10ms of valueFactory delay, and faults. With faults, CacheWithLockPool is slower than the others (CacheWithLazyCache, CacheWithTask, CacheWithTaskAndLockPool). This is because the others hand out the same task to concurrent calls on a key, whereas CacheWithLockPool synchronizes them, taking more time due to backpressure. CacheWithTask and CacheWithTaskAndLockPool evict the task if it eventually faults/cancels, using an IChangeToken, so as to not cache errors; CacheWithLazyCache does the same. CacheWithLazyCache and CacheWithTaskAndLockPool perform the same as CacheWithTask, without CacheWithTask's reentrancy or CacheWithLockPool's backpressure issues. CacheWithFusionCache gets negligibly slower as the faults increase, but performs similarly.

Impl (net6.0)              Time (secs)  Fault %  Tx/s  Issues
CacheWithLazyCache         3.5          0        4571
CacheWithFusionCache       3.6          0        4444
CacheWithTask              3.5          0        4571  slightly reentrant
CacheWithLockPool          3.5          0        4571  slightly slow
CacheWithTaskAndLockPool   3.5          0        4571

CacheWithLazyCache         3.9          10       4102
CacheWithFusionCache       4.0          10       4000
CacheWithTask              3.9          10       4102  slightly reentrant
CacheWithLockPool          4.0          10       4000  slightly slow
CacheWithTaskAndLockPool   3.9          10       4102

CacheWithLazyCache         7.0          50       2285
CacheWithFusionCache       7.1          50       2254
CacheWithTask              7.0          50       2285  slightly reentrant
CacheWithLockPool          7.2          50       2222  slightly slow
CacheWithTaskAndLockPool   7.0          50       2285

CacheWithLazyCache         16.5         100      969
CacheWithFusionCache       17.1         100      936
CacheWithTask              16.5         100      969   slightly reentrant
CacheWithLockPool          19.9         100      804   slightly slow
CacheWithTaskAndLockPool   16.5         100      969

Verdict

Use CacheWithLazyCache (uses LazyCache), CacheWithFusionCache (uses FusionCache), or CacheWithTaskAndLockPool.

CacheWithTaskAndLockPool impl

# key-specific lock with async
lockForKeyPool = new lockPool()
task<T> getAsync(key, valueFactoryAsync):
    if !cache.tryGetValue(key, out valueTask):
        lockForKey = lockForKeyPool.get(key)
        lock lockForKey:
            if !cache.tryGetValue(key, out valueTask):
                valueTask = cache.getOrAdd(key, (cacheEntry) =>
                {
                    cacheEntry.ttl = ttl
                    var task = valueFactoryAsync<T>(...)
                    cacheEntry.addExpirationToken(new evictErroredCachedTaskChangeToken(task))
                    return task
                })
    return await valueTask
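
The same impl as a C# sketch (LockPool.Get(key) as in the pseudocode above; the class and ttl are illustrative, and EvictErroredCachedTaskChangeToken is sketched below):

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class CacheWithTaskAndLockPool<T>
{
    private readonly IMemoryCache cache = new MemoryCache(new MemoryCacheOptions());
    private readonly LockPool lockForKeyPool = new LockPool();
    private readonly TimeSpan ttl = TimeSpan.FromSeconds(30); // illustrative ttl

    public async Task<T> GetAsync(string key, Func<Task<T>> valueFactoryAsync)
    {
        if (!cache.TryGetValue(key, out Task<T> valueTask))
        {
            var lockForKey = lockForKeyPool.Get(key);
            lock (lockForKey)
            {
                // double-check, then create and cache the task inside the critical section
                if (!cache.TryGetValue(key, out valueTask))
                {
                    valueTask = cache.GetOrCreate(key, cacheEntry =>
                    {
                        cacheEntry.AbsoluteExpirationRelativeToNow = ttl;
                        var task = valueFactoryAsync();
                        // evict later if the task faults/cancels, so errors aren't cached
                        cacheEntry.AddExpirationToken(new EvictErroredCachedTaskChangeToken(task));
                        return task;
                    });
                }
            }
        }
        // await outside of the lock (monitor), so it's held only to create the task
        return await valueTask;
    }
}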

# evict task that errors out
class evictErroredCachedTaskChangeToken : iChangeToken
    changed = false
    ctor(task):
        changed = isEvictable(task)
        task.continueWith(t =>
            changed = isEvictable(t))
    bool isEvictable(task) => task.canceled || task.faulted
    hasChanged => changed
    # ...
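
A fuller C# sketch of the change token against the IChangeToken contract. MemoryCache checks HasChanged when an entry is accessed and during its expiration scans, so a passive (non-callback) token suffices here.

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Primitives;

// evicts the cached task if it ends up canceled or faulted
public class EvictErroredCachedTaskChangeToken : IChangeToken
{
    private volatile bool changed;

    public EvictErroredCachedTaskChangeToken(Task task)
    {
        // the task may already be done by the time it's cached
        changed = IsEvictable(task);
        // flip the flag once the task completes unsuccessfully
        task.ContinueWith(t => changed = IsEvictable(t),
            TaskContinuationOptions.ExecuteSynchronously);
    }

    private static bool IsEvictable(Task task) => task.IsCanceled || task.IsFaulted;

    public bool HasChanged => changed;

    // no active callbacks; MemoryCache polls HasChanged
    public bool ActiveChangeCallbacks => false;

    public IDisposable RegisterChangeCallback(Action<object> callback, object state)
        => EmptyDisposable.Instance; // no-op registration

    private sealed class EmptyDisposable : IDisposable
    {
        public static readonly EmptyDisposable Instance = new EmptyDisposable();
        public void Dispose() { }
    }
}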