neon-sunset/fast-cache

[NonBlocking] NullReferenceException

troepolik opened this issue · 4 comments

Good day.
We faced one error in prod. It arises only once or twice per day, so it's not big problem for us, but still...

System.NullReferenceException: Object reference not set to an instance of an object.
   at NonBlocking.DictionaryImplBoxed`2.TryClaimSlotForPut(Boxed`1& entryKey, TKey key)
   at NonBlocking.DictionaryImpl`3.PutIfMatch(TKey key, TValue newVal, TValue& oldVal, ValueMatch match)
   at FastCache.Cached`2.Save(V value, TimeSpan expiration)

TKey in this case was ValueTuple like (long configId, int langId, etc...), but this error arises in different places, so I guess problem is not related what we save to cache.
I tried to figured out some problems in code but didn't find anything suspicious

protected override bool TryClaimSlotForPut(ref Boxed<TKey> entryKey, TKey key)
        {
            var entryKeyValue = entryKey;
            if (entryKeyValue == null)
            {
                entryKeyValue = Interlocked.CompareExchange(ref entryKey, new Boxed<TKey>(key), null);
                if (entryKeyValue == null)
                {
                    // claimed a new slot
                    this.allocatedSlotCount.Increment();
                    return true;
                }
            }

            return _keyComparer.Equals(key, entryKey.Value);
        }

The only thought I have is some code set entryKey to null in parallel thread between
entryKeyValue = Interlocked.CompareExchange(ref entryKey, new Boxed<TKey>(key), null);
and _keyComparer.Equals(key, entryKey.Value);
so may be entryKey.Value throws null reference exception.
But the error is so rare in production that have 6k rps so I can not reproduce it.
But mb you have any ideas about it? May it be background eviction?

Interesting, this definitely needs to be looked into. At the first glance, I don't see any other place it could throw at either, besides entryKey.Value which dereferences the ref and then accesses the field.

I'll try a few stress scenarios to see if this can be reproduced, since the issue most likely will need to be fixed in NonBlocking. Eviction should not be able to cause this (most of its code is simple enough).

On a second look, it's possible return _keyComparer.Equals(key, entryKey.Value); should have been return _keyComparer.Equals(key, entryKeyValue); (or entryKey?.Value) instead. I'll audit the source and see if that's the case.

Thank you very match for the quick solution!