Cloud NDB - Get operation can blindly overwrite key lock in memcache leading to cache inconsistency
Closed this issue · 1 comments
justinkwaugh commented
cloud-ndb v1.8.0 with Memcache global cache
There is a sequence of steps that can lead to cache inconsistency, caused by a read thread overwriting the lock that a write thread placed in memcache. The sequence is:
- Reader gets from memcache and finds nothing
- Writer writes lock value
- Reader overwrites lock value blindly using memcache set
- Reader watches key
- Reader reads from db
- Writer updates db
- Writer fails to delete lock from memcache for whatever reason (currently, most likely a connection reset)
- Reader writes stale value using cas
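The steps above can be replayed deterministically against a toy in-memory cache. This is only a sketch: `ToyCache`, `replay_race`, and the `LOCK` placeholder value are hypothetical names made up for illustration, and the cache merely imitates memcache-style `set`/`gets`/`cas` semantics rather than using the real cloud-ndb or memcache client code.

```python
class ToyCache:
    """Hypothetical in-memory stand-in for memcache (illustration only)."""

    def __init__(self):
        self._data = {}      # key -> (value, version)
        self._version = 0

    def _store(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        # Unconditional write: replaces whatever is stored, including a lock.
        self._store(key, value)

    def gets(self, key):
        # Returns (value, token); the token plays the role of "watching" the key.
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        # Store only if the key has not changed since the token was issued.
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        self._store(key, value)
        return True


LOCK = b"0"  # placeholder lock marker; the library's actual value may differ


def replay_race():
    cache, db = ToyCache(), {"k": "old"}
    assert cache.get("k") is None          # reader: cache miss
    cache.set("k", LOCK)                   # writer: writes lock value
    cache.set("k", LOCK)                   # reader: blind set clobbers the writer's lock
    _, token = cache.gets("k")             # reader: watches key
    stale = db["k"]                        # reader: reads from db
    db["k"] = "new"                        # writer: updates db
    # writer: deleting the lock fails (e.g. connection reset), so nothing happens
    stored = cache.cas("k", stale, token)  # reader: cas succeeds with the stale value
    return stored, cache.get("k"), db["k"]
```

In this replay the reader's `cas` succeeds because the token it holds was issued after its own blind `set`, so nothing tells it that a writer was ever involved.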
This can be addressed for at least Memcache with the following changes:
- Add an `add()` method to `GlobalCache`
- Implement `add()` for `MemcacheCache` using `client.add()` and throw some appropriately typed exception if add fails
- Add `_cache._GlobalCacheAddBatch` which will call it
- Modify `_cache.global_lock()` to call `global_add` when read, and `global_set` when write
- Modify `_datastore_api.lookup()` adding a try/catch around the lines which do the lock/watch, setting `key_locked = True` on exception
This prevents the reader from overwriting the write lock, because memcache add fails when the key already exists. When the add fails, the reader treats the key as locked and does not attempt to write its value back to memcache.
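A minimal sketch of the proposed behaviour, again against a toy in-memory cache rather than the real library. Everything here is an assumption made for illustration: `ToyCache`, `LockHeld`, `lock_for_read`, `lock_for_write`, `read_after_miss`, and the `LOCK` placeholder are hypothetical names, not cloud-ndb APIs, and the final cache write is simplified to a plain `set` where the real code would watch the key and use cas.

```python
class LockHeld(Exception):
    """Hypothetical exception: another thread already holds the key."""


class ToyCache:
    """Hypothetical in-memory stand-in for memcache (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Imitates memcache add semantics: store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True


LOCK = b"0"  # placeholder lock marker; the library's actual value may differ


def lock_for_read(cache, key):
    # Readers use add, so an existing write lock is never overwritten.
    if not cache.add(key, LOCK):
        raise LockHeld(key)


def lock_for_write(cache, key):
    # Writers still use an unconditional set.
    cache.set(key, LOCK)


def read_after_miss(cache, db, key):
    """What a reader does after it has already seen a cache miss."""
    key_locked = False
    try:
        lock_for_read(cache, key)
    except LockHeld:
        key_locked = True        # a writer got there first; leave the cache alone
    value = db[key]
    if not key_locked:
        cache.set(key, value)    # simplified; the real code would watch + cas
    return value
```

A short usage example: if a writer locks the key between the reader's miss and the reader's lock attempt, the reader still returns the database value but leaves the writer's lock in place.

```python
cache, db = ToyCache(), {"k": "old"}
lock_for_write(cache, "k")                  # writer locks after the reader's miss
value = read_after_miss(cache, db, "k")     # reader backs off from the cache
```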
crwilcox commented
This seems to be porting an issue filed on Legacy NDB: GoogleCloudPlatform/datastore-ndb-python#84