googleapis/python-ndb

Cloud NDB - Get operation can blindly overwrite key lock in memcache leading to cache inconsistency


cloud-ndb v1.8.0 with Memcache global cache

A particular interleaving of a reader and a writer can leave the cache inconsistent: the read thread overwrites the lock that the write thread placed in memcache. The sequence of steps is:

  1. Reader gets from memcache and finds nothing
  2. Writer writes lock value
  3. Reader overwrites lock value blindly using memcache set
  4. Reader watches key
  5. Reader reads from db
  6. Writer updates db
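  7. Writer fails to delete lock from memcache for whatever reason (currently, most likely a connection reset)
  8. Reader writes stale value using cas, which succeeds because nothing has modified the key since the reader started watching it in step 4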
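The interleaving above can be replayed deterministically with a small in-memory stand-in for memcache. This is only a sketch: `FakeMemcache`, `simulate_race`, and the `LOCK` marker are hypothetical names made up for illustration, not the NDB or memcache client API, and the "threads" are simply statements executed in the order of the steps.

```python
LOCK = b"0"  # placeholder lock marker used only in this sketch


class FakeMemcache:
    """Tiny in-memory stand-in for memcache (illustrative, not the NDB API)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._version = 0

    def _store(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        # Unconditional overwrite: this is what makes step 3 "blind".
        self._store(key, value)

    def gets(self, key):
        """Return (value, version token), similar to a memcache 'gets'."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        """Store only if the key is unchanged since 'token' was read."""
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        self._store(key, value)
        return True


def simulate_race():
    cache = FakeMemcache()
    db = {"k": "old"}

    assert cache.get("k") is None          # 1. reader: cache miss
    cache.set("k", LOCK)                   # 2. writer: writes lock
    cache.set("k", LOCK)                   # 3. reader: blindly overwrites lock
    _, token = cache.gets("k")             # 4. reader: watches key
    stale = db["k"]                        # 5. reader: reads from db
    db["k"] = "new"                        # 6. writer: updates db
    # 7. writer: deleting the lock fails, so the cache entry is left untouched
    swapped = cache.cas("k", stale, token) # 8. reader: cas succeeds
    return swapped, cache.get("k"), db["k"]
```

In this model `simulate_race()` ends with the cache holding `"old"` while the database holds `"new"`, which is the inconsistency described in the report.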

This can be addressed for at least Memcache with the following changes:

  1. Add an add() method to GlobalCache
  2. Implement add() for MemcacheCache using client.add() and raise an appropriately typed exception if the add fails
  3. Add _cache._GlobalCacheAddBatch which will call it
  4. Modify _cache.global_lock() to call global_add for reads and global_set for writes
  5. Modify _datastore_api.lookup() by wrapping the lines that do the lock/watch in a try/except that sets key_locked = True on exception

This addresses the issue by using memcache add for the read lock, so the reader can no longer overwrite the write lock. When the add fails, the reader treats the key as locked and does not attempt to write the value it read back to memcache.
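A minimal sketch of the proposed read path follows, assuming a cache whose add() stores a value only when the key is absent. All names here (`FakeMemcache`, `KeyLockedError`, `add_lock`, `lookup_sketch`, the `after_miss` hook, and the `LOCK` marker) are hypothetical and exist only for illustration; the real change would live in GlobalCache, MemcacheCache, and _datastore_api.lookup() as listed above.

```python
LOCK = b"0"  # placeholder lock marker used only in this sketch


class KeyLockedError(Exception):
    """Hypothetical exception: add() found the key already present."""


class FakeMemcache:
    """Tiny in-memory stand-in for memcache (illustrative, not the NDB API)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._version = 0

    def _store(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def set(self, key, value):
        self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet; report success."""
        if key in self._data:
            return False
        self._store(key, value)
        return True

    def gets(self, key):
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        self._store(key, value)
        return True


def add_lock(cache, key):
    """Read-side lock: never overwrites an existing entry."""
    if not cache.add(key, LOCK):
        raise KeyLockedError(key)


def lookup_sketch(cache, db, key, after_miss=None):
    cached = cache.get(key)
    if cached is not None and cached != LOCK:
        return cached                      # plain cache hit
    key_locked = cached == LOCK
    token = None
    if not key_locked:
        if after_miss is not None:
            after_miss()                   # test hook: a writer runs here
        try:
            add_lock(cache, key)           # lock with add instead of set
            _, token = cache.gets(key)     # watch the key
        except KeyLockedError:
            key_locked = True              # a writer got there first
    value = db[key]
    if not key_locked:
        cache.cas(key, value, token)       # write back only if we own the lock
    return value
```

With a writer that sets its lock inside the `after_miss` hook, the reader's add fails, `key_locked` becomes true, and the value read from the database is returned without being written to the cache. The writer's lock stays in place even if the writer later fails to delete it.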

This appears to be a port of an issue filed on Legacy NDB: GoogleCloudPlatform/datastore-ndb-python#84