googleapis/python-ndb

Cloud NDB: ndb global cache keys may conflict for certain ID combinations (e.g. 5761297639538688 and 5704016969334784).

Niccari opened this issue · 3 comments

Environment details

  • Google App Engine Gen.2/Python Runtime
  • Python 3.9.5
  • Python library dependencies
    • fastapi: 0.68.1
    • google-cloud-ndb: 1.10.2
    • redis-namespace: 3.0.1.1
    • redis: 3.0.1
    • uvicorn: 0.15.0

Steps to reproduce

We have prepared reproduction code; please use it to follow along.

  1. Install the Datastore emulator and related gcloud components
$ gcloud components install \
    beta cloud-datastore-emulator app-engine-python app-engine-python-extras
  2. Create a Python virtual environment (we used pipenv for testing)
$ pipenv install
$ pipenv shell

Pipfile

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
fastapi="*"
google-cloud-ndb = "*"
redis-namespace = "*"
redis = "*"
uvicorn = "*"

[requires]
python_version = "3.9"

[scripts]
start = "uvicorn application:app --host 0.0.0.0 --port 8000 --workers 2"
  3. Install redis-server and launch it

  4. Launch the Datastore emulator (in another terminal)

$ gcloud beta emulators datastore start
  5. Run the test server
$ $(gcloud beta emulators datastore env-init)
$ pipenv run start
  6. Access http://localhost:8000/ (at least twice)

Code example

application.py

import os

from fastapi import FastAPI
from fastapi import Request
from google.cloud import ndb
from redis_namespace import StrictRedis

app = FastAPI()


class Sample(ndb.Model):
    pass


@app.middleware("http")
async def use_datastore_context(request: Request, call_next):
    datastore_client = \
        ndb.Client(project=os.getenv("DATASTORE_PROJECT_ID"))
    datastore_cache = ndb.RedisCache(StrictRedis(
        host="localhost", port=6379, namespace="ndb:"))
    with datastore_client.context(
            global_cache=datastore_cache,
            global_cache_timeout_policy=24 * 60 * 60):
        response = await call_next(request)
        return response


@app.get("/")
async def test():
    user_id1 = 5761297639538688
    user_id2 = 5704016969334784

    def txn1():
        entity = Sample(id=user_id1)
        entity.put()

    def txn2():
        entity = Sample(id=user_id2)
        entity.put()

    sample1 = Sample.get_by_id(user_id1)
    sample2 = Sample.get_by_id(user_id2)
    if sample1 is None:
        ndb.transaction(txn1)
        sample1 = Sample.get_by_id(user_id1)
    if sample2 is None:
        ndb.transaction(txn2)
        sample2 = Sample.get_by_id(user_id2)

    print(f"sample1: {sample1.key.id()}, sample2: {sample2.key.id()}")

    return {}

Execution logs

The results below were obtained starting from an empty Redis cache and an empty Datastore.

In the first response, the IDs of the two samples are different.

However, in the second and subsequent responses, they have the same ID.

Server log
$ pipenv run start
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started parent process [34729]
INFO:     Started server process [34732]
INFO:     Started server process [34731]
INFO:     Waiting for application startup.
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Application startup complete.
sample1: 5761297639538688, sample2: 5704016969334784
INFO:     127.0.0.1:65521 - "GET / HTTP/1.1" 200 OK
sample1: 5761297639538688, sample2: 5761297639538688
INFO:     127.0.0.1:65521 - "GET / HTTP/1.1" 200 OK
sample1: 5761297639538688, sample2: 5761297639538688
INFO:     127.0.0.1:65522 - "GET / HTTP/1.1" 200 OK
Redis log

Two cache keys (one per entity) should be present, but only one is stored.

127.0.0.1:6379> keys *
1) "ndb:NDB30\n\x0b\x12\tsample_project\x12\x11\n\x06Sample\x10\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\n"

Hi, I was able to reproduce the problem using your example. But then I changed the Redis instance to redis.StrictRedis instead of redis_namespace.StrictRedis. I'm not familiar with the redis_namespace package, but it would appear to be the source of your troubles.

Thanks @chrisrossi! As you said, it looks like the problem is caused by redis_namespace: it decodes bytes keys as UTF-8 with error replacement, so any byte that is not valid UTF-8 is replaced with the replacement character "\xef\xbf\xbd" (U+FFFD). Distinct binary keys can therefore collapse into the same string, leading to duplicate cache keys.
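The collision is easy to demonstrate with just the trailing (entity-ID) bytes of the two serialized keys, copied from the Redis logs below; this is a minimal stdlib sketch, not redis_namespace's actual code, but it applies the same key.decode('utf-8', 'replace') step:

```python
# Trailing bytes of the two NDB cache keys, taken from the raw Redis log.
# These varint-encoded entity IDs are not valid UTF-8.
key1_suffix = b"\x80\x80\x80\x9a\xea\xfb\x9d\n"  # id 5761297639538688
key2_suffix = b"\x80\x80\x80\x8a\xdf\xf8\x90\n"  # id 5704016969334784

# Lossy decoding, as done by redis_namespace before prefixing keys:
s1 = key1_suffix.decode("utf-8", "replace")
s2 = key2_suffix.decode("utf-8", "replace")

# Every invalid byte becomes U+FFFD, so the two distinct keys collapse
# into one identical string and overwrite each other in Redis.
print(s1 == s2)  # True
print(s1)        # '\ufffd' * 7 + '\n'
```

This is exactly the single key with seven "\xef\xbf\xbd" (the UTF-8 encoding of U+FFFD) sequences seen in the Redis log above.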

Using redis.StrictRedis (without the "ndb:" namespace)

127.0.0.1:6379> keys *
1) "NDB30\n\x0b\x12\tsample_project\x12\x11\n\x06Sample\x10\x80\x80\x80\x9a\xea\xfb\x9d\n"
2) "NDB30\n\x0b\x12\tsample_project\x12\x11\n\x06Sample\x10\x80\x80\x80\x8a\xdf\xf8\x90\n"

Using redis_namespace without UTF-8 decoding

127.0.0.1:6379> keys *
1) "ndb:b'NDB30\\n\\x0b\\x12\\tsample_project\\x12\\x11\\n\\x06Sample\\x10\\x80\\x80\\x80\\x8a\\xdf\\xf8\\x90\\n'"
2) "ndb:b'NDB30\\n\\x0b\\x12\\tsample_project\\x12\\x11\\n\\x06Sample\\x10\\x80\\x80\\x80\\x9a\\xea\\xfb\\x9d\\n'"

Using redis_namespace with UTF-8 decoding (key.decode('utf-8', 'replace'))

127.0.0.1:6379> keys *
1) 'ndb:NDB30\n\x0b\x12\tsample_project\x12\x11\n\x06Sample\x10\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\n'
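As to why these two particular IDs collide: the raw keys above suggest the numeric ID is serialized as a protocol-buffer base-128 varint inside the key blob. The sketch below is an assumption about NDB's key serialization, not its actual code, but a standard varint encoder reproduces exactly the ID bytes visible in the logged keys, and both encodings consist entirely of bytes that are invalid as UTF-8:

```python
def encode_varint(n: int) -> bytes:
    """Standard protobuf base-128 varint: 7-bit groups, least-significant
    first, with the high bit set on every group except the last."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit set
        else:
            out.append(group)
            return bytes(out)

# Matches the trailing ID bytes of the two raw Redis keys shown above.
print(encode_varint(5761297639538688))  # b'\x80\x80\x80\x9a\xea\xfb\x9d\n'
print(encode_varint(5704016969334784))  # b'\x80\x80\x80\x8a\xdf\xf8\x90\n'
```

Any pair of IDs whose varint encodings decode to the same run of replacement characters will collide under lossy UTF-8 decoding, which is why the conflict only shows up for certain ID combinations.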

Sounds like you're sorted then. I'll go ahead and close!