graphql-python/graphene

n+1 queries in execution dataloader docs

nickvellios opened this issue · 6 comments

  • What is the current behavior?

https://github.com/graphql-python/graphene/blame/master/docs/execution/dataloader.rst#L57-L58

>>> users = {user.id: user for user in User.objects.filter(id__in=keys)}
>>> print(users)
{1: <User: Tom>, 2: <User: Frank>, ...}

The database is hit once for each user.


  • What is the expected behavior?

>>> user_ids = User.objects.filter(id__in=keys).values_list('id', flat=True)
>>> users = User.objects.in_bulk(user_ids)
>>> print(users)
{1: <User: Tom>, 2: <User: Frank>, ...}

A single query can be used to fetch all user IDs and users.


  • What is the motivation / use case for changing the behavior?

Performance

@nickvellios I'm not sure what you mean. Running User.objects.filter(id__in=keys) should only execute 1 db query. Are you seeing something different?

When you take the part out that causes the n+1 queries, of course it goes away. But this dictionary comprehension is the problem, NOT the filter.

>>> users = {user.id: user for user in User.objects.filter(id__in=keys)}

@nickvellios sorry I still don't understand. What's the problem with the dictionary comprehension? In my local tests there is only 1 sql query made in that statement. Can you provide a complete example that can reproduce the issue you're seeing?
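
For anyone following along, the two access patterns being contrasted here can be illustrated without Django. The sketch below is only an analogue: it uses the standard-library sqlite3 module instead of the Django ORM, and the table, the third sample name, and the helper names (count_selects, load_batched, load_per_key) are all hypothetical. It records SELECT statements through sqlite3's trace callback, so that a dict comprehension over one IN (...) result set can be compared with a loader that queries once per key.

```python
import sqlite3


def count_selects(conn, work):
    """Run work(conn); return its result and the SELECT statements it issued."""
    statements = []
    conn.set_trace_callback(statements.append)  # called once per executed statement
    try:
        result = work(conn)
    finally:
        conn.set_trace_callback(None)  # stop tracing
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, selects


def load_batched(conn, keys):
    """One IN (...) query; the dict comprehension only iterates its rows."""
    placeholders = ", ".join("?" for _ in keys)
    rows = conn.execute(
        f"SELECT id, name FROM user WHERE id IN ({placeholders})", tuple(keys)
    )
    return {user_id: name for user_id, name in rows}


def load_per_key(conn, keys):
    """The N+1-style pattern: a separate query for every key."""
    return {
        key: conn.execute("SELECT name FROM user WHERE id = ?", (key,)).fetchone()[0]
        for key in keys
    }


# Hypothetical sample data, set up before tracing starts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (id, name) VALUES (?, ?)",
    [(1, "Tom"), (2, "Frank"), (3, "Ann")],
)
conn.commit()

keys = (1, 2, 3)
batched, batched_sql = count_selects(conn, lambda c: load_batched(c, keys))
per_key, per_key_sql = count_selects(conn, lambda c: load_per_key(c, keys))

print(len(batched_sql), "SELECT(s) for the batched loader")
print(len(per_key_sql), "SELECT(s) for the per-key loader")
```

In this sketch the batched loader should record a single SELECT, while the per-key loader records one per key. It says nothing about what a particular Django model or manager does on top of the query, which is why a full reproduction is still needed.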

@jkimbo

>>> from django.db import connection, reset_queries
>>> from django.conf import settings
>>> settings.DEBUG = True
>>> 
>>> 
>>> def num_queries(reset=True):
...     print(len(connection.queries))
...     if reset:
...         reset_queries()
... 
>>> 
>>> keys = (1, 2, 3,)
>>> users = {user.id: user for user in User.objects.filter(id__in=keys)}
>>> num_queries()
7
>>> user_ids = User.objects.filter(id__in=keys).values_list('id', flat=True)
>>> users = User.objects.in_bulk(user_ids)
>>> num_queries()
2
>>> 

However, I'm now seeing that either Django or Postgres is caching: running this a second time outputs 1 and 2 instead of 7 and 2. Reindexing and discarding query plans didn't seem to revert it, so I can't easily reproduce this now.

What Django version are you on, @nickvellios? I can't reproduce what you're seeing locally. When running your code in my local project I only get 1 query for users = {user.id: user for user in User.objects.filter(id__in=keys)}.

Also, 7 queries doesn't make sense, since your example only selects 3 IDs. Is your user model prefetching related objects? Can you print out the actual queries that are being made as well?
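
One possible way to list the statements, sketched under assumptions: with DEBUG enabled, Django records each executed statement in connection.queries as a dict with "sql" and "time" keys. The helper names below (summarise_queries, print_queries) are hypothetical, and the sample entries are placeholders rather than real Django output. Grouping identical statements makes repeats easy to spot.

```python
from collections import Counter


def summarise_queries(queries):
    """Return (total, Counter keyed by SQL text) for a list of query dicts."""
    counts = Counter(entry["sql"] for entry in queries)
    return len(queries), counts


def print_queries(queries):
    """Print the total, then each distinct statement with its repeat count."""
    total, counts = summarise_queries(queries)
    print(f"{total} queries recorded")
    for sql, n in counts.most_common():
        print(f"{n}x {sql}")


# Placeholder entries shaped like connection.queries items (not real output).
sample = [
    {"sql": "SELECT a", "time": "0.001"},
    {"sql": "SELECT b", "time": "0.001"},
    {"sql": "SELECT a", "time": "0.002"},
]
print_queries(sample)

# In a Django shell this would be used roughly as follows (assumed usage):
#   reset_queries()
#   users = {user.id: user for user in User.objects.filter(id__in=keys)}
#   print_queries(connection.queries)
```

Calling reset_queries() immediately before the statement under test keeps earlier statements from the same shell session out of the count.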

Closing out this issue. I'm using graphene 2.x and getting inconsistent results. Will test further when we upgrade.