Optimize pagination of relay connection fields
Copied from #1 (comment).
@tfoxy Do you use optimizers with any kind of pagination? We're also using Apollo as the client, and we want to provide cursor-based pagination. But we also have queries with nested connections, e.g.:
{
  products(first: 10) {
    edges {
      node {
        name
        variants(first: 5) {
          edges {
            node {
              name
            }
          }
        }
      }
    }
  }
}
Using just the ORM, I'm able to fetch the data with only two DB queries:
Product.objects.prefetch_related('variants')
I'm trying to achieve the same in the API, but unfortunately any pagination seems to break the prefetches: it slices querysets internally, which results in refetching objects that were already cached by prefetch_related.
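For reference, the two queries behind `Product.objects.prefetch_related('variants')` can be mimicked with the standard library's `sqlite3` module: one query for a page of products, then a single `IN (...)` query for all of their variants, grouped in Python. This is a minimal sketch assuming a simplified `product`/`variant` schema; the tables and the `CountingDB`, `build_db`, and `products_with_variants` helpers are hypothetical. It shows that once the variants are in memory, taking the first 5 per product is a plain list slice and costs no further queries.

```python
import sqlite3
from collections import defaultdict


class CountingDB:
    """Thin wrapper around an in-memory SQLite database that counts SELECTs."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def select(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_products=10, variants_per_product=8):
    db = CountingDB()
    db.conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE variant (id INTEGER PRIMARY KEY, product_id INTEGER, name TEXT);
        """
    )
    db.conn.executemany(
        "INSERT INTO product (id, name) VALUES (?, ?)",
        [(p, f"Product {p}") for p in range(1, n_products + 1)],
    )
    db.conn.executemany(
        "INSERT INTO variant (product_id, name) VALUES (?, ?)",
        [(p, f"Variant {p}.{v}")
         for p in range(1, n_products + 1) for v in range(variants_per_product)],
    )
    db.conn.commit()
    return db


def products_with_variants(db, first_products, first_variants):
    # Query 1: one page of products.
    products = db.select("SELECT id, name FROM product ORDER BY id LIMIT ?", (first_products,))
    ids = [product_id for product_id, _ in products]
    # Query 2: every variant of those products in a single IN (...) lookup.
    placeholders = ", ".join("?" for _ in ids)
    rows = db.select(
        f"SELECT product_id, name FROM variant WHERE product_id IN ({placeholders}) ORDER BY id",
        ids,
    )
    variants_by_product = defaultdict(list)
    for product_id, name in rows:
        variants_by_product[product_id].append(name)
    # Slicing the in-memory lists issues no further queries.
    return [(name, variants_by_product[pid][:first_variants]) for pid, name in products]


db = build_db()
page = products_with_variants(db, first_products=10, first_variants=5)
print(db.queries)  # 2
```

Note that the second query still loads all 80 variants even though only 50 end up in the response.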
Copied from #1 (comment).
When there is pagination, the default implementation of prefetch_related doesn't work because it only prefetches the queryset.all() query. This library handles different kinds of querysets for prefetch_related, so maybe it could optimize pagination, but right now it doesn't support it.
Copied from #1 (comment).
@maarcingebala, are you sure Product.objects.prefetch_related('variants') works? Doesn't it fetch all the variants instead of just the first 5 from each product?
@tfoxy Yes, it does, but I can still benefit from prefetching the data in one query instead of running duplicate queries for each row.
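If the database supports window functions, the "first 5 per product" limit can be pushed into the single prefetch query, so it no longer loads every variant. The sketch below shows the SQL technique with `sqlite3` (requires SQLite 3.25 or later); the schema and the `first_variants_per_product` helper are hypothetical. Django 4.2 later added support for sliced querysets in `Prefetch` objects, but this hand-written query only illustrates the idea and is not the SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (id INTEGER PRIMARY KEY, product_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO variant (product_id, name) VALUES (?, ?)",
    [(p, f"Variant {p}.{v}") for p in range(1, 11) for v in range(8)],
)

# ROW_NUMBER() numbers each product's variants 1, 2, 3, ... (ordered by id);
# the outer query keeps only the first n of every product.
FIRST_N_PER_PRODUCT = """
    SELECT product_id, name FROM (
        SELECT product_id, name,
               ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY id) AS rn
        FROM variant
        WHERE product_id IN ({placeholders})
    )
    WHERE rn <= ?
    ORDER BY product_id, rn
"""


def first_variants_per_product(conn, product_ids, n):
    placeholders = ", ".join("?" for _ in product_ids)
    sql = FIRST_N_PER_PRODUCT.format(placeholders=placeholders)
    return conn.execute(sql, [*product_ids, n]).fetchall()


rows = first_variants_per_product(conn, list(range(1, 11)), 5)
print(len(rows))  # 50
```

One query returns 50 rows instead of 80, at the cost of a slightly more complex statement.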
Anyway, I managed to use prefetching by using optimization hints and a slightly modified version of DjangoConnectionField, as suggested in this comment. I'll keep experimenting with it. For now, I'd consider my question resolved, so you can close the issue.
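The modified DjangoConnectionField itself isn't shown here, so the following is only a sketch of the underlying idea, assuming the related objects were already prefetched: paginate a plain Python list, since slicing a list never triggers a query. `paginate_list`, `offset_to_cursor`, and `cursor_to_offset` are hypothetical helpers; the cursor format mirrors graphql-relay's base64-encoded `arrayconnection:<offset>` cursors. Only forward pagination (`first`/`after`) is handled, and `first` is assumed to be non-negative.

```python
import base64

# graphql-relay encodes offset cursors as base64("arrayconnection:<offset>").
CURSOR_PREFIX = "arrayconnection:"


def offset_to_cursor(offset):
    return base64.b64encode(f"{CURSOR_PREFIX}{offset}".encode()).decode()


def cursor_to_offset(cursor):
    decoded = base64.b64decode(cursor).decode()
    return int(decoded[len(CURSOR_PREFIX):])


def paginate_list(items, first, after=None):
    """Forward Relay-style pagination over a list that is already in memory."""
    start = cursor_to_offset(after) + 1 if after is not None else 0
    page = items[start:start + first]  # plain list slice: no database query
    edges = [
        {"cursor": offset_to_cursor(start + i), "node": node}
        for i, node in enumerate(page)
    ]
    return {
        "edges": edges,
        "pageInfo": {
            "hasNextPage": start + first < len(items),
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }


# e.g. list(product.variants.all()) for a product whose variants were prefetched
variants = [f"Variant {i}" for i in range(8)]
first_page = paginate_list(variants, first=5)
second_page = paginate_list(variants, first=5, after=first_page["pageInfo"]["endCursor"])
```

The second call resumes right after the last edge of the first page and returns the remaining three variants with `hasNextPage` set to false.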
I can understand the benefit of using prefetch_related. It's something that I also do sometimes, even if I slice the collection later.
I think it's better to leave it as a manual optimization by using the optimization hints. Glad you managed to work it out.
@maarcingebala How specifically are you using the OptimizationHints (instead of just the query) here?
@nwaxiomatic Here is an example usage of optimization hints in my project. There is a Product model, and each product can have multiple images (a 1-N relationship). From these images, we choose one as the product thumbnail to display in the UI.
Now, suppose we have a query that fetches the first 5 products and a thumbnail for each of them:
{
  products(first: 5) {
    edges {
      node {
        name
        thumbnailUrl
      }
    }
  }
}
- Without optimization hints: for each product, Django runs a query to fetch all its images and returns one as the thumbnail.
- With optimization hints: @gql_optimizer.resolver_hints(prefetch_related='images') tells the optimizer to prefetch the images relation whenever the resolve_thumbnail_url resolver is used. As a result, each time the thumbnail field is requested, I can take advantage of the prefetched images and avoid duplicated database queries.
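To make the mechanism concrete, here is a toy model, not graphene-django-optimizer's code: a decorator attaches hints to a resolver, and a collector gathers the `prefetch_related` hints of only those fields the query requests. `toy_resolver_hints`, `ProductType`, `to_snake_case`, and `collect_prefetches` are all hypothetical, and products are plain dicts instead of Django models.

```python
import re


def toy_resolver_hints(**hints):
    """Toy stand-in for a hints decorator: attach the hints to the resolver."""
    def decorator(resolver):
        resolver.optimization_hints = hints
        return resolver
    return decorator


class ProductType:
    """Hypothetical object type; products are plain dicts in this sketch."""

    def resolve_name(product, info):
        return product["name"]

    @toy_resolver_hints(prefetch_related="images")
    def resolve_thumbnail_url(product, info):
        # Reads the already-loaded images instead of querying per product.
        images = product["images"]
        return images[0]["url"] if images else None


def to_snake_case(name):
    # "thumbnailUrl" -> "thumbnail_url"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def collect_prefetches(object_type, requested_fields):
    """Return the prefetch_related hints of the resolvers a query actually uses."""
    prefetches = set()
    for field in requested_fields:
        resolver = getattr(object_type, f"resolve_{to_snake_case(field)}", None)
        hints = getattr(resolver, "optimization_hints", {})
        if "prefetch_related" in hints:
            prefetches.add(hints["prefetch_related"])
    return prefetches


print(collect_prefetches(ProductType, ["name", "thumbnailUrl"]))  # {'images'}
print(collect_prefetches(ProductType, ["name"]))  # set()
```

An optimizer built along these lines would then apply something like `Product.objects.prefetch_related(*prefetches)` only when `thumbnailUrl` appears in the query, so queries that skip the thumbnail pay nothing for it.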
@maarcingebala I see, so you aren't deconstructing this package down to the OptimizationHints object level. I've been following your issues here and in the graphene library, trying to figure out whether N+1 query issues are still a problem with Graphene before switching my project over (see graphql-python/graphene-django#429). So with that PrefetchConnection override you have there and the optimization hints above, you've been able to weed out most N+1 issues?
In a nutshell, I was able to optimize all the queries that our two single-page client apps use. But it's still possible to send our API a deeply nested query that would result in tens of duplicated database hits. Since we're going to make our API public, we will have to deal with this issue as well, but there are various techniques to handle it, such as query bounding and pagination limits.
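Query bounding can be sketched as a worst-case cost estimate: multiply the `first` arguments along each nesting path, sum the results, and reject queries above a budget, combined with a hard per-connection page-size limit. The dict-based query shape, both limits, and the helper names are hypothetical; a real server would compute this from the parsed GraphQL document.

```python
MAX_PAGE_SIZE = 100  # hypothetical per-connection limit on `first`
MAX_NODES = 500      # hypothetical budget for the whole query


def estimate_nodes(field, parents=1):
    """Worst-case node count for a selection and everything nested under it.

    `field` is a hypothetical parsed-query shape: {"first": int or None, "children": [...]}.
    A field without `first` is treated as returning one object per parent.
    """
    first = field.get("first")
    if first is not None and first > MAX_PAGE_SIZE:
        raise ValueError(f"first={first} exceeds the page-size limit of {MAX_PAGE_SIZE}")
    count = parents * (first if first is not None else 1)
    return count + sum(estimate_nodes(child, count) for child in field.get("children", []))


def check_query(root):
    cost = estimate_nodes(root)
    if cost > MAX_NODES:
        raise ValueError(f"query may return up to {cost} nodes; the limit is {MAX_NODES}")
    return cost


# products(first: 10) { ... variants(first: 5) { ... } }
print(check_query({"first": 10, "children": [{"first": 5}]}))  # 60
```

The nested products/variants query from the start of this issue costs 10 + 10 × 5 = 60 nodes, while `products(first: 100) { variants(first: 100) }` would be rejected at 10,100.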
So far I'm happy with this library, although it lacks documentation, and sometimes you have to dig through the source code to understand it.
Hi! Sorry for the long delay. I updated the docs. They're not perfect, but at least they mention the resolver_hints function.