django-json-api/django-rest-framework-json-api

N+1 queries with ResourceRelatedFields not included

andrewbird2 opened this issue · 2 comments

Firstly, thanks for a great package everyone! Very excited to move across from vanilla DRF. Putting together a POC to sell this to my team, and came across an issue which i am stumped by. Consider a simple database of Document and Page objects. We are serializing a document object, with pages as a related field. The serializers might look like this:

class PageSerializer(serializers.ModelSerializer):
    class Meta:
        model = Page
        fields = ("id", "page_index", "height")

class DocumentSerializer(serializers.ModelSerializer):
    pages = ResourceRelatedField(queryset=Page.objects, many=True)
    collection = ResourceRelatedField(queryset=Collection.objects,
    )

    class Meta:
        model = Document
        fields = ('id', "identifier", "collection", "created_dt", "ready_dt", "pages")

When we hit /documents?include=pages, the AutoPrefetchMixin works its charms and correctly adds a pages prefetch to the queryset. However, when simply hit /documents, the queryset is unaltered, but we end up with an N+1 query in populating the page relationships for each document.

Of course, one could simply add "pages" a prefetch inside get_queryset. However, this prefetch is then applied on every request, even those that might simply be accessing a subset of document fields. e.g. /documents?fields[document]=identifier. This request shouldn't require any prefetch.

It appears that AutoPrefetchMixin should be updated to be cognizant of any many=True relationships that are going to need to be populated, regardless of whether they have been included. If this sounds right, I'm happy to put together a pull request. If it sounds like I'm barking mad or have overlooked something obvious, I'm eager to be told!

Thanks for bringing this up again. The issue is that AutoPrefetchMixin is very limited, but it is also very tricky to get it right.

Because the use case you have run into only happens when you use ModelSerializer which needs the id to render the relationships. However, if a HyperlinkedModelSerializer is used there will only be a link and no prefetch should be done. Also, as you noted, the AutoPrefetchMixin should also take into consideration sparse fields.

If you are willing to work on this, I am very happy to assist and look at PRs. Before diving into implementing it, I would recommend to look at discussion #921 where we have talked about these issues. Also, some work started on this but never got finished. Not sure the approach in that discussion is the right one, but when trying to make auto prefetching better, it needs to be considered that the code might get quickly very complex. It is important to keep it maintainable, and maybe restructuring is necessary.

Great, thanks for the response! If we end up going ahead with this package ill work on a PR