izimobil/django-rest-framework-datatables

Server-side sorting with ForeignKey produces duplicates

alfonsrv opened this issue · 4 comments

When sorting on a field of a foreign-key-related model, it's currently not possible to consider only one element, e.g. the latest. A use case would be keeping historic data and displaying only the latest (and only relevant) element.

When sorting in such a scenario, every entry that has multiple instances of the related model produces duplicate rows, each showing the value of a previous instance.


An example might make it clearer. Slightly modified models and serializer from the example app:

from django.db import models
from rest_framework import serializers

class Artist(models.Model):
    name = models.CharField('Name', max_length=80)

    def __str__(self):
        return self.name

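    @property
    def latest_album(self) -> 'Album':  # added this
        # latest() without arguments relies on Meta.get_latest_by; pass a
        # field name explicitly if the model does not define it.
        return self.albums.latest()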

class Album(models.Model):
    name = models.CharField('Name', max_length=80)
    artist = models.ForeignKey(
        Artist,
        models.CASCADE,
        verbose_name='Artist',
        related_name='albums'
    )

class ArtistSerializer(serializers.ModelSerializer):
    latest_album = serializers.CharField(read_only=True)  # added this

    class Meta:
        model = Artist
        fields = (
            'name', 'latest_album',
        )

Assume we only want to show the latest album of each Artist, as a kind of release table. We would define a datatable along these lines:

<!-- HTML boilerplate -->
<script>
    $(document).ready(function() {
        $('#artists').DataTable({
            'serverSide': true,
            'ajax': '/api/artists/?format=datatables',
            'columns': [
                {'data': 'name'},
                {'data': 'latest_album', 'name': 'albums'},
            ]
        });
    });
</script>

The issue is that we cannot define something like 'name': 'albums__latest', 'albums__last' or 'albums[-1]', or filter for only the latest instance in any other way. Somebody might also want to do this conditionally, i.e. get the latest object for which a certain condition is True, but that might make things too complicated.

Assume Van Morrison had two albums, Astral Weeks and Days Like This. Rendering this out initially would result in:

Artist           Latest Album
Van Morrison     Astral Weeks
Michael Jackson  Thriller

And when sorting on Latest Album it would result in a duplicate entry for Van Morrison:

Artist           Latest Album
Van Morrison     Astral Weeks
Van Morrison     Days Like This
Michael Jackson  Thriller
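
A likely explanation, sketched here in plain SQL rather than with the query the library actually emits: ordering by a column of the related table requires a join, and a join returns one row per matching album. The snippet below uses the standard-library sqlite3 module with hypothetical table names (`artist`, `album`) to reproduce the effect.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the models above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (id INTEGER PRIMARY KEY, name TEXT,
                        artist_id INTEGER REFERENCES artist(id));
    INSERT INTO artist VALUES (1, 'Van Morrison'), (2, 'Michael Jackson');
    INSERT INTO album VALUES (1, 'Astral Weeks', 1),
                             (2, 'Days Like This', 1),
                             (3, 'Thriller', 2);
""")

# Sorting artists by a column of the related table needs a join,
# and the join yields one row per (artist, album) pair.
joined_rows = conn.execute("""
    SELECT artist.name
    FROM artist
    LEFT OUTER JOIN album ON album.artist_id = artist.id
    ORDER BY album.name
""").fetchall()

artist_names = [name for (name,) in joined_rows]
print(artist_names)  # Van Morrison appears once per album
```

The serializer then renders each of those rows as a separate Artist, which would match the duplicated Van Morrison row shown above.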

Any idea on how to fix this? Am I missing something here? Do I have to use some form of always_serialize here, or is it not possible to use a property?

You can get the latest album (by year) for each artist using a subquery, added in an overridden get_queryset():

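from django.db.models import OuterRef
from rest_framework import viewsets

class AlbumViewSet(viewsets.ModelViewSet):
    queryset = Album.objects.none()
    serializer_class = AlbumSerializer

    def get_queryset(self):
        # Correlated subquery: the newest album (by year) of the outer row's artist.
        latest_album_qry = (
            Album.objects.filter(artist=OuterRef("artist_id")).order_by("-year")[:1]
        )
        qs = Album.objects.all().order_by("pk")
        # Keep only the albums that are the latest one for their artist.
        qs = qs.filter(id__in=latest_album_qry.values("pk"))
        return qs.prefetch_related("artist", "genres")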

    def get_options(self):
        return get_album_options()

    class Meta:
        datatables_extra_json = ('get_options', )

This generates the following SQL:

SELECT "albums_album"."id", "albums_album"."name", "albums_album"."rank", "albums_album"."year", "albums_album"."artist_id" 
FROM "albums_album" INNER JOIN "albums_artist" ON ("albums_album"."artist_id" = "albums_artist"."id") 
WHERE "albums_album"."id" IN (
   SELECT U0."id" FROM "albums_album" U0 WHERE U0."artist_id" = "albums_album"."artist_id" ORDER BY U0."year" DESC LIMIT 1
) 
ORDER BY "albums_artist"."name" ASC LIMIT 10;

Is there any way to get the Artist as the primary queryset, i.e. to use an ArtistViewSet? Say you want to display mainly the artist's information along with the latest Album; your approach works, but it inverts the hierarchy.

Another option would be to denormalize: save the latest Album on the Artist directly, via a signal, whenever a newer album is added. That is less than ideal, though. To emulate it I added the property, since it keeps the Artist as the main model when querying and avoids duplicating logic in querysets scattered across multiple parts of the project.

I'm not sure what you mean, but I would suggest you write the query you want as an SQL SELECT first, then port that over to a Django query. You might want to return all Artists and outer join the latest album onto each (query on the Artist table); or you might just want the latest Album with the associated Artist information (query on the Album table).
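
As a rough illustration of the first variant (all Artists, with the latest album outer joined onto each), here is a minimal sqlite3 sketch. The table names, the `year` column and the album-less artist row are hypothetical stand-ins, not the example app's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (id INTEGER PRIMARY KEY, name TEXT, year INTEGER,
                        artist_id INTEGER REFERENCES artist(id));
    -- The third artist is a placeholder row with no albums.
    INSERT INTO artist VALUES (1, 'Van Morrison'), (2, 'Michael Jackson'),
                              (3, 'Artist Without Albums');
    INSERT INTO album VALUES (1, 'Astral Weeks', 1968, 1),
                             (2, 'Days Like This', 1995, 1),
                             (3, 'Thriller', 1982, 2);
""")

# Query on the artist table; the join condition picks at most one album
# per artist (the one with the highest year), so no artist is duplicated.
rows = conn.execute("""
    SELECT artist.name, album.name
    FROM artist
    LEFT OUTER JOIN album
      ON album.id = (
          SELECT a.id FROM album a
          WHERE a.artist_id = artist.id
          ORDER BY a.year DESC
          LIMIT 1
      )
    ORDER BY artist.name
""").fetchall()

for artist_name, album_name in rows:
    print(artist_name, album_name)
```

Because it is an outer join, artists without any album stay in the result with an empty album column.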

Ah, thinking in SQL has increasingly become an uphill battle since I started delving into server-specific SQL functions. Pair that with the Django query specifics, and I'm in awe of all the clever ways people come up with to query for things. I seriously don't know how people do it.

My favorites are queries like Model.objects.filter(last_datetime__lte=Now() + timedelta(seconds=1) * F("interval"))


Anyway, I solved this issue by building on your answer and extending the viewset's queryset as follows:

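from django.db.models import OuterRef, Subquery
from rest_framework import viewsets

class ArtistViewSet(viewsets.ModelViewSet):  # base class assumed, as in AlbumViewSet above
    ...

    def get_queryset(self):
        queryset = super().get_queryset()
        # Correlated subquery: this artist's albums, newest first.
        albums = Album.objects.filter(artist=OuterRef('pk')).order_by('-created')
        # Annotate each artist with the name of its newest album.
        queryset = queryset.annotate(
            latest_album=Subquery(albums.values('name')[:1])
        )
        return queryset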
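
For anyone who wants to check what such an annotation amounts to, the sketch below expresses roughly the same idea as a scalar subquery in plain SQL, run through sqlite3. The schema is hypothetical, `created` is a simple counter standing in for a timestamp, and `artists_sorted_by_latest_album` is an illustrative helper name; this is not the SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    -- `created` is a plain counter here, standing in for a timestamp.
    CREATE TABLE album (id INTEGER PRIMARY KEY, name TEXT, created INTEGER,
                        artist_id INTEGER REFERENCES artist(id));
    INSERT INTO artist VALUES (1, 'Van Morrison'), (2, 'Michael Jackson');
    INSERT INTO album VALUES (1, 'Astral Weeks', 1, 1),
                             (2, 'Days Like This', 2, 1),
                             (3, 'Thriller', 3, 2);
""")

def artists_sorted_by_latest_album(descending=False):
    """Return (artist, latest_album) rows ordered by the computed column."""
    direction = "DESC" if descending else "ASC"
    return conn.execute(f"""
        SELECT artist.name,
               (SELECT album.name FROM album
                WHERE album.artist_id = artist.id
                ORDER BY album.created DESC
                LIMIT 1) AS latest_album
        FROM artist
        ORDER BY latest_album {direction}
    """).fetchall()

ascending = artists_sorted_by_latest_album()
descending = artists_sorted_by_latest_album(descending=True)
print(ascending)
print(descending)
```

Since the latest album is computed per artist instead of joined, sorting on it in either direction keeps exactly one row per artist. On the DataTables side, the column would presumably order on `latest_album` rather than on `albums`.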

Thanks.