Server-side sorting with ForeignKey produces duplicates
alfonsrv opened this issue · 4 comments
When sorting on a model related through a foreign key, it's currently not possible to consider only one (e.g. the latest) related element. A use case here would be keeping historic data and displaying only the latest (and only relevant) element.
When sorting in such a scenario, every entry that has multiple instances of the related model spawns duplicate rows carrying the values of its earlier instances.
An example might make it clearer. Slightly modified models and serializer from the example app:
from django.db import models
from rest_framework import serializers

class Artist(models.Model):
    name = models.CharField('Name', max_length=80)

    def __str__(self):
        return self.name

    @property
    def latest_album(self) -> 'Album':  # added this
        # latest() needs Meta.get_latest_by on Album or an explicit field name
        return self.albums.latest()

class Album(models.Model):
    name = models.CharField('Name', max_length=80)
    artist = models.ForeignKey(
        Artist,
        models.CASCADE,
        verbose_name='Artist',
        related_name='albums'
    )

class ArtistSerializer(serializers.ModelSerializer):
    latest_album = serializers.CharField(read_only=True)  # added this

    class Meta:
        model = Artist
        fields = (
            'name', 'latest_album',
        )
Assuming we only want to show the latest album of an Artist, as some kind of release table, we would define a datatable along these lines:
<!-- HTML boilerplate -->
<script>
    $(document).ready(function() {
        $('#artists').DataTable({
            'serverSide': true,
            'ajax': '/api/artists/?format=datatables',
            'columns': [
                {'data': 'name'},
                {'data': 'latest_album', 'name': 'albums'},
            ]
        });
    });
</script>
The issue here is that we cannot define 'name': 'albums__latest' (or __last, or [-1]), or otherwise filter for only the latest instance. Somebody might also want to do this conditionally, getting the latest object for which a certain condition is True, but that might make things too complicated.
Assume Van Morrison had two albums, Astral Weeks and Days Like This. Rendering this out initially would result in:
Artist | Latest Album |
---|---|
Van Morrison | Astral Weeks |
Michael Jackson | Thriller |
And sorting on Latest Album would result in a duplicate entry for Van Morrison:
Artist | Latest Album |
---|---|
Van Morrison | Astral Weeks |
Van Morrison | Days Like This |
Michael Jackson | Thriller |
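The duplication can be reproduced outside Django. This is a minimal sketch using Python's built-in sqlite3 module, assuming the sort on 'albums' ends up as an ORDER BY over a join between the artist and album tables. The table and column names are simplified stand-ins, not the example app's actual schema.

```python
import sqlite3

# Toy schema (hypothetical, simplified from the models above).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        name TEXT,
        artist_id INTEGER REFERENCES artist(id)
    );
    INSERT INTO artist VALUES (1, 'Van Morrison'), (2, 'Michael Jackson');
    INSERT INTO album VALUES
        (1, 'Astral Weeks', 1),
        (2, 'Days Like This', 1),
        (3, 'Thriller', 2);
""")

# Ordering artists by a column of the related table forces a join,
# and the join yields one row per (artist, album) pair.
joined_rows = conn.execute("""
    SELECT artist.name, album.name
    FROM artist
    LEFT OUTER JOIN album ON album.artist_id = artist.id
    ORDER BY album.name ASC
""").fetchall()

for row in joined_rows:
    print(row)
```

Two artists go in, three rows come out: Van Morrison appears once per album, which matches the table above.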
Any idea on how to fix this? Am I missing something here? Do I have to use some form of always_serialize here, or can I not use a property?
You can get the latest album (by year) for each artist using a subquery, added in an overridden get_queryset():
from django.db.models import OuterRef
from rest_framework import viewsets

class AlbumViewSet(viewsets.ModelViewSet):
    queryset = Album.objects.none()
    serializer_class = AlbumSerializer

    def get_queryset(self):
        # Correlated subquery: the newest album (by year) of the outer row's artist
        latest_album_qry = (Album.objects.filter(artist=OuterRef("artist_id")).order_by("-year"))[:1]
        qs = Album.objects.all().order_by("pk")
        # Keep only albums that are the latest one for their artist
        qs = qs.filter(id__in=latest_album_qry.values("pk"))
        return qs.prefetch_related("artist", "genres")

    def get_options(self):
        return get_album_options()

    class Meta:
        datatables_extra_json = ('get_options', )
This generates the following SQL:
SELECT "albums_album"."id", "albums_album"."name", "albums_album"."rank", "albums_album"."year", "albums_album"."artist_id"
FROM "albums_album" INNER JOIN "albums_artist" ON ("albums_album"."artist_id" = "albums_artist"."id")
WHERE "albums_album"."id" IN (
SELECT U0."id" FROM "albums_album" U0 WHERE U0."artist_id" = "albums_album"."artist_id" ORDER BY U0."year" DESC LIMIT 1
)
ORDER BY "albums_artist"."name" ASC LIMIT 10;
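To see what the correlated subquery does, the same query shape can be run against a toy SQLite database. This is only a sketch: the schema is a cut-down, hypothetical version of the example app's tables, and the release years are filled in for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        name TEXT,
        year INTEGER,
        artist_id INTEGER REFERENCES artist(id)
    );
    INSERT INTO artist VALUES (1, 'Van Morrison'), (2, 'Michael Jackson');
    INSERT INTO album VALUES
        (1, 'Astral Weeks', 1968, 1),
        (2, 'Days Like This', 1995, 1),
        (3, 'Thriller', 1982, 2);
""")

# For every album row, the subquery picks the newest album of the same
# artist; the row survives only if it *is* that newest album.
latest_rows = conn.execute("""
    SELECT artist.name, album.name, album.year
    FROM album
    INNER JOIN artist ON album.artist_id = artist.id
    WHERE album.id IN (
        SELECT u0.id FROM album u0
        WHERE u0.artist_id = album.artist_id
        ORDER BY u0.year DESC LIMIT 1
    )
    ORDER BY artist.name ASC
""").fetchall()

for row in latest_rows:
    print(row)
```

Each artist now contributes exactly one row, so sorting on any album column can no longer produce duplicates.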
Any way to get the Artist as the primary queryset, i.e. use an ArtistViewSet? Say you'd want to display mainly the artist's information along with the latest Album; this works, but it messes up the hierarchy.
Another option would obviously be to de-normalize the Album, saving it on the Artist directly via a signal whenever a newer album is added, but that is less than ideal. To emulate that I added the property, as it allows keeping the Artist as the main model when querying and prevents logic duplication across multiple parts of the project / scattered querysets.
I'm not sure what you mean, but I would suggest you write the query you want as a SQL SELECT first, then port that over to a Django query. You might want to return all Artists, and then outer join the latest album onto each (query on Artist table); or you might just want the latest Album with associated Artist information (query on Album table).
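The first of those two shapes (query on the Artist table, outer join the latest album) might look like the following in plain SQL. This is a sketch against a hypothetical toy schema in SQLite, not the example app's tables; the third artist is made up to show what the outer join does for an artist without albums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        name TEXT,
        year INTEGER,
        artist_id INTEGER REFERENCES artist(id)
    );
    INSERT INTO artist VALUES
        (1, 'Van Morrison'),
        (2, 'Michael Jackson'),
        (3, 'No Albums Yet');  -- hypothetical artist without any album
    INSERT INTO album VALUES
        (1, 'Astral Weeks', 1968, 1),
        (2, 'Days Like This', 1995, 1),
        (3, 'Thriller', 1982, 2);
""")

# Artist is the primary table; at most one album row (the newest one)
# is joined onto each artist, and artists without albums are kept.
artist_rows = conn.execute("""
    SELECT artist.name, album.name
    FROM artist
    LEFT OUTER JOIN album
        ON album.id = (
            SELECT u0.id FROM album u0
            WHERE u0.artist_id = artist.id
            ORDER BY u0.year DESC LIMIT 1
        )
    ORDER BY artist.name ASC
""").fetchall()

for row in artist_rows:
    print(row)
```

The artist without albums comes back with None in the album column, which is the main behavioural difference from querying on the Album table.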
Ah, thinking in SQL has increasingly become an uphill battle since I started to delve into server-specific SQL functions. Pair that with the Django query specifics and I'm in awe of all the clever ways people come up with for querying things. I seriously don't know how people are doing it.
My favorite being queries like Model.objects.filter(last_datetime__lte=Now() + timedelta(seconds=1) * F("interval"))
Anyways, I solved this issue building on your answer by expanding the viewset's queryset as follows:
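from django.db.models import OuterRef, Subquery

class ArtistViewSet():
    ...
    def get_queryset(self):
        queryset = super().get_queryset()
        # Albums of the outer artist row, newest first
        albums = Album.objects.filter(artist=OuterRef('pk')).order_by('-created')
        # Annotate each artist with the name of its newest album
        queryset = queryset.annotate(
            latest_album=Subquery(albums.values('name')[:1])
        )
        return queryset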
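In SQL terms, an annotation like this corresponds roughly to a scalar subquery in the SELECT list. The sketch below shows that shape on a toy SQLite database; the schema and the created timestamps are hypothetical, and the SQL Django actually emits will differ in naming and quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        name TEXT,
        created TEXT,  -- hypothetical timestamps, ISO formatted
        artist_id INTEGER REFERENCES artist(id)
    );
    INSERT INTO artist VALUES (1, 'Van Morrison'), (2, 'Michael Jackson');
    INSERT INTO album VALUES
        (1, 'Astral Weeks', '2020-01-01', 1),
        (2, 'Days Like This', '2021-01-01', 1),
        (3, 'Thriller', '2019-01-01', 2);
""")

# The scalar subquery yields one value per artist, so the result has
# one row per artist and can be ordered by the computed column.
annotated_rows = conn.execute("""
    SELECT artist.name,
           (SELECT u0.name FROM album u0
            WHERE u0.artist_id = artist.id
            ORDER BY u0.created DESC LIMIT 1) AS latest_album
    FROM artist
    ORDER BY latest_album ASC
""").fetchall()

for row in annotated_rows:
    print(row)
```

Because no join against the album table is involved, sorting on latest_album keeps one row per artist.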
Thanks.