django-unjoinify A library for efficiently retrieving deeply-nested data sets (for people who aren't afraid of a bit of SQL) The problem ----------- Suppose you have a website about film festivals. Each festival has many awards, and each award has many nominations. A nomination belongs to a movie, and a movie can have several directors (in a many-to-many relation). You want to have a page listing the full roster of nominated movies at a festival, including their directors. If you do this the naive way (looping over awards, then nominations / movies, then directors), you'll end up making lots and lots of little queries. This is bad. select_related won't help you here - it can't follow one-to-many and many-to-many relations. (See Django tickets #2238 and #6432.) The traditional answer is to bite the bullet and drop down to raw SQL at this point: SELECT tinseltown_award.id, tinseltown_award.name, tinseltown_nomination.id AS nomination__id, tinseltown_nomination.ranking AS nomination__ranking, tinseltown_movie.id AS nomination__movie__id, tinseltown_movie.title AS nomination__movie__title, tinseltown_person.id AS nomination__movie__directors__id, tinseltown_person.first_name AS nomination__movie__directors__first_name tinseltown_person.surname AS nomination__movie__directors__surname FROM tinseltown_award LEFT JOIN tinseltown_nomination ON (tinseltown_award.id = tinseltown_nomination.award_id) LEFT JOIN tinseltown_movie ON (tinseltown_nomination.movie_id = tinseltown_movie.id) LEFT JOIN tinseltown_movie_directors ON (tinseltown_movie.id = tinseltown_movie_directors.movie_id) LEFT JOIN tinseltown_person ON (tinseltown_movie_directors.person_id = tinseltown_person.id) WHERE tinseltown_award.festival_id = ? ORDER BY tinseltown_award.name, tinseltown_nomination.ranking This captures all the data you need for the page in a single query, but there's a downside: all you get back is a plain SQL result table, with no access to your model objects and their lovingly-crafted methods (think get_absolute_url). unjoinify to the rescue ----------------------- Provided you've used the double-underscore notation above for your column names, and made them match up with your relation names, unjoinify will take your query, and through some cunning ORM introspection, reconstruct an object tree: awards = unjoinify(Award, "SELECT tinseltown_award.id...", (festival_id,)) (Here Award is the base class that all the joins are hanging off, and festival_id is a parameter to the query.) Well... almost. Due to limitations of Django's ORM, we can't return a proper object tree with the ability to refer to award.nominations and so on. Instead, what you get back is an array of (award, nominations) tuples, where nominations is itself an array of (nomination, movie, directors) tuples, and directors is an array of person objects. This is good enough for iterating through in a template, though: {% for award, nominations in awards %} <h1>{{ award.name }}</h1> <ul> {% for nomination, movie, directors in nominations %} <li> {{ nomination.ranking }}: <a href="{{ movie.get_absolute_url }}">{{ movie.title }}</a> - {% for director in directors %} {{ director.first_name }} {{ director.surname }} {% endfor %} </li> {% endfor %} </ul> {% endfor %} unjoinify will even handle cartesian joins - for example, if a movie had multiple studios as well as multiple directors, you could join on both relations and have it successfully unpack to a tuple of (nomination, movie, directors, studios). You're advised to use this sparingly, though - this will result in count(directors) * count(studios) rows being returned for each movie, which, depending on your particular use-case, could end up being far worse than running separate queries... Explicit column names --------------------- Unfortunately, database engines are liable to have a fairly low limit on column name length (63 characters, in the case of Postgres), and it's easy to run into this limit when working with deeply-nested relations. To work around this, you can pass the list of column names as an additional parameter (in the same order that they appear in the query): awards = unjoinify(Award, sql, (festival_id,), columns = ['id', 'name', 'nomination__id', 'nomination__ranking', 'nomination__movie__id', 'nomination__movie__title']) Author ------ Matt Westcott <matt@west.co.tt> http://matt.west.co.tt/ - @westdotcodottt