mojotech/scrivener_ecto

Cautionary note about dropping preloads - distinct required?

Closed this issue · 3 comments

Thanks for the great library. I was recently building a query and noticed a small quirk.

query = Trip
        |> join(:left, [t], d in assoc(t, :destinations))
        |> join(:left, [t, _d], c in assoc(t, :companies))
        |> join(:left, [t, _d, _c], u in assoc(t, :travellers))
        |> where([_t, _d, c, _u], c.id == ^company.id)
        |> order_by([t, _d, _c, _u], desc: t.start)
        |> preload([_t, d, _c, u], destinations: d, travellers: u)

query
==> #Ecto.Query<from t0 in Radar.Trips.Trip,
 left_join: d in assoc(t0, :destinations),
 left_join: c in assoc(t0, :companies), 
 left_join: t1 in assoc(t0, :travellers),
 where: c.id == ^"91955970-34eb-4a1b-97e6-53e9084abbfa",
 order_by: [desc: t0.start], preload: [travellers: t1, destinations: d]>

query |> Repo.all() |> length()
==> 3424

# to mimic the count query
query |> exclude(:preload) |> exclude(:order_by) |> Repo.all() |> length()
==> 4556

Notice that by excluding the preloads, the count jumps. This is likely because the joins are kept, but the removal of preloads results in duplicate rows during the count query.

Adding distinct removes the issue.

query = Trip
        |> distinct(true)
        |> join(:left, [t], d in assoc(t, :destinations))
        |> join(:left, [t, _d], c in assoc(t, :companies))
        |> join(:left, [t, _d, _c], u in assoc(t, :travellers))
        |> where([_t, _d, c, _u], c.id == ^company.id)
        |> order_by([t, _d, _c, _u], desc: t.start)
        |> preload([_t, d, _c, u], destinations: d, travellers: u)

query |> Repo.all() |> length()
==> 3424

query |> exclude(:preload) |> exclude(:order_by) |> Repo.all() |> length()
==> 3424

Should a small note be made in the readme about it?

Thanks for the detailed explanation. I agree on adding a note in the readme. Would you like to take a pass at it in a PR? If so I'd be happy to merge it.

Gladly. However, our specific use case might be a bit more involved and is more related to proper pagination when joins are involved so that repeated rows are avoided while maintaining the true count.

This seemed to address something similar:
mojotech/scrivener#30

Did it ever make it into the code base here?

The SQL we need to execute is something like the following:

SELECT t.*, d.*, u.* FROM 
(
    SELECT x.* FROM trips AS x
    INNER JOIN trips_companies AS tc ON tc.trip_id = x.id
    INNER JOIN companies AS c ON tc.company_id = c.id    
    WHERE c.id = '91955970-34eb-4a1b-97e6-53e9084abbfa'
    ORDER BY x.start DESC
    LIMIT 100 
    OFFSET 0
) AS t
INNER JOIN destinations AS d ON d.trip_id = t.id
LEFT JOIN trips_travellers AS tt ON tt.trip_id = t.id
LEFT JOIN users AS u ON tt.user_id = u.id
ORDER BY t.start DESC

Any suggestions?

I'll attempt to implement the solution proposed from that old pull request unless a better approach is recommended.