salsify/goldiloader

Eager load arbitrary nesting associations (with STI)

viktorianer opened this issue · 11 comments

First of all thanks for a great gem.

I just found it and would like to use it in a project, where we heavily use includes. So, we need the feature to eager load arbitrary nesting of associations. But it looks like it is limited to what it can do for us, because it does not load all nesting associations. I'll try to explain my problem:

Some of our Models:

# STI models
class Person < ApplicationRecord
  has_many :persons_nationalities, as: :person, dependent: :destroy, inverse_of: :person
...
end

class Candidate < Person
...
end

# Relations
class PersonsNationality < ApplicationRecord
  belongs_to :person, polymorphic: true
  belongs_to :nationality, class_name: "Country"
end

class Country < ApplicationRecord
...
  has_many :persons_nationalities, foreign_key: :nationality_id
end

In the candidate views I render:

# app/views/candidates/_candidate.json.jbuilder
...
# ! does not load all nesting associations !
json.persons_nationalities object.persons_nationalities, partial: "persons_nationalities/persons_nationality", as: :object

# app/views/persons_nationalities/_persons_nationality.json.jbuilder
json.(object.nationality, :id, :adjective, :name) # <= Country

End in controller I do load:

@candidate = Candidate.find(params[:id])

When I load with @candidate = Candidate.includes(persons_nationalities: [:nationality]).find(params[:id]) it is working, but then I do not use this gem, right?

Does anyone have a suggestion or solution?! Thank you in advance.

Our Rails environment:

Docker ruby:2.7.3-slim-buster
rails 6.0.3.4
ruby 2.7.3p183

# Gemfile
gem "puma", "~> 5.0"
gem "pg", "~> 1.2"
gem "goldiloader", "~> 4.1"

On another model it does not load following nesting associations:

includes(candidate: [
  language_skills: [:language],
    candidates_projects: [
      project: [:company],
    ]
])

With this includes it is loading all items, without it does not load language and project: [:company].

Thanks for the @viktorianer question. Goldiloader should lazily eager load nested associations when they are first traversed. In your example I would expect to see the following eager loading happening:

# Step 1 - Load candidate
candidate = Candidate.find(params[:id])

# Step 2 - Traverse Candidate#persons_nationalities association
# which trigger eager loading the persons_nationalities association
# for all Candidates loaded in step 1 (which is only one)
persons_nationalities = candidate.persons_nationalities

# Step 3 - Traverse PersonsNationality#nationality association
# which trigger eager loading the nationality association
# for all PersonsNationalities loaded in step 2
persons_nationalities[0].nationality

Can you try running this in Rails Console and confirm in the ActiveRecord SQL logs that you're not eager loading in Step 3? Also you mentioned STI, does this problem only happen when the "root" model loaded in step 1 is an STI model?

I tried writing a test case in this gist but couldn't reproduce the problem. @viktorianer - Can you update that gist to demonstrate the problem?

Hey jturkel, thank you for your fast replay and help.

I just tried, what you suggested, and it looks like it does not work.

With includes it looks good:

candidate = Candidate.includes(persons_nationalities: [:nationality]).find 27086
  Candidate Load (1.3ms)  SELECT "candidates".* FROM "candidates" WHERE "candidates"."id" = 27086 LIMIT 1
  PersonsNationality Load (3.1ms)  SELECT "persons_nationalities".* FROM "persons_nationalities" WHERE "persons_nationalities"."person_type" = 'Candidate' AND "persons_nationalities"."person_id" = 27086
  Country Load (0.8ms)  SELECT "countries".* FROM "countries" WHERE "countries"."id" = 36
=> #<Candidate id: 27086, created_at: "...", updated_at: "...", ...

persons_nationalities[0].nationality
=> #<Country id: 36, name: "...", created_at: "...", updated_at: "...", adjective: "...", sort: 3>

Without includes not:

candidate = Candidate.find 27086
  Candidate Load (2.4ms)  SELECT "candidates".* FROM "candidates" WHERE "candidates"."id" = 27086 LIMIT 1
=> #<Candidate id: 27086, created_at: "...", updated_at: "...", ...

persons_nationalities = candidate.persons_nationalities
  PersonsNationality Load (3.0ms)  SELECT "persons_nationalities".* FROM "persons_nationalities" WHERE "persons_nationalities"."person_id" = 27086 AND "persons_nationalities"."person_type" = 'Candidate' LIMIT 11
=> #<ActiveRecord::Associations::CollectionProxy [#<PersonsNationality id: 18654, nationality_id: 36, person_type: "Candidate", person_id: 27086>]>

persons_nationalities[0].nationality
  PersonsNationality Load (2.0ms)  SELECT "persons_nationalities".* FROM "persons_nationalities" WHERE "persons_nationalities"."person_id" = 27086 AND "persons_nationalities"."person_type" = 'Candidate'
  Country Load (1.2ms)  SELECT "countries".* FROM "countries" WHERE "countries"."id" = 36 LIMIT 1
=> #<Country id: 36, name: "...", created_at: "..", updated_at: "..", adjective: "...", sort: 3>

I will check your gist and cam back. Thank you!

I tried writing a test case in this gist but couldn't reproduce the problem. @viktorianer - Can you update that gist to demonstrate the problem?

Gist looks good to me.

Side note: in our Person model, we also have the relation nationalities:

# STI models
class Person < ApplicationRecord
...
  has_many :nationalities, through: :persons_nationalities
...
end

And if I load this, it loads as expected the countries through persons_nationalities:

candidate = Candidate.find 27086
  Candidate Load (1.4ms)  SELECT "candidates".* FROM "candidates" WHERE "candidates"."id" = 27086 LIMIT 1
=> #<Candidate id: 27086, created_at: "", updated_at: "", ...

persons_nationalities = candidate.nationalities
  Country Load (3.2ms)  SELECT "countries".* FROM "countries" INNER JOIN "persons_nationalities" ON "countries"."id" = "persons_nationalities"."nationality_id" WHERE "persons_nationalities"."person_id" = 27086 AND "persons_nationalities"."person_type" = 'Candidate' LIMIT 11
=> #<ActiveRecord::Associations::CollectionProxy [#<Country id: 36, name: "...", created_at: "...", updated_at: "...", adj...

But this is a different case, with "shortcuts" through nested has_many associations, right?
https://guides.rubyonrails.org/association_basics.html#the-has-many-through-association

@viktorianer - A few interesting things jump out to me from these logs:

  1. There seems to be a candidates table and the SQL queries have a "persons_nationalities"."person_type" = 'Candidate' filter. Are you using STI for the Person and Candidate model (in which case there should be a people table not a candidates table) or is the Person model an abstract base class (i.e. sets self.abstract_class = true)?
  2. The Goldiloader queries for the PersonsNationality#persons_nationalities association have a LIMIT 11 but I'm not sure where that LIMIT is coming from. Does the persons_nationalities association have a customized query scope that adds this limit?

Hey @jturkel, thank you for looking at this issue.

  1. Yes, Person is self.abstract_class = true
  2. No, persons_nationalities association is without any scopes. The LIMIT is normal AR/Rails Feature since Rails 5.1 rails/rails#28592

@viktorianer - Apologies. I missed your reply to this issue. I created a new gist with an abstract Person model but I still can't reproduce the issue. Are you sure the test case's database schema and ActiveRecord models correctly match your setup?

Looking at the SQL logs in this comment, the given candidate only seems to have one nationality so this isn't a great test for eager loading. Can you try to reproduce this on a candidate with multiple nationalities or multiple candidates with separate nationalities? The LIMIT introduced by the ActiveRecord::Relation#inspect call might be screwing things up a bit so you might have add a to_a call to simulate what would happen in your view code:

# Step 1 - Load candidate. Alternatively load multiple candidates Candidate.where(id: ids).to_a[0]
candidate = Candidate.find(params[:id])

# Step 2 - Traverse Candidate#persons_nationalities association
# which trigger eager loading the persons_nationalities association
# for all Candidates loaded in step 1 (which is only one)
persons_nationalities = candidate.persons_nationalities.to_a

# Step 3 - Traverse PersonsNationality#nationality association
# which trigger eager loading the nationality association
# for all PersonsNationalities loaded in step 2
persons_nationalities[0].nationality

OK, looks like .to_a "fix" it. But Candidate.includes(persons_nationalities: [:nationality]) is just working out of the box. From my site, we can close this issue then.

The to_a is really just an artifact of testing this out on Rails console which call inspect to print the returned result. It's not something you'll need in your production app. A more realistic snippet you can run from Rails console to show the lazy eager loading in action is:

Candidate.where(id: ids).each do |candidate|  
  candidate.persons_nationalities.each do |persons_nationality|
    puts "Candidate #{candidate.inspect} has nationality #{persons_nationality.nationality}"
  end 
end

This snippet should run three database queries: one to load the candidates, one to load all of the person nationalities for the loaded candidates, and one to load all of the nationalities for the loaded person nationalities. The SQL run should be the same as Candidate.includes(persons_nationalities: [:nationality]) but it will run slightly later when the Ruby code first traverses an association.

The to_a is really just an artifact

I know.

more realistic snippet

I can not check it, because I do not work on this project anymore. But I think, now I have an idea, why it is different to "Rails default" console output. But also have yet no time to check it. Instead, I checked it with your gist and also added a second call to it. And it looks it is working correct.

...
# found_candidate = Candidate.find(candidate.id)
D, [2022-07-15T10:22:16.131285 #68626] DEBUG -- :   Candidate Load (0.1ms)  SELECT "candidates".* FROM "candidates" WHERE "candidates"."id" = ? LIMIT ?  [["id", 3], ["LIMIT", 1]]

# found_candidate.persons_nationalities[0].nationality
D, [2022-07-15T10:22:16.132335 #68626] DEBUG -- :   PersonsNationality Load (0.1ms)  SELECT "persons_nationalities".* FROM "persons_nationalities" WHERE "persons_nationalities"."person_id" = ? AND "persons_nationalities"."person_type" = ?  [["person_id", 3], ["person_type", "Candidate"]]
D, [2022-07-15T10:22:16.133159 #68626] DEBUG -- :   Country Load (0.1ms)  SELECT "countries".* FROM "countries" WHERE "countries"."id" IN (?, ?)  [["id", 4], ["id", 5]]

# first run
# found_candidate.persons_nationalities.each do ...

# second run
# found_candidate.persons_nationalities.each do ...

As I said, I am no longer working on this project. But I can remember that https://github.com/charkost/prosopite auto-detect N+1 queries. That's why I raised this issue in the first place.