ankane/active_median

Failed with Greatest-Per-Group Query

molfar opened this issue · 7 comments

I have a has-many association with DISTINCT ON select.

has_many :active_offers, -> {
   select("DISTINCT ON (region_id, brand_id) *")
   .order(:region_id, :brand_id, created_at: :desc)
  }, class_name: 'Offer'

And as a result, Brand.active_offers.median(:price) throws an error

PG::GroupingError: ERROR:  column "offers.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT DISTINCT ON (region_id, brand_id) *, PERCENTILE_CONT(...
                                                 ^

Generated query from logs:

SELECT DISTINCT ON (region_id, brand_id) *, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "offers"."price") AS median_price FROM "offers" WHERE "offers"."fuel_id" = 1

Hey @molfar, does it work if you replace median with average - Brand.active_offers.average(:price) (and can you share the logs for that as well)?

@ankane, It has no error with @brand.active_offers.average(:price) . But in this case, association scope with DISTINCT ON clause is lost. In logs, I see query:

   (0.9ms)  SELECT AVG("offers"."price") FROM "offers" WHERE "offers"."fuel_id" = $1  [["fuel_id", 1]]
  ↳ app/models/offer.rb:9:in `price'

@ankane Here is an example test to reproduce a failure

def test_association_with_distinct_on
    user = User.create!
    user.posts.create!(comments_count: 2)
    user.posts.create!(comments_count: 3)
    assert_equal 3, user.posts.select('DISTINCT ON (user_id) *').order(:user_id, created_at: :desc).median(:comments_count)
end

Awesome, thanks @molfar. The behavior should be consistent with average on master.

Here is a failing test molfar@85524a8

Active Median tries to be consistent with Active Record's average (both assertions are failing in the test case, not just the median one). They both replace the select clause, which works well for simple selects like select("id, name"). I'm not sure there's a way to combine SELECT DISTINCT ON (...) and SELECT AVG(...) without a subquery/CTE. I'd be curious to see what SQL query you're trying to generate.

@ankane you are definitely right. Be consistent with AR is the only true way. At the same time, you helped me a lot to rethink my code. Thank you for your job with this gem.