dotnet/efcore

Translate GroupBy followed by FirstOrDefault over group

smitpatel opened this issue ยท 30 comments

Translate GroupBy followed by FirstOrDefault over group

Is there any update? Any alternative or planned fix? @ajcvickers @shanselman

Is there a workaround for this that would allow for database-evaluated DistinctBy() logic? Seems like this Issue blocks database evaluation of this.

Any workaround while it's being fixed? I am in the process of migrating .NET Core from 2.1 to 3.1 and EF Core from 2.1 to 3.1.

Another breaking change for migrating to ef core 3.1 and nobody knows when it will be supported :( @smitpatel @ajcvickers

@suadev - It was getting client eval'ed before. It is part of same breaking change disabling client eval. The perf was bad for this case. If your group had 100 element each and you are selecting only first, we still got data for all 100 elements to client side to compute the first.

Very important feature IMHO.
My use case is this:

await _db.RetailerRates
                .GroupBy(r => r.Price)
                .Select(g => g.OrderByDescending(r => r.CreatedAt).First())
                .ToListAsync();
GSPP commented

This query pattern shows up quite a bit in my under-way 1000 table LINQ to SQL migration project.

Example use case: You have a list of blogs. For each blog you want to display a single highlighted post (the most upvoted or most recent one). So you might group by blog and take the first item from each group.

This can be translated with ROW_NUMBER ... WHERE r = 1 or with CROSS APPLY (SELECT TOP 1 ...). Both variants have different query plan shapes and performance characteristics. It would be nice if the shape could be selected by writing the LINQ query in a particular way.

This query could result in the ROW_NUMBER shape:

ctx.BlogPosts
.GroupBy(bp => bp.Blog)
.Select(g => g.OrderByDescending(bp => bp.CreatedAt).FirstOrDefault())
.ToList();

This query could result in the CROSS APPLY shape:

ctx.BlogPosts
.GroupBy(bp => bp.Blog)
.SelectMany(g => g.OrderByDescending(bp => bp.CreatedAt).Take(1).DefaultIfEmpty())
.ToList();

But any way to support this would be better than no way so that our migration can proceed.


The ROW_NUMBER plan leads SQL Server to sort all rows which could be billions. The CROSS APPLY lends itself to a loop join plan that very selectively queries an index. ROW_NUMBER is great if the fraction of rows selected is near 100%. CROSS APPLY is great when picking very few items.

Depending on the scenario the performance difference can be significant. SQL Server is unable to automatically switch between theses forms.

@smitpatel @ajcvickers Is there any update on this issue or work around you could suggest for my case #21619 please?

If I distil my use-case it comes down to a simple multi-column group-by statement and a projection:

            var groupedOrganisationMentions = await collection
                .GroupBy(m => new {m.LocationId, m.PublicationId, m.PracticeAreaId})
                .Select(am => new
                {
                    Key = am.Key,
                    Items = am.ToList()
                }).ToListAsync(cancellationToken: token).ConfigureAwait(false);

The Items projection just doesn't want to work.

I've got a similar use case that seems to be related to this issue. I use code like this in several places:

// This works fine
query.Select(price => new Customer {
   Name = price.Payer.Name,
   Code = price.Payer.Code,
   City = price.Payer.City,
   ParentCode = price.Payer.ParentCode,
   ParentLevel = CustomerLevel.Corporate,
   CustomerLevel = CustomerLevel.Payer
}).Distinct().ToListAsync();

That works fine. The query variable could potentially have a variety of expressions and joins with no issue. As soon as I add a call to OrderBy, it will not evaluate. I've tried several workarounds that I've found around the interwebz, and nothing seems to resolve it.

// This throws error
// query is of type IQueryable<Price>
query.Select(price => new Customer {
   Name = price.Payer.Name,
   Code = price.Payer.Code,
   City = price.Payer.City,
   ParentCode = price.Payer.ParentCode,
   ParentLevel = CustomerLevel.Corporate,
   CustomerLevel = CustomerLevel.Payer
}).Distinct().OrderBy(cust => cust.Name).ToListAsync();

Also, placement of the OrderBy does not seem to matter.

// This also throws error
// query is of type IQueryable<Price>
query
   .OrderBy(price => price.payer.Name)
   .Select(price => new Customer {
      Name = price.Payer.Name,
      Code = price.Payer.Code,
      City = price.Payer.City,
      ParentCode = price.Payer.ParentCode,
      ParentLevel = CustomerLevel.Corporate,
      CustomerLevel = CustomerLevel.Payer
   }).Distinct().ToListAsync();

@lenniebriscoe - The query you posted above is different from what you posted on #21619 Your query here does not contain First/FirstOrDefault over grouping element. Please refer to #19929 and #17068 to understand different queries and their translation to relational database.

@brandonsmith86 - Your query does neither contains a GroupBy operator nor FirstOrDefault over the grouping element and is irrelevant to this discussion. Please file a new issue with full repro code if you believe there is a bug in EF Core.

@brandonsmith86 - Your query does neither contains a GroupBy operator nor FirstOrDefault over the grouping element and is irrelevant to this discussion. Please file a new issue with full repro code if you believe there is a bug in EF Core.

Sorry about that. Many of the examples in this issue do not have reference to FirstOrDefault, and I assumed a call to Distinct uses grouping logic, same as GroupBy method. I will post elsewhere.

@smitpatel
Any news about this feature? (GroupBy followed by FirstOrDefault over group)
Thanks for your reply.

Okey you can achive this with using this approach:
https://dev.azure.com/pawelgerr/Thinktecture.EntityFrameworkCore/_wiki/wikis/Thinktecture.EntityFrameworkCore.wiki/35/RowNumber-Support-(EF-Core-2)

DbContext.OrderItems.Select(o => new 
                     { 
                        RowNumber = EF.Functions.RowNumber(o.ProductId)
                     })
                     .AsSubQuery()
                     .Where(i => i.RowNumber == 1)

Would be nice if this gets built in.

Hi guys looking for resolution or workaround for why this can't be translated in EF Core

.GroupBy()
.Select(group=>group.First())

Really, some basic which NHibernate was able to handle in 2006 (and also EF6.x was OK with it) is still not in EF core? Not after 2 years?

e4c6 commented

This really has to be supported in EF core.

This is a very common pattern in my EF queries, I'm shocked it's not in EF Core. Effectively preventing me from migrating to .Net (core). Must have

@stevozilik EF6 runs on .NET Core, so even if you can't use EF Core due to this, it should not stop you migrating to .NET Core.

I'd also like to add that it is utterly abysmal that EF Core cannot support a select with a group by.

@smitpatel

@suadev - It was getting client eval'ed before. It is part of same breaking change disabling client eval. The perf was bad for this case. If your group had 100 element each and you are selecting only first, we still got data for all 100 elements to client side to compute the first.

I don't think you can assume that - I believe this worked fine in EF 6 / LINQ to SQL so I wouldn't assume was using EF Core 2.x (I would never assume that - it was barely functional.)

@ajcvickers

@stevozilik EF6 runs on .NET Core, so even if you can't use EF Core due to this, it should not stop you migrating to .NET Core.

Will EF 6 be supported on .Net 6?

@NetMage - The comment from suadev indicated that they ran into another breaking change when upgrading to EF Core 3.1. So yes it is fair to assume it is coming from earlier version of EF Core. When upgrading from EF6/LINQ to SQL, the concept of running into breaking change has no meaning. Changing from one product to another product will require code changes and there are no "breaking changes" in such change.

Wondering what all of you have been using as a workaround up until this is fixed? Last time i got away with it by caching all the results and doing to query in the app instead of linq-to-sql. This time I really have to do it like @reathh shows.

So if anyone got a workaround, that would be great.

Very important feature IMHO.
My use case is this:

await _db.RetailerRates
                .GroupBy(r => r.Price)
                .Select(g => g.OrderByDescending(r => r.CreatedAt).First())
                .ToListAsync();

I got away with something like this before:

await _db.RetailerRates
  .GroupBy(x => x.Price)
  .Select(x => new { CreatedAt = x.Max(y => y.CreatedAt), Price = x.Key})
  .Join(_db.RetailerRates, x => x, x => new { x.CreatedAt, x.Price }, (_, retailerRate) => retailerRate);
  .ToListAsync();

Disclaimer: This solution depends on the uniqueness of your Price/CreatedAt combination. If not unique, then you might end up with multiple entities for each Price/CreatedAt combination. You could do some extra filtering after materializing the query to accomodate for this ๐Ÿ™‚

@Ruud-cb Try to use the codes below,

await _db.RetailerRates
                .Select(r => r.Price)
                .Distinct()
                .SelectMany(p => _db.RetailerRates.Where(r => r.Price == p).OrderByDescending(r => r.CreatedAt).Take(1))
                .ToListAsync()

For SQL Server, it will generate the sql with partition & inner join. The result is the same as group by.

@Ruud-cb Try to use the codes below,

await _db.RetailerRates
                .Select(r => r.Price)
                .Distinct()
                .SelectMany(p => _db.RetailerRates.Where(r => r.Price == p).OrderByDescending(r => r.CreatedAt).Take(1))
                .ToListAsync()

For SQL Server, it will generate the sql with partition & inner join. The result is the same as group by.

I think .Distinct() will materialize all distinct prices?

@DK8ALREC Yes, it works as same as group by. And for linq-to-sql, it will be generated as select distinct(r.Price) from [RetailerRates] as [r].

@DK8ALREC Yes, it works as same as group by. And for linq-to-sql, it will be generated as select distinct(r.Price) from [RetailerRates] as [r].

Yes, my point was that I think your approach would make 2 database queries, and it will materialize all distinct Prices before making the "final query", which could potentially be a lot of rows I guess.

Yes, my point was that I think your approach would make 2 database queries, and it will materialize all distinct Prices before making the "final query", which could potentially be a lot of rows I guess.

No, it won't. You just need to take a try :)