dotnet/efcore

Query: Join after GroupByAggregate throws when join key uses grouping key or aggregate function

smitpatel opened this issue ยท 22 comments

something like

var query = context.Orders.GroupBy(o => o.CustomerID).Select(g => new { g.Key, Count = g.Count() })
.Join(context.Customers, o => o.Key, c => c.CustomerID, (o, c) => new {C = c, o.Count});
[ConditionalFact]
        public virtual void GroupBy_Aggregate_Join()
        {
            AssertQuery<Order, Customer>(
                (os, cs) =>
                    from a in os.GroupBy(o => o.CustomerID)
                                .Where(g => g.Count() > 5)
                                .Select(g => new { CustomerID = g.Key, LastOrderID = g.Max(o => o.OrderID) })
                    join c in cs on a.CustomerID equals c.CustomerID
                    join o in os on a.LastOrderID equals o.OrderID
                    select new { c, o });
        }
Compiling query model: 
  'from IGrouping<string, Order> g in 
      (from Order o in DbSet<Order>
      select [o]).GroupBy([o].CustomerID, [o])
  where 
      (from Order <generated>_1 in [g]
      select [<generated>_1]).Count() > 5
  join Customer c in DbSet<Customer>
  on [g].Key equals [c].CustomerID
  join Order o in DbSet<Order>
  on 
      (from Order o in [g]
      select [o].OrderID).Max() equals [o].OrderID
  select new <>f__AnonymousType14<Customer, Order>(
      [c], 
      [o]
  )'

Few issues here:

  • Since Key comes from anonymous binding from SelectExpression. It is alias expression which may need to unwrapped.
  • When the join key is aggregate function, the initial client eval does not use SqlTranslatingExpressionVisitor, causing issue with grouping parameter being non-sequence type.
  • Even after above is fixed, the QueryModelVisitor generated for subquery lacks parent QMV hence we are unable to find SelectExpression and fail to translate the key properly.

Moving this to backlog. We will be doing client eval (streaming groupby) whenever we the GroupByAggregate query is used in conjunction with join.

#11700 relaxes conditions here and allow translation to server if the GroupByAggregateQuery is coming from join.

        [ConditionalFact]
        public virtual void GroupBy_Aggregate_Join_inverse2()
        {
            AssertQuery<Order, Customer>(
                (os, cs) =>
                    from c in cs
                    join a in os.GroupBy(o => o.CustomerID)
                                .Where(g => g.Count() > 5)
                                .Select(g => new { CustomerID = g.Key, LastOrderID = g.Max(o => o.OrderID) })
                        on c.CustomerID equals a.CustomerID
                    select new { c, a.LastOrderID },
                entryCount: 63);
        }

This query successfully translates to server producing

SELECT [c].[CustomerID], [c].[Address], [c].[City], [c].[CompanyName], [c].[ContactName], [c].[ContactTitle], [c].[Country], [c].[Fax], [c].[Phone], [c].[PostalCode], [c].[Region], [t].[CustomerID], [t].[LastOrderID]
FROM [Customers] AS [c]
INNER JOIN (
    SELECT [o].[CustomerID], MAX([o].[OrderID]) AS [LastOrderID]
    FROM [Orders] AS [o]
    GROUP BY [o].[CustomerID]
    HAVING COUNT(*) > 5
) AS [t] ON [c].[CustomerID] = [t].[CustomerID]

Note: More than one join doesn't work well due to #10812

#12826 would enable this scenario fully.

This has been an issue for nearly a year. Is it possible to get a fix before 3.0.0?

do you have any date for a fix, please ?

@reservoir-dogs We hope to fix this issue in the 3.0 release, which will be later this year.

zloom commented

Hi, have same issue on 2.2.4
var test = data .Where(x => x.timestamp <= new DateTime(2019, 07, 01, 16, 24, 04)) .GroupBy(x => new { x.easting, x.northing }, x => x.timestamp) .Select(x => new { x.Key.easting, x.Key.northing, timestamp = x.Max() }) .Join(data, x => x, x => new { x.easting, x.northing, x.timestamp }, (_, x) => x) .ToList();

I see you were hoping on getting this in 3.0. Looks like it's still an issue. Any update on this? Is this punted for 3.1 now?

@mattheiler - While this issue is not marked as fixed, many of this scenarios could already be working. We have done work to improve implementation to make such queries translate though we did not have time to go through various queries to find edge case bugs. You can try running your query on 3.0 (or even better 3.1 preview) and it may be working already. If it is not then there is always easy work-around as mentioned here #10012 (comment) which is to apply key/aggregate operation in custom projection before composing join. And that already works.
We would love to know what queries are still not working in 3.x release so that we can fix them in future release.

@smitpatel I'll put together an example to reproduce. Thanks!

    public class Foo
    {
        public int Id { get; set; }

        public ICollection<FooBar> Bars { get; } = new List<FooBar>();
    }

    public class FooBar
    {
        public virtual Foo Foo { get; set; }

        public virtual int FooId { get; set; }

        public virtual Bar Bar { get; set; }

        public virtual int BarId { get; set; }
    }

    public class Bar
    {
        public virtual int Id { get; set; }

        public virtual ICollection<FooBar> Foos { get; } = new List<FooBar>();

        public virtual ICollection<BarBaz> Bazs { get; } = new List<BarBaz>();
    }

    public class BarBaz
    {
        public virtual Bar Bar { get; set; }

        public virtual int BarId { get; set; }

        public virtual Baz Baz { get; set; }

        public virtual int BazId { get; set; }
    }

    public class Baz
    {
        public virtual int Id { get; set; }

        public virtual ICollection<BarBaz> Bars { get; } = new List<BarBaz>();
    }

    public class BazResult
    {
        public Baz Baz { get; set; }

        public int BazBarsCount { get; set; }
    }

    public class EfTestDbContext : DbContext
    {
        public virtual DbSet<Foo> Foos { get; private set; }

        public virtual DbSet<Bar> Bars { get; private set; }

        public virtual DbSet<Baz> Bazs { get; private set; }

        protected override void OnConfiguring(DbContextOptionsBuilder options)
        {
            options.UseSqlServer("Data Source=localhost; Initial Catalog=EfTest; Integrated Security=true; MultipleActiveResultSets=True;");
        }

        protected override void OnModelCreating(ModelBuilder model)
        {
            model.Entity<FooBar>().HasKey(e => new {e.FooId, e.BarId});
            model.Entity<BarBaz>().HasKey(e => new {e.BarId, e.BazId});
        }
    }

    public class Program
    {
        public static async Task Main(string[] args)
        {
            await using var context = new EfTestDbContext();

            const int fooId = 0;
                
            var barBazsByFoo =
                from foo in context.Foos
                where foo.Id == fooId
                from fooBar in foo.Bars
                let bar = fooBar.Bar
                from barBaz in bar.Bazs
                select barBaz;

            var barBazs =
                from barBaz in barBazsByFoo
                group barBaz by barBaz.BazId
                into barBazGroups
                join barBaz in barBazsByFoo on barBazGroups.Key equals barBaz.BazId
                select new BazResult
                {
                    Baz = barBaz.Baz,
                    BazBarsCount = barBazGroups.Count()
                };

            await barBazs.ToListAsync();
        }
    }

Processing of the LINQ expression 'GroupByShaperExpression:
KeySelector: b.BazId,
ElementSelector:EntityShaperExpression:
EntityType: BarBaz
ValueBufferExpression:
ProjectionBindingExpression: Outer
IsNullable: False
' by 'RelationalProjectionBindingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core. See https://go.microsoft.com/fwlink/?linkid=2101433 for more detailed information.

This is with EF Core SqlServer 3.0.0.

I upgraded to 3.1.0-preview3.19554.8 and the issue persists.

@mattheiler - It happens since you are selecting navigation off the entity you are joining. Hence it won't work for now.

Hence it won't work for now.

So this scenario is a known issue? Alrighty.

Same with EF 3.1 on a fairly simple query: hidegh/Meal-tracker@a85ac8e

It's just EF core issue, same approach works within EF6.x and well, I'd bet it works on NHibernate (even older versions) too.

Just wondering if this issue is going to be addressed and when. At the moment, it is blocking us from upgrading from dotnet core 2.2 to 3.1 and I looking at the thread, I suspect others are in the same boat.

Yes, regarding odata - it works fine with efcore 2.2, but breaks with efcore 3+. I'm not sure if it is the same bug as discussed in this issue since this issue was originally entered before 2.2 even came out and odata works fine with 2.2 - just not 3+. It would be nice if someone from the efcore team could take a look and at least classify and/or comment on the problem with odata. For an official (and useful) microsoft project, odata doesn't get much thought from the other teams - for months aspnetcore changes were preventing people using odata migrating to .net core 3+ - they finally fix those - only to find out efcore changes are breaking it now at least for people who need group by.

@MolallaComm We plan to sync with the OData team to get a better understanding of some of the queries that are generated.

Imho this issue should be handled with pressure and not being open for several years. Let's have an example.
You have 1.000.000 customers. Every customer has some orders with a date. And now you need the 50 customers (paging) with the newest orders. In SQL thats something like (without TOP 50):

SELECT C.*, O.DATE
FROM CUSTOMER C
JOIN (
  SELECT CUSTOMER_ID, MAX(DATE) DATE
  FROM ORDERS
  GROUP BY CUSTOMER_ID) O ON C.CUSTOMER_ID = O.CUSTOMER_ID
ORDER BY O.DATE DESC

This I currently impossible with EF Core. Reading all 1.000.000 customers and select 50 of them is not a solution. The only solution I found is to write raw SQL.

All the queries described in this issue were already working. They may have started working in 3.1 or in earlier previews of 5.0.