dotnet/efcore

Cosmos: Support aggregate operators

smitpatel opened this issue · 9 comments

Average/Sum/Min/Max/Count

 typeof(Microsoft.Azure.Cosmos.QueryPlanHandler+QueryPlanHandlerException): Query contains 1 or more unsupported features. Upgrade your SDK to a version that does support the requested features:
    Query contained CompositeAggregate, which the calling client does not support. (Query contained CompositeAggregate, which the calling client does not support.)
    ---- Microsoft.Azure.Cosmos.QueryPlanHandler+QueryPlanHandlerException : Query contains 1 or more unsupported features. Upgrade your SDK to a version that does support the requested features:
    Query contained CompositeAggregate, which the calling client does not support. (Query contained CompositeAggregate, which the calling client does not support.)
    -------- System.ArgumentException : Query contained CompositeAggregate, which the calling client does not support.

I'm trying to use Cosmos DB and I need a count of all the entries to make paging work. Can you suggest a way around this problem that doesn't require me to ingest all the data?

@JonPSmith The best you can do for now is something like
query.Select(_ => 1).AsEnumerable().Count()

I'm sure you know but here is performance figures of the current way of using Count in EF Core (NoSQL database on Azure Cosmos DB Emulator, total number of entries = 50600, each count run three time to reduce first access setup time.

if using noSqlDbContext.Books.Select(_ => 1).AsEnumerable().Count() via EF Core

  1. Not timed.
  2. NoSQL count, EF Core select took 1,152.58 ms.
  3. NoSQL count, EF Core select took 1,201.86 ms.

If using SELECT VALUE Count(c) FROM c via cosmosClient

  1. Not timed.
  2. NoSQL count, via client took 121.09 ms.
  3. NoSQL count, via client took 100.90 ms.

If I'm doing something wrong then please let me know.

BTW Just noticed an interesting article on Cosmos DB with big improvements on COUNT, SUM, AVG, MAX, and MIN if you add a index.

I recommend this gets into the EF Core 6 list. Not having count is especially difficult (and Cosmos Count is super fast)

ACROV commented

I would really welcome aggregate support being added. I am wondering what is being proposed here though:

  • a single aggregate return value e.g. the result of SELECT COUNT(c) FROM c
  • multiple aggregate return values e.g. the result of SELECT MIN(c.value), MAX(c.Value) FROM c
  • full group by support e.g. the result of SELECT c.make, c.model, MIN(c.value), MAX(c.Value) FROM c GROUP BY c.make, c.model

This may be fully working.

roji commented

This is now supported, as of #33895 - though we're likely missing qiute a bit of test coverage.

For example:

await context.Set<OwnedPerson>().Select(o => o.Orders.Max(o => o.Id)).ToListAsync()

... now gets translated to:

SELECT (
    SELECT VALUE MAX(t["Id"])
    FROM t IN c["Orders"]) AS c
FROM root c
WHERE c["Discriminator"] IN ("OwnedPerson", "Branch", "LeafB", "LeafA")

Aggregate operators work also on the top-level entity types. GroupBy isn't yet supported, that's tracked by #17313.