elastic/elasticsearch-net

Composite Aggregation with multiple sources breaks query structure

Closed this issue · 2 comments

Elastic.Clients.Elasticsearch version: 9.1.7

Elasticsearch version: 9.1.2

.NET runtime version: .NET Framework v4.8

Operating system version: Windows 11


Description of the problem including expected versus actual behavior:

When using the Fluent API to define a composite aggregation with more than one source field, the client generates a request body that Elasticsearch rejects.

Instead of serializing each entry of the sources array as a separate object, the Fluent API combines them into a single object. Elasticsearch does not support this shape and rejects the query.


Steps to reproduce:

  1. Use the Fluent API to define a composite aggregation with two .Add() calls inside .Sources(...). Example (fluent syntax):
var searchResponse = client.Client.SearchAsync<DBODespesa>(s => s
    .Index(nomeIndice)
    .Size(0)
    .Query(finalQuery)
    // Composite aggregation with two sources (this is the part that fails)
    .Aggregations(aggregations => aggregations
        .Add("group_by", aggregation => aggregation
            .Composite(composite => composite
                .Size(65536)
                .Sources(sources => sources
                    .Add("descricao", src => src
                        .Terms(terms => terms.Field(campoDescricao))
                    )
                    .Add("codigo", src2 => src2
                        .Terms(terms => terms.Field(campoChave))
                    )
                )
            )
            .Aggregations(aggregations2 => aggregations2
                .Add("soma_empenhado", aggregation1 => aggregation1
                    .Sum(sum => sum
                        .Field(x => x.ValorEmpenho)
                    )
                )
                .Add("soma_liquidado", aggregation1 => aggregation1
                    .Sum(sum => sum
                        .Field(x => x.ValorLiquidado)
                    )
                )
                .Add("soma_pago", aggregation1 => aggregation1
                    .Sum(sum => sum
                        .Field(x => x.ValorPago)
                    )
                )
                .Add("soma_rap", aggregation1 => aggregation1
                    .Sum(sum => sum
                        .Field(x => x.ValorRap)
                    )
                )
                .Add("primeiro_registro", aggregation1 => aggregation1
                    .TopHits(top_hits => top_hits
                        .Size(1)
                    )
                )
            )
        )
    )
).Result;
  2. Run the query using .SearchAsync(...)
  3. The query fails because of the malformed sources array

Expected behavior

"sources": [
  { "descricao": { "terms": { "field": "unidadeGestora.keyword" } } },
  { "codigo": { "terms": { "field": "codigoUnidadeGestora" } } }
]

Actual behavior

"sources": [
  {
    "descricao": { "terms": { "field": "unidadeGestora.keyword" } },
    "codigo": { "terms": { "field": "codigoUnidadeGestora" } }
  }
]

This breaks composite aggregations with more than one field and prevents pagination/sorting on the server side.


Provide ConnectionSettings (if relevant): We're using the official Elastic.Clients.Elasticsearch client with default ElasticsearchClientSettings, strongly typed models, and fluent API approach.


Provide DebugInformation (if relevant):
Query returns error:
ElasticsearchClientException: Request failed to execute. The server returned 400 - Bad Request. Invalid composite aggregation: sources must be an array of single-key objects.


Extra Context

  • There’s no public workaround using the Fluent API.
  • This issue blocks real-world use of composite aggregations with multi-field grouping.
  • Elastic Support indicated that this is potentially a client-side serialization bug.

Support case ID (if helpful): 01980458


Requested Help:

  • Confirm whether this is a bug
  • Provide a Fluent API example that works with multiple sources
  • Suggest workaround if bug is confirmed

Thanks in advance!

Hi @ianschmaltz,

this is one of the rare places where the generated .NET syntax is unfortunately quite ambiguous. Here is an explanation and an example of the correct syntax:

// Sources is defined as:
// ICollection<IDictionary<string, Elastic.Clients.Elasticsearch.Aggregations.CompositeAggregationSource>>

var descriptor = new SearchRequestDescriptor<Person>()
    .Aggregations(aggs => aggs
        .Add("group_by", agg => agg
            .Composite(comp => comp
                // `Sources` takes a 'params' array, like all "List"-type properties,
                // so each lambda below sets one item of the outer `ICollection`.
                .Sources(
                    // `Add` refers to the inner dictionary, which only accepts a single key.
                    // Distinct lambda parameter names avoid shadowing, which C# 7.3
                    // (the default on .NET Framework 4.8) rejects.
                    source => source.Add("descricao", src => src.Terms(t => t.Field(p => p.FirstName))),
                    source => source.Add("codigo", src => src.Terms(t => t.Field(p => p.LastName)))
                )
            )
        )
    );

The root cause of this awkward syntax appears to be an incorrect type for the Sources property in the upstream Elasticsearch specification. I'm currently working on improving this. In the meantime, using the above syntax should unblock you.
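To see why the nesting matters, here is a minimal, self-contained sketch that mirrors the collection-of-single-key-dictionaries shape using plain System.Text.Json. This is an illustration only: ordinary dictionaries stand in for CompositeAggregationSource, the class and method names are hypothetical, and it is not the client's own serializer. It assumes System.Text.Json is available (built into .NET Core 3.0 and later, a NuGet package on .NET Framework 4.8).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical demo: plain dictionaries stand in for CompositeAggregationSource.
public static class SourcesShapeDemo
{
    // Builds { "terms": { "field": <field> } } for one source.
    static object TermsOn(string field)
    {
        return new Dictionary<string, object>
        {
            ["terms"] = new Dictionary<string, string> { ["field"] = field }
        };
    }

    // Correct shape: one single-key dictionary per entry of the outer list.
    public static string Expected()
    {
        var sources = new List<Dictionary<string, object>>
        {
            new Dictionary<string, object> { ["descricao"] = TermsOn("unidadeGestora.keyword") },
            new Dictionary<string, object> { ["codigo"] = TermsOn("codigoUnidadeGestora") }
        };
        return JsonSerializer.Serialize(sources);
    }

    // Rejected shape: both names added to the same inner dictionary.
    public static string Actual()
    {
        var sources = new List<Dictionary<string, object>>
        {
            new Dictionary<string, object>
            {
                ["descricao"] = TermsOn("unidadeGestora.keyword"),
                ["codigo"] = TermsOn("codigoUnidadeGestora")
            }
        };
        return JsonSerializer.Serialize(sources);
    }

    public static void Main()
    {
        Console.WriteLine(Expected());
        Console.WriteLine(Actual());
    }
}
```

Running it prints the same structures as the Expected and Actual snippets in the issue, without whitespace: chaining two .Add() calls on one lambda fills a single inner dictionary, while passing two lambdas to .Sources(...) creates two list entries.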

Hi @flobernd,

Thank you for the detailed explanation and example; that clarified a lot. We'll test this syntax in our environment and get back to you with feedback as soon as possible.

Appreciate the support and attention!