elastic/elasticsearch-java

Java client and direct CURL call do not return the same result

mbarbet opened this issue ยท 5 comments

Java API client version

8.13.3

Java version

17.0.11

Elasticsearch Version

8.14.1

Problem description

We do a GeoTile Grid aggregation with two average aggregations.
The curl call and the client java do not return the same json result for the buckets.
Below the curl version response:

{
   "key": "10/455/466",
   "doc_count": 26,
   "avg#avg:AOD565": {
       "value": 0.08212617407692308
   },
   "avg#avg:AOD566": {
       "value": null
   }
}

And the java client version response:

{
   "key": "10/455/466",
   "doc_count": 26,
   "avg#avg:AOD565": {
       "value": 0.08212617407692308
   },
   "avg#avg:AOD566": {
       "value": 0.0
   }
}

All the documents in the 10/455/466 grid have no value for AOD566 property.
With CURL the result of the average on this property is null and with the java client the result is 0.0.
There is a way to obtain the result of the curl with the java client ?

Hello!
The Java client defaults to zero when deserializing aggregation results because of a design choice that was made in order to avoid boxing numbers, which could increase memory usage substantially; we currently don't plan to change this. To check whether the result is actually zero or if the field is missing like in this case it's possible to use the other response fields (for example source).

Hello again!
For most aggregation responses where the java client defaults null values to 0.0, it's possible to disambiguate the correct value by checking the doc_count field. In this specific case it looks like this isn't possible, so we're trying to figure out a way to solve this issue without breaking the current implementation. Would a bool isEmpty() method in the aggregation response class work for you? So it would be possible to call it to understand whether the aggregation value of 0.0 is an actual 0 or it's actually null.

Hello!
Thank you for your feedback and your suggestion. What matters to us is to recover the empty information for each metric in the aggregation. It would be perfect if you added it!

Update: in the end we realized that it would just be easier to just box the numbers and change the field from the primitive type to the object, so in this case from double to the nullable Double, so to reflect more accurately what the server is sending at the cost of a memory usage increase which probably will only be noticeable for large aggregations. Unfortunately this is a breaking change, meaning it will have to wait for the next minor version, after the one that will come out in the following days, possibly in a couple of month. Sorry for the delay!

Bump on this^
We're observing similar issues for min, avg, and max aggregations where they're extending SingleMetricAggregateBase and are incorrectly deserializing null to 0.0 because of this

protected static <BuilderT extends AbstractBuilder<BuilderT>> void setupSingleMetricAggregateBaseDeserializer(
ObjectDeserializer<BuilderT> op) {
AggregateBase.setupAggregateBaseDeserializer(op);
op.add(AbstractBuilder::value, JsonpDeserializer.doubleOrNullDeserializer(0), "value");
op.add(AbstractBuilder::valueAsString, JsonpDeserializer.stringDeserializer(), "value_as_string");