elastic/elasticsearch-java

Offer support to determine the size of a BulkOperation

fabian-froehlich opened this issue · 1 comment

Description

Our application creates a Stream of objects that we write into Elasticsearch with IndexRequests. The Stream is consumed in chunks.

The chunk size is determined by a few metrics that helped us stabilize the indexing process. These are:

  • Number of requests in the BulkRequest
  • Accumulated request size
    • Accumulated Body-Size (bulkRequest.estimatedSizeInBytes())
    • Accumulated Script Size (RamUsageEstimator.sizeOfMap(script.getParams()))
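The chunking rule these metrics describe (close a chunk once it reaches a request count or an accumulated size) can be sketched in plain Java. This is a minimal sketch, not an elasticsearch-java API: the class `SizeBoundedChunker`, its `chunk` method, and the `maxOps`/`maxBytes` limits are hypothetical, and each item's size comes from a caller-supplied estimator (here, the UTF-8 length of an already-serialized JSON string) rather than from `estimatedSizeInBytes()` or `RamUsageEstimator`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical helper illustrating count- and size-bounded chunking; not part of elasticsearch-java.
public class SizeBoundedChunker {

    /**
     * Splits items into chunks. A chunk is closed before adding an item when it already
     * holds maxOps items or when the item would push the accumulated size past maxBytes.
     * An item larger than maxBytes on its own still forms a (single-item) chunk.
     */
    public static <T> List<List<T>> chunk(Iterator<T> items, ToLongFunction<T> sizeOf,
                                          int maxOps, long maxBytes) {
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        while (items.hasNext()) {
            T item = items.next();
            long size = sizeOf.applyAsLong(item);
            boolean full = current.size() >= maxOps || currentBytes + size > maxBytes;
            if (!current.isEmpty() && full) {
                chunks.add(current);          // close the current chunk
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(item);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);              // emit the trailing partial chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"a\":1}", "{\"b\":22}", "{\"c\":333}", "{}");
        // Size estimator: UTF-8 byte length of each serialized document.
        List<List<String>> chunks = chunk(docs.iterator(),
                d -> d.getBytes(StandardCharsets.UTF_8).length, 3, 16);
        System.out.println(chunks);
    }
}
```

Closing a chunk before it would overflow, rather than after, keeps every chunk within `maxBytes` except when a single item is itself larger than the limit; such an item is sent alone instead of being dropped.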

Is there already a utility inside elasticsearch-java that helps accommodate these requirements?

If I understand the new architecture correctly, serialization happens at the very end of the chain. But is there a place where an estimation could happen? I would not like to serialize my objects twice: once while building the BulkOperation list and once while sending the request to Elasticsearch.

This is resolved by PR #474, which introduces a BinaryData type that can be used for bulk operation documents and whose size in bytes can be queried.

BinaryData can be created directly, and it is also used transparently by the BulkIngester to evaluate the size of the bulk request.
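The idea behind this approach (serialize a document once, know its byte size up front, and reuse the same bytes when sending) can be illustrated without the client library. This is a hypothetical stand-in, assuming documents are already available as JSON strings: `PreSerializedDoc`, `ofJson`, and `ndjsonBody` are illustrative names, not part of elasticsearch-java, and the body it builds omits the action/metadata lines a real bulk request carries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration of "serialize once, size known, bytes reused"; not the BinaryData class.
public final class PreSerializedDoc {
    private final byte[] bytes;

    private PreSerializedDoc(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Encodes the document to UTF-8 exactly once; later calls reuse these bytes. */
    public static PreSerializedDoc ofJson(String json) {
        return new PreSerializedDoc(json.getBytes(StandardCharsets.UTF_8));
    }

    /** Size in bytes, known before anything is sent. */
    public long size() {
        return bytes.length;
    }

    /** Writes the stored bytes as-is, without re-serializing. */
    public void writeTo(OutputStream out) throws IOException {
        out.write(bytes);
    }

    /** Builds a simplified newline-delimited body (document lines only, no action lines). */
    public static byte[] ndjsonBody(List<PreSerializedDoc> docs) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (PreSerializedDoc doc : docs) {
            doc.writeTo(body);
            body.write('\n');
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<PreSerializedDoc> docs = List.of(
                ofJson("{\"name\":\"Zo\u00eb\"}"),
                ofJson("{\"n\":1}"));
        // The size estimate is available before the body is built.
        long estimated = docs.stream().mapToLong(PreSerializedDoc::size).sum();
        byte[] body = ndjsonBody(docs);
        System.out.println(estimated + " bytes of documents, " + body.length + " bytes of body");
    }
}
```

Because `size()` is the length of the stored UTF-8 bytes, it accounts for multi-byte characters (the `\u00eb` above takes two bytes), which a character count would miss.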