elastic/rally

Large meta.error-description field fails Elasticsearch metric store ingest

gbanasiak opened this issue · 0 comments

Rally version (get with esrally --version):

esrally 2.10.0.dev0 (git revision: a2c09a751d7e2797fde531f90aea43ac1375c987)

Description of the problem including expected versus actual behavior:

In certain scenarios, Rally can produce a large meta.error-description field in rally-metrics-* documents. Elasticsearch cannot index such documents, and the race fails as a result. The meta.error-description field is mapped as keyword, and Lucene limits a single keyword term to 32766 bytes in its UTF-8 encoding.
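Because the limit comes from the keyword mapping, one possible mitigation on the index side is the `ignore_above` mapping parameter, which makes Elasticsearch skip indexing over-long values (they remain in `_source`) instead of rejecting the document. The fragment below is only a sketch: it is a hypothetical standalone mapping, not Rally's actual index template, and the names `LUCENE_MAX_TERM_BYTES` and `ERROR_DESCRIPTION_MAPPING` are made up for illustration. Since `ignore_above` counts characters while Lucene counts bytes, it assumes the worst case of 4 bytes per UTF-8 character.

```python
# Lucene's maximum term length in bytes, as quoted in the error message.
LUCENE_MAX_TERM_BYTES = 32766

# ignore_above is measured in characters; a UTF-8 character takes at most
# 4 bytes, so this is a conservative character limit.
SAFE_IGNORE_ABOVE = LUCENE_MAX_TERM_BYTES // 4

# Hypothetical mapping fragment (assumption: NOT Rally's real template).
ERROR_DESCRIPTION_MAPPING = {
    "properties": {
        "meta": {
            "properties": {
                "error-description": {
                    "type": "keyword",
                    "ignore_above": SAFE_IGNORE_ABOVE,
                }
            }
        }
    }
}
```

With such a mapping the document would still be stored, but an over-long error description would not be searchable or aggregatable.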

The characteristic symptom is the following error:

Document contains at least one immense term in field="meta.error-description" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.
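One way to avoid this error on the client side would be to cap the error description at the byte limit before the document is sent to the metrics store. The following is a minimal sketch of that idea, not Rally's implementation: `truncate_utf8`, `LUCENE_MAX_TERM_BYTES` and the `" [truncated]"` marker are hypothetical names chosen for illustration. It cuts on the UTF-8 encoding rather than on the character count, because the limit is expressed in bytes.

```python
# Lucene's maximum term length in bytes, as quoted in the error message.
LUCENE_MAX_TERM_BYTES = 32766


def truncate_utf8(text: str,
                  max_bytes: int = LUCENE_MAX_TERM_BYTES,
                  marker: str = " [truncated]") -> str:
    """Return text unchanged if its UTF-8 encoding fits into max_bytes,
    otherwise a shortened copy ending in marker that fits.

    Hypothetical helper for illustration only.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    budget = max_bytes - len(marker.encode("utf-8"))
    if budget < 0:
        raise ValueError("max_bytes is too small to hold the marker")
    # Cutting a byte string may split a multi-byte character;
    # errors="ignore" drops the incomplete trailing bytes.
    head = encoded[:budget].decode("utf-8", errors="ignore")
    return head + marker


# Usage: a long concatenation of bulk item errors, as in the logs.
description = " | ".join(
    ["HTTP status: 409, message: version conflict"] * 2000
)
safe = truncate_utf8(description)
assert len(safe.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES
```

Truncating keeps the beginning of the message, which in the logs of this issue is enough to identify the failure (repeated HTTP 409 version conflicts).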

Provide logs (if relevant):

2024-02-27 20:37:56,153 ActorAddr-(T|:37759)/PID:484 esrally.driver.runner WARNING Bulk request failed: [HTTP status: 409, message: [-ChI7I0BiZHehfRA873C]: version conflict, document already exists (current version [1]) | HTTP status: 409, message: [-ChI7I0BiZHehfRA877C]: version conflict, document already exists (current version [1]) | HTTP status: 409, message: [-ChI7I0BiZHehfRA87_C]: version conflict, document already exists (current version [1]) | HTTP status: 409, message: [-ChI7I0BiZHehfRA87nB]: version conflict, document already exists (current version [1]) | [..]

2024-02-27 20:38:24,838 ActorAddr-(T|:33739)/PID:32157 esrally.metrics ERROR Unretryable error encountered when sending metrics to remote metrics store: [document_parsing_exception] - Full error(s) [[{'index': {'_index': 'rally-metrics-2024-02', '_id': '1BZK7I0BHqD26mvHOsiZ', 'status': 400, 'error': {'type': 'document_parsing_exception', 'reason': "[1:1166] failed to parse field [meta.error-description] of type [keyword] in document with id '1BZK7I0BHqD26mvHOsiZ'. Preview of field's value: 'HTTP status: 409, message: [-ChI7I0BiZHehfRA873C]: version conflict, document already exists (current version [1]) | HTTP status: 409, message: [-ChI7I0BiZHehfRA877C]: version conflict, document already exists (current version [1]) | HTTP status: 409, message: [-ChI7I0BiZHehfRA87_C]: version conflict, document already exists (current version [1]) | HTTP status: 409, message: [-ChI7I0BiZHehfRA87nB]: version conflict, document already exists (current version [1]) | ... ", 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Document contains at least one immense term in field="meta.error-description" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: \'[72, 84, 84, 80, 32, 115, 116, 97, 116, 117, 115, 58, 32, 52, 48, 57, 44, 32, 109, 101, 115, 115, 97, 103, 101, 58, 32, 91, 45, 67]...\''}}, [..]

2024-02-27 20:38:25,742 -not-actor-/PID:32059 esrally.racecontrol ERROR A benchmark failure has occurred
2024-02-27 20:38:25,742 -not-actor-/PID:32059 esrally.racecontrol INFO Telling benchmark actor to exit.
2024-02-27 20:38:25,743 -not-actor-/PID:32059 esrally.rally ERROR Cannot run subcommand [race].
Traceback (most recent call last):
  File "/home/esbench/rally/esrally/rally.py", line 1184, in dispatch_sub_command
    race(cfg, args.kill_running_processes)
  File "/home/esbench/rally/esrally/rally.py", line 932, in race
    with_actor_system(racecontrol.run, cfg)
  File "/home/esbench/rally/esrally/rally.py", line 962, in with_actor_system
    runnable(cfg)
  File "/home/esbench/rally/esrally/racecontrol.py", line 408, in run
    raise e
  File "/home/esbench/rally/esrally/racecontrol.py", line 405, in run
    pipeline(cfg)
  File "/home/esbench/rally/esrally/racecontrol.py", line 74, in __call__
    self.target(cfg)
  File "/home/esbench/rally/esrally/racecontrol.py", line 344, in benchmark_only
    return race(cfg, external=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/esbench/rally/esrally/racecontrol.py", line 302, in race
    raise exceptions.RallyError(result.message, result.cause)
esrally.exceptions.RallyError: Traceback (most recent call last):
  File "/home/esbench/rally/esrally/metrics.py", line 106, in guarded
    return target(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/esbench/.local/lib/python3.11/site-packages/elasticsearch/helpers/actions.py", line 524, in bulk
    for ok, item in streaming_bulk(
  File "/home/esbench/.local/lib/python3.11/site-packages/elasticsearch/helpers/actions.py", line 438, in streaming_bulk
    for data, (ok, info) in zip(
  File "/home/esbench/.local/lib/python3.11/site-packages/elasticsearch/helpers/actions.py", line 355, in _process_bulk_chunk
    yield from gen
  File "/home/esbench/.local/lib/python3.11/site-packages/elasticsearch/helpers/actions.py", line 274, in _process_bulk_chunk_success
    raise BulkIndexError(f"{len(errors)} document(s) failed to index.", errors)
elasticsearch.helpers.BulkIndexError: 18 document(s) failed to index.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/esbench/rally/esrally/actor.py", line 92, in guard
    return f(self, msg, sender)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/esbench/rally/esrally/driver/driver.py", line 306, in receiveMsg_WakeupMessage
    self.driver.post_process_samples()
  File "/home/esbench/rally/esrally/driver/driver.py", line 1007, in post_process_samples
    self.sample_post_processor(raw_samples)
  File "/home/esbench/rally/esrally/driver/driver.py", line 1120, in __call__
    self.metrics_store.flush(refresh=False)
  File "/home/esbench/rally/esrally/metrics.py", line 930, in flush
    self._client.bulk_index(index=self._index, items=self._docs)
  File "/home/esbench/rally/esrally/metrics.py", line 81, in bulk_index
    self.guarded(elasticsearch.helpers.bulk, self._client, items, index=index, chunk_size=5000)
  File "/home/esbench/rally/esrally/metrics.py", line 170, in guarded
    raise exceptions.RallyError(msg)
esrally.exceptions.RallyError: Unretryable error encountered when sending metrics to remote metrics store: [document_parsing_exception]