Improve remote metrics store flushing
b-deam opened this issue · 2 comments
Similar to #1723, benchmarks with many concurrent/parallel tasks that generate a lot of samples can end up producing too many samples for our single-threaded metrics store flush method to handle, causing excessive memory usage on the load driver and delaying metrics delivery to the remote store.
The Driver
already wakes up every POST_PROCESS_INTERVAL_SECONDS
(30) to flush the collected samples to the metrics store:
rally/esrally/driver/driver.py
Lines 295 to 305 in 2470328
rally/esrally/driver/driver.py
Lines 950 to 955 in 2470328
rally/esrally/driver/driver.py
Lines 1062 to 1068 in 2470328
Once self._client.bulk_index()
completes we clear the in-memory buffer:
Lines 924 to 942 in 2470328
We use the elasticsearch.helper.bulk()
method to iterate over all the documents in the in-memory buffer, sending them in chunks of 5000 docs at a time:
Lines 89 to 93 in 2470328
This approach works fine for most benchmarks, but for challenges like logging-indexing-querying
of the elastic/logs
track we generate a lot of documents, meaning that this in-memory buffer is often full of tens to hundreds of thousands of documents that are indexed by a single client controlled by the Driver
actor. This problem is exacerbated in environments where the load driver is a long way away from the metrics store (i.e. cross regional), or if the metrics store itself is overloaded, because our single client throughput is bound by the latency of each request.
There’s a parallel_bulk
helper that uses multiprocessing.pool import ThreadPool
, but the Thespian docs specifically call out that:
Actors are not intended to create other processes or threads; attempts to do so may create undefined behavior
We could consider a few things here, but all will require extensive testing:
- Decrease the
POST_PROCESS_INTERVAL_SECONDS
from 30 to allow more frequent flushes, intending to prevent the buffer from growing too large - Increase the
chunk_size
used by the helper method to send more docs at once (i.e. 10,000), but I don't think this will make a large difference because we're still using a single client - Add some logic to detect the size/num of docs in the in-memory buffer and flush once it grows beyond a certain bound, we'd probably add this here by checking the
len(self.coordinator.raw_samples)
(or similar) in addition to the timer check:rally/esrally/driver/driver.py
Line 301 in 2470328
- Make use of the
AsyncElasticsearch
client instead of the sync client for concurrently flushing metrics, allowing us to use coroutines (and not threads or processes)
Here's an example of a benchmark where the flushing gets further and further behind:
logs
[...]
2023-06-01 05:31:44,977 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 7698 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [5.645342] seconds.
2023-06-01 05:32:20,291 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 6608 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [5.136896] seconds.
2023-06-01 05:32:54,262 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 4701 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [3.870889] seconds.
2023-06-01 05:33:40,384 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 24861 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [15.725329] seconds.
2023-06-01 05:34:17,364 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 9695 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [6.802203] seconds.
2023-06-01 05:34:52,410 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 5970 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [4.929296] seconds.
2023-06-01 05:35:27,891 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 7039 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [5.356641] seconds.
2023-06-01 05:36:17,925 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 31359 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [19.628063] seconds.
2023-06-01 05:36:54,653 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 9643 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [6.537706] seconds.
2023-06-01 05:37:29,353 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 5556 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [4.587056] seconds.
2023-06-01 05:38:03,520 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 4945 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [4.026544] seconds.
2023-06-01 05:39:10,213 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 58085 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [35.715736] seconds.
2023-06-01 05:39:59,862 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 30212 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [19.092454] seconds.
2023-06-01 05:40:35,964 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 7954 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [5.933280] seconds.
2023-06-01 05:41:11,689 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 7043 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [5.595361] seconds.
2023-06-01 05:41:45,371 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 4211 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [3.589514] seconds.
2023-06-01 05:43:22,645 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 109047 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [65.609153] seconds.
2023-06-01 05:44:04,594 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 16993 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [11.586287] seconds.
2023-06-01 05:45:40,952 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 107550 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [64.712352] seconds.
2023-06-01 05:46:35,25 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 37899 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [23.376981] seconds.
2023-06-01 05:47:35,444 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 48301 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [29.687247] seconds.
2023-06-01 05:49:21,694 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 122151 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [74.320019] seconds.
2023-06-01 05:51:42,449 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 177342 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [107.253065] seconds.
2023-06-01 05:52:31,254 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 28988 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [18.212773] seconds.
2023-06-01 05:53:43,532 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 68180 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [41.309341] seconds.
2023-06-01 05:55:43,475 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 143725 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [87.643339] seconds.
2023-06-01 05:59:22,203 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 304369 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [183.337449] seconds.
2023-06-01 06:01:50,601 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 189635 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [115.103130] seconds.
2023-06-01 06:05:22,154 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 291041 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [175.944680] seconds.
2023-06-01 06:09:20,74 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 335387 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [202.510297] seconds.
2023-06-01 06:13:34,667 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 361446 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [217.943404] seconds.
2023-06-01 06:18:14,785 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 401978 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [242.862935] seconds.
2023-06-01 06:25:56,174 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 694914 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [418.961541] seconds.
2023-06-01 06:31:11,312 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 442631 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [276.282012] seconds.
2023-06-01 06:42:10,776 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 963791 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [611.130494] seconds.
2023-06-01 06:54:05,328 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1045172 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [665.431251] seconds.
2023-06-01 07:06:53,874 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1114536 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [717.690627] seconds.
2023-06-01 07:19:33,335 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1104502 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [708.316650] seconds.
2023-06-01 07:38:39,865 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1694037 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [1081.415942] seconds.
2023-06-01 07:57:48,677 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1703366 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [1084.819145] seconds.
2023-06-01 08:05:49,595 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 683827 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [437.580140] seconds.
2023-06-01 08:22:58,518 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1526505 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [971.327037] seconds.
2023-06-01 08:36:57,624 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 1231688 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [784.602905] seconds.
2023-06-01 09:00:05,333 ActorAddr-(T|:45005)/PID:19711 esrally.metrics INFO Successfully added 2050758 metrics documents for race timestamp=[20230601T052826Z], track=[elastic/logs], challenge=[logging-indexing-querying], car=[['external']] in [1316.636413] seconds.