Test against various common ES bad states

ES clusters are sensitive to a number of problems that can occur during large reindexing operations:

the cluster crashing
nodes running out of memory and stalling
a new shard cannot be allocated
a shard cannot be replicated
vm.max_map_count set too low
split brain caused by a network partition, usually leading to data loss
shards that do not answer a query in time (e.g. a search returning results from 4 shards instead of 5)
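As a starting point, two of these states can be detected from data that is easy to inspect in a test. The Python sketch below uses hypothetical function names. It flags a search response as partial when `_shards.successful` is lower than `_shards.total` or when `timed_out` is true (both fields appear in Elasticsearch search responses). It also compares `vm.max_map_count` against 262144, the minimum that Elasticsearch's bootstrap checks require on Linux.

```python
# Minimal sketch: detect two of the bad states listed above.
# Function names are hypothetical, not part of any Elasticsearch client.

REQUIRED_MAX_MAP_COUNT = 262144  # minimum required by Elasticsearch's bootstrap check


def is_partial(search_response: dict) -> bool:
    """True if a search timed out or fewer shards succeeded than were queried."""
    shards = search_response.get("_shards", {})
    return bool(
        search_response.get("timed_out", False)
        or shards.get("successful", 0) < shards.get("total", 0)
    )


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the kernel setting (this file only exists on Linux)."""
    with open(path) as f:
        return int(f.read().strip())


def max_map_count_ok(value: int) -> bool:
    """True if the value meets Elasticsearch's minimum."""
    return value >= REQUIRED_MAX_MAP_COUNT


if __name__ == "__main__":
    # Shaped like the case in the issue: only 4 of 5 shards answered.
    resp = {"timed_out": False,
            "_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1}}
    print(is_partial(resp))          # True
    print(max_map_count_ok(65530))   # False (65530 is a common Linux default)
```

In a test, `is_partial` could be asserted on every search issued while a node is stopped, so that partial results are reported instead of being silently treated as complete.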

We should find ways to reproduce these states in tests and check how our create/update/delete operations behave in each case.
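One inexpensive first step, before reproducing these states on a real cluster, is to inject faults into a stand-in for the index and assert that a create/update/delete sequence still ends in the expected state. The sketch below is only an illustration. `FaultyStore`, `InjectedFailure`, `with_retries` and `run_crud_scenario` are all hypothetical names, and the store is an in-memory dict, not an Elasticsearch client. Failures are deterministic (every `fail_every`-th call), so a failing run can be replayed exactly.

```python
class InjectedFailure(Exception):
    """Stands in for a transient ES error (e.g. a rejected or timed-out write)."""


class FaultyStore:
    """Hypothetical in-memory stand-in for an ES index.

    Every `fail_every`-th call raises InjectedFailure *before* touching the
    data, so a retried call sees the same state as the failed one.
    A value of 0 disables fault injection.
    """

    def __init__(self, fail_every: int):
        self.docs: dict[str, dict] = {}
        self.fail_every = fail_every
        self.calls = 0
        self.faults = 0

    def _maybe_fail(self) -> None:
        self.calls += 1
        if self.fail_every and self.calls % self.fail_every == 0:
            self.faults += 1
            raise InjectedFailure(f"injected fault on call {self.calls}")

    def create(self, doc_id: str, body: dict) -> None:
        self._maybe_fail()
        self.docs[doc_id] = dict(body)

    def update(self, doc_id: str, fields: dict) -> None:
        self._maybe_fail()
        self.docs[doc_id].update(fields)

    def delete(self, doc_id: str) -> None:
        self._maybe_fail()
        self.docs.pop(doc_id, None)


def with_retries(op, attempts: int = 3):
    """Call op() until it succeeds, re-raising the failure after the last attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except InjectedFailure:
            if attempt == attempts - 1:
                raise


def run_crud_scenario(store: FaultyStore) -> dict:
    """Run a fixed create/update/delete sequence and return the final documents."""
    with_retries(lambda: store.create("1", {"title": "a"}))
    with_retries(lambda: store.update("1", {"title": "b"}))
    with_retries(lambda: store.create("2", {"title": "c"}))
    with_retries(lambda: store.delete("2"))
    return store.docs


if __name__ == "__main__":
    store = FaultyStore(fail_every=2)
    print(run_crud_scenario(store), store.faults)  # {'1': {'title': 'b'}} 3
```

This model only fails a call before it is applied. The harder case for a real cluster, where a write is applied but its acknowledgement is lost (so a retried create may conflict), would need its own fault mode and is not covered here.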