elastic/elasticsearch-java

Threads lock scenario at BulkIngester // FnCondition with high concurrency setup

codehustler opened this issue · 14 comments

Java API client version

7.17.12

Java version

11

Elasticsearch Version

7.17.12

Problem description

Hi, I think I have found a bug with the BulkIngester, maybe an issue with the locks.

The problem only appears on certain dev machines and some servers. We run the 7.17.12 Java client library. I cannot fully figure out what is going on, and without a reliable reproduction it probably makes no sense to file a ticket, but I have attached a thread dump that shows several threads still waiting; I hope this helps.

More context:

We use the BulkIngester to index a file with ~12k documents (just one example file). Indexing runs to 99% and then gets stuck. Because we have configured a 10-second flush interval on the BulkIngester, every 10 seconds we see a bulk context being flushed with just a single document in it. This goes on for 3 to 4 minutes, and every 10 seconds it is the same picture: one bulk request with a single add operation.

A thread dump shows several threads waiting in BulkIngester.add, which blocks inside FnCondition.whenReadyIf(...) at the awaitUninterruptibly call. So it seems that one bulk request comes back with a single operation in it, which triggers the addCondition.signalIfReady() call; that lets exactly one waiting thread through, which again produces a bulk request with a single operation. This does not happen when debugging, and it does not happen when I add a per-document log message, which is why I think it is a race condition somewhere. If I change addCondition.signalIfReady() to signalAllIfReady(), it works, but I would really like to find the actual root cause!

I have a 32-core CPU, and we collect and prepare our index documents in parallel. When I limit the pool to 8 threads, it also works just fine.
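The workaround amounts to bounding producer parallelism. A minimal sketch of that idea, assuming a fixed pool of 8 threads (the submitted lambda is a hypothetical stand-in for our real prepare-document-then-ingester.add logic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedProducers {
    public static void main(String[] args) throws Exception {
        // Workaround: cap the producer pool at 8 threads instead of one per core.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<?>> futures = new ArrayList<>();
        int documents = 12_000; // roughly the size of the example file
        for (int i = 0; i < documents; i++) {
            final int docId = i;
            // Stand-in for: prepare document docId, then call ingester.add(...)
            futures.add(pool.submit(() -> docId));
        }
        int submitted = 0;
        for (Future<?> f : futures) {
            f.get(); // wait for each producer task to finish
            submitted++;
        }
        pool.shutdown();
        System.out.println("submitted " + submitted + " documents");
        // prints: submitted 12000 documents
    }
}
```

With only 8 producers contending on the ingester's lock, the single-signal wakeup apparently keeps up, which is consistent with the race only surfacing under high concurrency.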

thread_dump.txt