PoolAcquirePendingLimitException, Pending acquire queue has reached its maximum size of 1000
euchungmsft opened this issue · 8 comments
Please provide us with the following information:
This issue is for a: (mark with an x
)
- [ X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
-
posting thousands of messages to the event hub
-
and run
mvn spring-boot:run
Any log messages given by the failure
2021-08-18 18:04:52.407 INFO 5196 --- [tition-pump-1-2] c.a.s.s.e.b.EventHubBinderApplication : New message received: '{user_id=uid001, last_updated=578925754902300, subtitle=KR, content_id=ID35685, progress=on, category=cat1}', partition key: null, sequence number: 24326, offset: 4298837352, enqueued time: 2021-08-18T07:45:54.149Z
2021-08-18 18:04:52.407 ERROR 5196 --- [ parallel-1] c.a.s.s.e.b.EventHubBinderApplication : Exception found
reactor.netty.internal.shaded.reactor.pool.PoolAcquirePendingLimitException: Pending acquire queue has reached its maximum size of 1000
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.drainLoop(SimpleDequePool.java:363) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.pendingOffer(SimpleDequePool.java:549) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.doAcquire(SimpleDequePool.java:266) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.request(AbstractPool.java:399) [reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onSubscribe(DefaultPooledConnectionProvider.java:213) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool$QueueBorrowerMono.subscribe(SimpleDequePool.java:674) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.resources.PooledConnectionProvider.lambda$acquire$1(PooledConnectionProvider.java:137) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.lambda$subscribe$0(HttpClientConnect.java:269) ~[reactor-netty-http-1.0.7.jar:1.0.7]
at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:76) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:272) ~[reactor-netty-http-1.0.7.jar:1.0.7]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:76) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:52) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:33) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionOtherSubscriber.onNext(FluxDelaySubscription.java:130) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) ~[reactor-core-3.4.6.jar:3.4.6]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_272]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_272]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_272]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]
Expected/desired behavior
OS and Version?
Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
Versions
as listed in the maven build file
Mention any other details that might be useful
same situation at the customer, i was trying to reproduce it with the example posted here, but same error
Thanks! We'll be in touch soon.
hi @euchungmsft,thanks for reaching us, we will look into it and reply to you soon.
Hi @euchungmsft , could you provide below information to help us investigate this issue?
- How do you post messages to the event hub?
- Hope you could provide your configuration of checkpointing,e.g., the application.yaml/application.properties for checkpointing and also the code. We would like to know if you use your customized way of checkpointing.
Also you could provide a sample project directly if you have any customization.
Hi @yiliuTo
-
Simply pushed it to EH, source code is over here,
https://github.com/euchungmsft/azure-examples/blob/main/az-eventhub-native/src/main/java/com/eg/az/eh/nat/Sender.java -
Almost no changes from the example, source code is over here,
https://github.com/euchungmsft/azure-examples/tree/main/azure-spring-cloud-sample-eventhubs-binder
You can reproduce the issue by simply running the example here
https://github.com/Azure-Samples/azure-spring-boot-samples/tree/tag_azure-spring-boot_3.6.0/eventhubs/azure-spring-cloud-sample-eventhubs-binder
@yiliuTo could you please share the status of this to me ? it's almost the time to move on to next step with this
Hi @euchungmsft , we found the cause is due to everty time a message is checkpointed, an HTTP connnection is established by the underlying eventhubs SDK library. As the number of HTTP connections increases, the limitation of your os's open file discriptors is reached, then pending connections increase and finally reach the the limitation of Reactor Netty's pending queue size. To solve this, there are multiple ways:
- you could choose to increase the number of partition and consumer application instances to decrease each consumer's consuming pressure.
- change the open file limitation, which is 1024 by default.
- you can choose to decrease the frequncy of checkponting, the eventhubs stream binder also provide other checkpoint modes like PARTITION_COUNT and TIME, which can checkpoint msgs by counts and time interval. You find more detailed instructions here.
Thank you so much for the feedback, customer's now ready to test this again. It's right on time
@yiliuTo partition count is already 32 at its max, 2 was already done when I was raising this issue, let me test with 3
I'll get back to you soon, thanks
Let's close this for now, I'll get back to you @yiliuTo once there's further request from them. Thans !!