Azure-Samples/azure-spring-boot-samples

PoolAcquirePendingLimitException, Pending acquire queue has reached its maximum size of 1000

euchungmsft opened this issue · 8 comments

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Clone this sample:
    https://github.com/Azure-Samples/azure-spring-boot-samples/tree/tag_azure-spring-boot_3.6.0/eventhubs/azure-spring-cloud-sample-eventhubs-binder

  2. Post thousands of messages to the event hub.

  3. Run `mvn spring-boot:run`.

Any log messages given by the failure

2021-08-18 18:04:52.407 INFO 5196 --- [tition-pump-1-2] c.a.s.s.e.b.EventHubBinderApplication : New message received: '{user_id=uid001, last_updated=578925754902300, subtitle=KR, content_id=ID35685, progress=on, category=cat1}', partition key: null, sequence number: 24326, offset: 4298837352, enqueued time: 2021-08-18T07:45:54.149Z
2021-08-18 18:04:52.407 ERROR 5196 --- [ parallel-1] c.a.s.s.e.b.EventHubBinderApplication : Exception found

reactor.netty.internal.shaded.reactor.pool.PoolAcquirePendingLimitException: Pending acquire queue has reached its maximum size of 1000
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.drainLoop(SimpleDequePool.java:363) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.pendingOffer(SimpleDequePool.java:549) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.doAcquire(SimpleDequePool.java:266) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.request(AbstractPool.java:399) [reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onSubscribe(DefaultPooledConnectionProvider.java:213) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool$QueueBorrowerMono.subscribe(SimpleDequePool.java:674) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.netty.resources.PooledConnectionProvider.lambda$acquire$1(PooledConnectionProvider.java:137) ~[reactor-netty-core-1.0.7.jar:1.0.7]
at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.lambda$subscribe$0(HttpClientConnect.java:269) ~[reactor-netty-http-1.0.7.jar:1.0.7]
at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:76) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:272) ~[reactor-netty-http-1.0.7.jar:1.0.7]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.FluxRetryWhen.subscribe(FluxRetryWhen.java:76) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoRetryWhen.subscribeOrReturn(MonoRetryWhen.java:46) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:52) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:33) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionOtherSubscriber.onNext(FluxDelaySubscription.java:130) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) ~[reactor-core-3.4.6.jar:3.4.6]
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) ~[reactor-core-3.4.6.jar:3.4.6]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_272]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_272]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_272]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_272]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]

Expected/desired behavior

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Versions

As listed in the Maven build file (pom.xml).

Mention any other details that might be useful

The same situation occurs at the customer's site. I was trying to reproduce it with the sample posted here, and I hit the same error.


Thanks! We'll be in touch soon.

Hi @euchungmsft, thanks for reaching out to us. We will look into it and reply to you soon.

Hi @euchungmsft , could you provide the information below to help us investigate this issue?

  1. How do you post messages to the event hub?
  2. Could you share your checkpointing configuration, e.g. the application.yaml/application.properties entries for checkpointing, and also the related code? We would like to know whether you use a customized way of checkpointing (for reference, a typical configuration is sketched below).

You could also provide a sample project directly if you have any customization.
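
For context, a minimal sketch of what the binder's checkpointing configuration typically looks like for this sample generation is shown below. The binding name `consume-in-0`, the `MANUAL` checkpoint mode, and the placeholder values are illustrative assumptions, not the reporter's actual settings; property names follow the azure-spring-cloud-stream-binder-eventhubs generation used by the 3.6.0-era sample and may differ in other versions.

```yaml
# application.yaml (illustrative sketch; verify property names against your binder version)
spring:
  cloud:
    azure:
      eventhub:
        connection-string: ${EVENTHUB_CONNECTION_STRING}      # placeholder
        checkpoint-storage-account: ${STORAGE_ACCOUNT}        # Blob storage account used for checkpoints
        checkpoint-access-key: ${STORAGE_ACCESS_KEY}          # placeholder
    stream:
      bindings:
        consume-in-0:
          destination: ${EVENTHUB_NAME}
          group: ${CONSUMER_GROUP}
      eventhub:
        bindings:
          consume-in-0:
            consumer:
              # MANUAL means the application code checkpoints each message itself
              checkpoint-mode: MANUAL
```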

(screenshot attached)

Ubuntu 16.04 LTS

@yiliuTo could you please share the status of this with me? It's almost time to move on to the next step with this.

Hi @euchungmsft , we found that the cause is that every time a message is checkpointed, an HTTP connection is established by the underlying Event Hubs SDK library. As the number of HTTP connections increases, the limit on your OS's open file descriptors is reached; pending connections then build up and finally hit the limit of Reactor Netty's pending acquire queue size. To solve this, there are multiple options:

  1. You could increase the number of partitions and consumer application instances to reduce each consumer's consuming pressure.
  2. Raise the open file descriptor limit, which is 1024 by default (e.g. via `ulimit -n` on Linux).
  3. You could decrease the frequency of checkpointing: the Event Hubs stream binder also provides other checkpoint modes such as PARTITION_COUNT and TIME, which checkpoint messages by count or by time interval (see the configuration sketch after this list). You can find more detailed instructions here.
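
As a rough illustration of option 3, a sketch of the binder consumer settings is below. The binding name `consume-in-0` and the specific values are assumptions, and the property names (`checkpoint-mode`, `checkpoint-count`, `checkpoint-interval`) follow the binder generation used by this sample; they may differ in other versions.

```yaml
# application.yaml (illustrative sketch, not the exact configuration of this project)
spring:
  cloud:
    stream:
      eventhub:
        bindings:
          consume-in-0:
            consumer:
              # Checkpoint once every N events per partition instead of once per message
              checkpoint-mode: PARTITION_COUNT
              checkpoint-count: 100
              # Alternatively, checkpoint on a fixed time interval:
              # checkpoint-mode: TIME
              # checkpoint-interval: 5s
```

Either mode greatly reduces the number of checkpoint calls, and therefore the number of HTTP connections opened against the checkpoint store.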

Thank you so much for the feedback. The customer is now ready to test this again, so it's right on time.

@yiliuTo the partition count is already at its maximum of 32, and option 2 was already done when I raised this issue. Let me test option 3.

I'll get back to you soon, thanks

Let's close this for now. I'll get back to you @yiliuTo once there's a further request from them. Thanks!