awslabs/amazon-kinesis-client

Allow number of LeaseRenewer threads to be configured

joelittlejohn opened this issue · 7 comments

I have an application that consumes multiple DynamoDB streams - eleven currently. I use a single worker on each stream but each worker creates a LeaseCoordinator with twenty LeaseRenewer threads. So I currently have 220 LeaseRenewer threads.

This seems excessive, but as far as I can see there's currently no way to reduce the number of LeaseRenewer threads.

Would it be acceptable for me to create a PR that reduces the number of LeaseRenewer threads created by default, or that simply adds a configuration option to allow this number to be reduced?

I am having a similar issue, and I am using version 1.7.5. I have about 5 consumers, which leads to 60 LeaseRenewer threads. Please refer to the screenshot from VisualVM. Why does it need so many? I would expect a couple of re-used threads.

[VisualVM screenshot showing the LeaseRenewer threads]

I'd like to increase the number of streams consumed by my app to around 20, but this means I'll have 400 LeaseRenewer threads 😬. This can't be necessary or useful.

It's not necessary, but an artifact of the original expectations of the KCL. The original thought was that each KCL application would process one stream, which would end up with a small number of LeaseRenewer threads.

With the move to consuming multiple streams within a single application, many of the thread pools should probably be shared. I think the easiest way to solve this would be to allow the application either to provide the thread pool to use, or to provide a factory interface that allows the application to decide which thread pool should be used.
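For illustration, the factory hook could be as small as this (the interface and method names are hypothetical, not part of the KCL today):

import java.util.concurrent.ExecutorService;

public interface LeaseRenewalExecutorFactory {
    // Called once per LeaseCoordinator. An application consuming many
    // streams can return the same shared ExecutorService from every call.
    ExecutorService getLeaseRenewalExecutor(int maximumPoolSize);
}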

Here is the code where this issue occurs. Allowing the application to pass in a thread pool or factory would require changes to the KinesisClientLibConfiguration and the Worker.

@pfifer I think you've linked to the lease coordinator thread pool but the bigger problem is the lease renewal thread pool. The former has 2 threads, the latter has 20.

So based on your advice, I think if we make this method public:

private static ExecutorService getLeaseRenewalExecutorService(int maximumPoolSize) {

so that people have a way of creating this kind of pool, then add an additional constructor that is similar to this one:

public LeaseCoordinator(ILeaseManager<T> leaseManager,
        String workerIdentifier,
        long leaseDurationMillis,
        long epsilonMillis,
        int maxLeasesForWorker,
        int maxLeasesToStealAtOneTime,
        IMetricsFactory metricsFactory) {

but that also accepts a leaseRenewalThreadpool. Then I guess we need to make further changes in KinesisClientLibLeaseCoordinator and other places to ensure this value can be read from the KinesisClientLibConfiguration.
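From the application side that might look roughly like this (the final constructor argument is the hypothetical addition; the other arguments are the ones from the existing constructor above, assumed to be in scope):

// Hypothetical: build one renewal pool up front and share it across all
// workers, instead of letting each LeaseCoordinator create its own 20-thread pool.
ExecutorService sharedRenewalPool = LeaseCoordinator.getLeaseRenewalExecutorService(20);

LeaseCoordinator<KinesisClientLease> coordinator = new LeaseCoordinator<>(
        leaseManager,
        workerIdentifier,
        leaseDurationMillis,
        epsilonMillis,
        maxLeasesForWorker,
        maxLeasesToStealAtOneTime,
        metricsFactory,
        sharedRenewalPool); // hypothetical extra parameter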

Alternatively, instead of allowing the thread pool to be passed around, we could simply allow the number of threads to come from the KinesisClientLibConfiguration. This is simpler but I agree it's also less flexible.
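For example, the configuration-only version might look like this from the application's point of view (withMaxLeaseRenewalThreads is a hypothetical setter, named purely for illustration):

KinesisClientLibConfiguration config =
        new KinesisClientLibConfiguration(applicationName, streamName, credentialsProvider, workerId)
                .withMaxLeaseRenewalThreads(4); // hypothetical; not in the current API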

As an even simpler first step, would you be interested in changing

static final int MAX_LEASE_RENEWAL_THREAD_COUNT = 20;

to

static final int MAX_LEASE_RENEWAL_THREAD_COUNT = 10;

?

This would drastically reduce the problem for me.

It seems my tracing of the code path wasn't quite right. Looking at the code, it's the result of PR #135. The previous code was actually the worst of both worlds in that it set the maximum, and apparently allowed quick timeouts. It seems the best behavior would be to return to a cached thread pool, but with a minimum that matches the workloads of most workers. This reduces the required threads for a small number of leases, while still allowing a large number of leases to work normally.

@pfifer So from your comment I take the following:

  1. We should move back to the ThreadPoolExecutor used before #135
  2. We should set a much higher keep alive time, e.g. 60 minutes instead of 60 seconds

I also think we should consider the following: set the core size to e.g. 1 and the maximum pool size to maximumPoolSize, then omit exec.allowCoreThreadTimeOut(true). Under normal operation the core thread should then not be killed/cycled by the executor.
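Concretely, that variant would look something like this (a sketch, not the current KCL code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// One long-lived core thread; bursts create extra threads on demand up to
// maximumPoolSize, and those extras are retired once the keep-alive expires.
ExecutorService exec = new ThreadPoolExecutor(
        1,                      // corePoolSize: a single thread that is never reclaimed
        maximumPoolSize,        // maximumPoolSize
        60L, TimeUnit.MINUTES,  // the higher keep-alive from point 2 above
        new SynchronousQueue<Runnable>());
// Deliberately no exec.allowCoreThreadTimeOut(true), so the core thread
// survives idle periods instead of being killed and recreated.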

I'm looking to revert to CachedThreadPool, but with some changes to improve the behavior (see the sketch after this list).

  1. Allow setting maximumPoolSize.
  2. Make the core thread count a fraction of the maximumPoolSize, currently 1/4, with a minimum of 2 threads.
  3. Do not allow core threads to time out. There is always going to be some degree of lease renewal activity, so letting the core threads time out isn't really worth it. In most cases the lease renewal requests will be handled by the core threads, and no further threads will need to be created.
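Something along these lines (a sketch following the three points above; the exact shipped change may differ):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

private static ExecutorService getLeaseRenewalExecutorService(int maximumPoolSize) {
    // Point 2: core threads are a quarter of the maximum, but at least 2.
    int coreThreadCount = Math.max(maximumPoolSize / 4, 2);

    // The SynchronousQueue hand-off mirrors Executors.newCachedThreadPool():
    // renewal bursts spawn extra threads up to maximumPoolSize, and idle
    // extras are retired after 60 seconds. Point 3: allowCoreThreadTimeOut
    // is left at its default (false), so core threads never time out.
    return new ThreadPoolExecutor(coreThreadCount, maximumPoolSize,
            60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
}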