Introduce a timeout for destroying processors
ocadaruma opened this issue · 0 comments
ocadaruma commented
Summary
- Currently, rebalance-listener could block indefinitely in below scenario:
- Let's say:
- There are 3 partitions (0,1,2)
- 3 instances (X,Y,Z)
- partition-0 is assigned to X
max.poll.interval.ms
is set to long value because sometimes task takes long time to complete (Basically Decaton doesn't blockpoll()
by task processing, but there's an exception in rebalance-listener)
- Then, let's say partition-0's assignment is taken over by Y by reassignment triggered by Z's restart
- rebalance-listener's
onPartitionAssigned
calls partition-0'sPartitionProcessor#close
, which waits processor unit's executor to terminate indefinitely - If long-running task is being executed by a processor,
onPartitionAssigned
will be blocked until the task completes - When Z came back to the group, it initiates another rebalance, but it never finishes because X is in stuck at
onPartitionAssigned
, so cannot sendJoinGroup
- Meanwhile, all processing on all instances are stuck
- Let's say:
- Though this behavior is Decaton's design decision (i.e. "rebalance could block up to
max.poll.interval.ms
at max"), some users use Decaton for processing long-running tasks, and aborting task "ungracefully" is preferable to waiting task indefinitely - So it might be better to have an option to set timeout on destroying processors