Introduce a timeout for destroying processors

Question

ocadaruma opened this issue 3 years ago · 0 comments

Summary

Currently, rebalance-listener could block indefinitely in below scenario:
- Let's say:
  - There are 3 partitions (0,1,2)
  - 3 instances (X,Y,Z)
  - partition-0 is assigned to X
  - max.poll.interval.ms is set to long value because sometimes task takes long time to complete (Basically Decaton doesn't block poll() by task processing, but there's an exception in rebalance-listener)
- Then, let's say partition-0's assignment is taken over by Y by reassignment triggered by Z's restart
- rebalance-listener's onPartitionAssigned calls partition-0's PartitionProcessor#close, which waits processor unit's executor to terminate indefinitely
- If long-running task is being executed by a processor, onPartitionAssigned will be blocked until the task completes
- When Z came back to the group, it initiates another rebalance, but it never finishes because X is in stuck at onPartitionAssigned, so cannot send JoinGroup
- Meanwhile, all processing on all instances are stuck
Though this behavior is Decaton's design decision (i.e. "rebalance could block up to max.poll.interval.ms at max"), some users use Decaton for processing long-running tasks, and aborting task "ungracefully" is preferable to waiting task indefinitely
So it might be better to have an option to set timeout on destroying processors