confluentinc/parallel-consumer

Monitor for progress and optionally shut down (leave consumer group), skip message or send to DLQ

astubbs opened this issue · 2 comments

Some issues can stall all progress along a partition. For example, infinite retries of a message against an external system that will always reject it, or a bug in the handling code that prevents the message from being retried.
If a situation like this is detected, there should be options to either send the message to a dead letter queue (DLQ) and move on, or shut down and leave the consumer group.
User code should be able to return either a fatal error type or a retriable error type.
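
As a rough illustration of the fatal vs. retriable split, here is a minimal sketch. None of these names exist in parallel-consumer today: `FatalProcessingException`, `RetriableProcessingException`, `Outcome` and `process` are all hypothetical, and treating unclassified exceptions as retriable is an assumption rather than a decided behaviour.

```java
import java.util.function.Consumer;

// Hypothetical sketch only: these types are NOT part of the parallel-consumer API.
final class ErrorClassificationSketch {

    /** Hypothetical: the record can never succeed, so retrying is pointless. */
    static class FatalProcessingException extends RuntimeException {
        FatalProcessingException(String message) {
            super(message);
        }
    }

    /** Hypothetical: the failure may be transient, so the record should be retried. */
    static class RetriableProcessingException extends RuntimeException {
        RetriableProcessingException(String message) {
            super(message);
        }
    }

    /** What the framework would do next with the record. */
    enum Outcome { SUCCESS, RETRY, FATAL }

    /** Runs the user function and maps whatever it throws to an outcome. */
    static <T> Outcome process(T record, Consumer<T> userFunction) {
        try {
            userFunction.accept(record);
            return Outcome.SUCCESS;
        } catch (FatalProcessingException e) {
            // User code says this record will never succeed: skip, DLQ or shut down.
            return Outcome.FATAL;
        } catch (RuntimeException e) {
            // Assumption: anything not explicitly fatal is retried,
            // including RetriableProcessingException.
            return Outcome.RETRY;
        }
    }
}
```

With something like this, user code could throw the fatal type for a payload it knows will always be rejected, and the retriable type (or any other exception) for failures such as timeouts.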

The system currently does mutual thread supervision of the two long-running threads (control and poller) to detect thread death, but this issue is about adding an overall offset-level fail-safe, which will include monitoring for stuck messages that can never succeed. The options for this case are yet to be decided, for example: shutdown, dead letter queue, skip. Skip and shutdown would be easy to implement, so they'll come first.
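
One possible shape for the offset-level check, as a sketch under assumptions: track the lowest incomplete offset per partition and flag any partition where that offset has not moved for longer than a configured limit. `StuckOffsetMonitorSketch` and its methods are hypothetical names, not existing parallel-consumer code, and the caller is assumed to call `clear` when a partition has no outstanding work so that idle partitions are not reported as stuck. What to do with a stuck partition (skip, shut down, DLQ) is left to the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not part of the parallel-consumer code base.
final class StuckOffsetMonitorSketch {

    /** The lowest incomplete offset and when it was first seen at that value. */
    private static final class Progress {
        final long offset;
        final Instant since;

        Progress(long offset, Instant since) {
            this.offset = offset;
            this.since = since;
        }
    }

    private final Duration maxStall;
    private final Map<Integer, Progress> progressByPartition = new HashMap<>();

    StuckOffsetMonitorSketch(Duration maxStall) {
        this.maxStall = maxStall;
    }

    /** Records the lowest incomplete offset of a partition as observed at {@code now}. */
    void observe(int partition, long lowestIncompleteOffset, Instant now) {
        Progress previous = progressByPartition.get(partition);
        // Only reset the timer when the offset actually changed.
        if (previous == null || previous.offset != lowestIncompleteOffset) {
            progressByPartition.put(partition, new Progress(lowestIncompleteOffset, now));
        }
    }

    /** Forgets a partition, for example when it is revoked or has no work in flight. */
    void clear(int partition) {
        progressByPartition.remove(partition);
    }

    /** Returns the partitions whose lowest incomplete offset has stalled for longer than the limit. */
    List<Integer> stuckPartitions(Instant now) {
        List<Integer> stuck = new ArrayList<>();
        for (Map.Entry<Integer, Progress> entry : progressByPartition.entrySet()) {
            Duration stalled = Duration.between(entry.getValue().since, now);
            if (stalled.compareTo(maxStall) > 0) {
                stuck.add(entry.getKey());
            }
        }
        return stuck;
    }
}
```

Passing the current time in explicitly keeps the check deterministic, which makes the stall threshold easy to test without sleeping.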

Note that to skip a message, a user can already simply return from their processing function without throwing an exception, and potentially without processing it.

Related: #71 Health-checks
