gojek/ziggurat

Ziggurat drops messages while pushing to rabbitmq when connection with rabbitmq is broken

theanirudhvyas opened this issue · 1 comment

When the connection with RabbitMQ is broken and the service attempts to push messages to it, it retries the publish a few times, then drops the message and moves on to the next one.

The expected behaviour is that it should raise an exception, so that the message offset is not committed to Kafka.
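A minimal sketch of that expected behaviour, written in Python as a language-neutral model rather than Ziggurat's Clojure. The names `publish_with_retries`, `PublishError`, the `send` callable and the use of `ConnectionError` as the "broker unreachable" signal are all hypothetical; the point is only that the final failure is re-raised instead of swallowed.

```python
class PublishError(Exception):
    """Hypothetical error raised once every publish attempt has failed."""


def publish_with_retries(send, message, max_attempts=5):
    """Call send(message) up to max_attempts times.

    Returns the number of attempts it took on success. If every attempt
    fails, raises PublishError instead of dropping the message, so the
    caller (the stream task) does not go on to commit the offset.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return attempt
        except ConnectionError as err:
            # Assumption: a broken RabbitMQ connection surfaces as ConnectionError.
            last_error = err
    raise PublishError(f"failed after {max_attempts} attempts") from last_error
```

A `send` that fails twice and then succeeds returns `3`; a `send` that always fails ends in `PublishError` after five attempts rather than a silent return.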

@mjayprateek and I are picking this up.

The problem:

When the connection with RabbitMQ breaks while the service is running, Ziggurat does not exit; it keeps processing messages. ziggurat.producer/publish, which publishes to RabbitMQ, retries publishing 5 times, but if it still fails, it only reports the issue to Sentry and returns.

Since it returns without raising an exception, Kafka Streams commits the message offset and moves on to the next message, so the message is lost.
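The difference between returning and raising can be shown with a toy Python model of a stream task that commits an offset only when the handler returns normally. This is a simplification: Kafka Streams actually commits periodically (per `commit.interval.ms`), not after every message, and `process`, `swallowing_publish` and `raising_publish` are hypothetical names. Message `"b"` stands in for the one that cannot be published.

```python
def process(messages, handler):
    """Toy stream task: commit each offset only if handler returns normally.

    Returns the committed offset, i.e. the offset of the next message to
    read, which is what a Kafka consumer commits.
    """
    committed = 0
    for offset, message in enumerate(messages):
        try:
            handler(message)
        except Exception:
            # Handler raised: stop without committing, so this message
            # is read again when processing resumes.
            break
        committed = offset + 1
    return committed


def swallowing_publish(message):
    """Current behaviour: the final failure is only reported, never raised."""
    if message == "b":
        print(f"reported to error tracker: could not publish {message!r}")
        return  # normal return, so the offset is committed anyway


def raising_publish(message):
    """Expected behaviour: the final failure propagates to the stream task."""
    if message == "b":
        raise ConnectionError(f"could not publish {message!r}")
```

With `swallowing_publish`, `process(["a", "b", "c"], ...)` commits offset 3, past `"b"`, which was never published. With `raising_publish` it stops at offset 1, so `"b"` is still pending.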

Proposed Solution:

In ziggurat.producer/publish, if publishing still fails after all retries, we'll stop the streams, so that no new messages are read from or committed to Kafka.
The streams can be restarted manually by restarting the service (or we could provide an API for restarting the streams).
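The proposed flow can be sketched in the same toy style. `StreamRunner` is a hypothetical stand-in for the Kafka Streams instance (in the real service, stopping would correspond to closing the `KafkaStreams` object), and `PublishFailed` is a hypothetical error raised once retries are exhausted. On that error the runner stops without advancing its committed offset. Calling `start()` again models restarting the service or a restart API, and resumes from the uncommitted message.

```python
class PublishFailed(Exception):
    """Hypothetical error raised by publish once all retries are exhausted."""


class StreamRunner:
    """Toy stream that stops, instead of moving on, when publishing fails."""

    def __init__(self, messages, publish):
        self.messages = messages
        self.publish = publish
        self.committed = 0  # offset of the next message to read
        self.running = False

    def start(self):
        """Start, or restart, consuming from the last committed offset."""
        self.running = True
        while self.running and self.committed < len(self.messages):
            message = self.messages[self.committed]
            try:
                self.publish(message)
            except PublishFailed:
                # Proposed fix: stop the stream rather than commit past
                # a message that was never published.
                self.stop()
                return
            self.committed += 1

    def stop(self):
        self.running = False
```

If the broker is down when `"b"` is reached, the first `start()` stops with only `"a"` committed. Once the broker is back, a second `start()` publishes `"b"` and `"c"`, so nothing is dropped. The trade-off is that processing halts until someone restarts it, which is why the issue suggests a restart API.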