oleksiyk/kafka

LeaderNotAvailable disguised as UnknownTopicOrPartition

Closed · 1 comment

the problem

When SimpleConsumer.subscribe runs while a partition has no leader (a LeaderNotAvailable scenario), an UnknownTopicOrPartition error is thrown instead:

KafkaError: This request is for a topic or partition that does not exist on this broker.

This can be reproduced intermittently (sometimes subscribe works, sometimes it doesn't) by running the code at https://github.com/Quadric/radiaction/tree/40d3433be9da803ab2c2207e51f4088bcb4ed069/examples/basic-example

This needs addressing because such a case is very hard to catch and debug. It took me days to find the real error hidden inside SimpleConsumer.client.topicMetadata. Keep in mind that the error is never guaranteed to be there the next time you run your code, given the transient nature of a LeaderNotAvailable condition. This is how my topicMetadata looks sometimes (other times it's just empty):

{
  "rick-morty__BUY_SAUCE": {
    "0": {
      "error": {
        "name": "KafkaError",
        "code": "LeaderNotAvailable",
        "message": "This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writes."
      },
      "partitionId": 0,
      "leader": -1,
      "replicas": [],
      "isr": []
    }
  },
  ... // repeats for every topic
}
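
To make this kind of hidden error easier to spot, the metadata can be scanned for per-partition errors. The following is a minimal sketch that assumes the metadata has the shape shown in the dump above (topic name, then partition id, then an optional error object); findPartitionErrors is a hypothetical helper name, not part of the library.

```javascript
// Hypothetical helper: lists every topic:partition pair whose metadata
// entry carries an error. The input shape is assumed from the dump above:
// { [topic]: { [partition]: { error?, partitionId, leader, replicas, isr } } }
function findPartitionErrors(topicMetadata) {
  const problems = [];
  for (const [topic, partitions] of Object.entries(topicMetadata || {})) {
    for (const [partition, meta] of Object.entries(partitions || {})) {
      if (meta && meta.error) {
        problems.push({
          topic,
          partition: Number(partition),
          code: meta.error.code, // e.g. "LeaderNotAvailable"
          leader: meta.leader    // -1 in the dump above, i.e. no leader
        });
      }
    }
  }
  return problems;
}
```

Calling it with SimpleConsumer.client.topicMetadata after a failed subscribe should show whether a LeaderNotAvailable entry is behind the UnknownTopicOrPartition error. Since the metadata is sometimes empty, an empty result does not rule the problem out.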

the solutions

  • a LeaderNotAvailable error should be thrown when subscribe fails because the partition has no leader
  • there needs to be a way to wait for a leader to be elected and then call subscribe again.

Topic:partition pairs that received a LeaderNotAvailable error during subscribe will be retried on each _fetch call. So that's exactly what you named as the second solution:

there needs to be a way to wait for a leader to be elected, and then be able to call subscribe again.
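
For callers who still want an explicit wait on their side, a generic retry wrapper is one option. This is only a sketch under assumptions: retryWhileNoLeader is a hypothetical helper, and it assumes the wrapped call returns a promise that rejects with an error whose code property is "LeaderNotAvailable" or "UnknownTopicOrPartition" (the latter because that is the error reported above).

```javascript
// Error codes treated as "no leader yet". Including UnknownTopicOrPartition
// is an assumption based on the behaviour described in this issue.
const RETRIABLE = new Set(['LeaderNotAvailable', 'UnknownTopicOrPartition']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls attempt() and retries up to `retries` more
// times, waiting delayMs between tries, while the error looks retriable.
async function retryWhileNoLeader(attempt, { retries = 5, delayMs = 500 } = {}) {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      const retriable = Boolean(err) && RETRIABLE.has(err.code);
      if (!retriable || i >= retries) throw err; // give up, surface the error
      await sleep(delayMs);
    }
  }
}

// Possible usage (assumes subscribe returns a promise):
// retryWhileNoLeader(() => consumer.subscribe(topic, partitions, handler));
```

Any other error, or a retriable one that persists past the retry budget, is rethrown unchanged so it is not masked.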