ReceiverDisconnectedException even if using different consumer groups
HaowenZhangBD opened this issue · 1 comments
Hi team, we have seen the ReceiverDisconnectedException
in our Databricks environment and have done some research.
We found that other people have had a similar problem, which is addressed in these 2 docs:
https://github.com/Azure/azure-event-hubs-spark/blob/master/FAQ.md
https://github.com/Azure/azure-event-hubs-spark/blob/master/examples/multiple-readers-example.md
We have read through them and followed the suggestion of using a different consumer group for each stream.
But we still get ReceiverDisconnectedException
on both streams, at the same timestamp.
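For reference, the per-stream configuration we mean looks roughly like the sketch below. This is a simplified illustration, not our exact job code: the helper `build_eh_conf` and the placeholder connection string are hypothetical, and the option keys `eventhubs.connectionString` and `eventhubs.consumerGroup` are the ones we understand the connector's PySpark documentation to use. In a real job the connection string would also need to be encrypted with the connector's `EventHubsUtils.encrypt` helper before being passed in.

```python
def build_eh_conf(connection_string: str, consumer_group: str) -> dict:
    """Build the options dict for one stream (hypothetical helper).

    Each stream gets its own dict so that it reads through its own
    consumer group, as the multiple-readers doc suggests.
    """
    return {
        "eventhubs.connectionString": connection_string,
        "eventhubs.consumerGroup": consumer_group,
    }


# Placeholder only; the real value comes from the Event Hubs namespace.
CONNECTION_STRING = "<event-hubs-connection-string>"

# One configuration per stream, each with a distinct consumer group.
conf_stream1 = build_eh_conf(CONNECTION_STRING, "job1")
conf_stream2 = build_eh_conf(CONNECTION_STRING, "machine2")

# Each stream would then be created from its own dict, for example:
#   spark.readStream.format("eventhubs").options(**conf_stream1).load()
```

The consumer group names `job1` and `machine2` match the PATH values in the errors reported under "Actual behavior".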
Bug Report:
- Actual behavior
stream 1 using PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5065.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5065.0 (TID 91438) (10.139.64.4 executor driver): java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'spark-driver-87' with higher epoch of '0' is created hence current receiver 'spark-driver-87' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:581a6d040004c849000eef7c64ddd416_G27_B39, SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|job1, Timestamp:2023-08-17T08:02:35, errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0, REFERENCE_ID: LN_a37906_1692259345344_1af_G27, PREFETCH_COUNT: 500, LINK_CREDIT: 1000, PREFETCH_Q_LEN: 0]
stream 2 using PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5069.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5069.0 (TID 91503) (10.139.64.4 executor driver): java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'spark-driver-315' with higher epoch of '0' is created hence current receiver 'spark-driver-315' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:581a6d040006c849000eef5c64ddd416_G2_B39, SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|machine2, Timestamp:2023-08-17T08:02:35, errorContext[NS: OUR EVENTHUB, PATH: publisher-events-eh/ConsumerGroups/machine2/Partitions/0, REFERENCE_ID: LN_190e6e_1692259345190_e97a_G2, PREFETCH_COUNT: 500, LINK_CREDIT: 1000, PREFETCH_Q_LEN: 0]
Possibly worth mentioning: another environment, with the same code change applied, has not hit ReceiverDisconnectedException
after running for around 1 day.
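To compare the two failures side by side, the fields of interest can be pulled out of the exception text with a small script. The sketch below is only a diagnostic aid under our own assumptions: the regular expression and the function name `summarize_disconnect` are hypothetical and were written against the message format shown above, not against any documented format. The sample string is an excerpt of the stream 1 message.

```python
import re

# Pattern written against the message text in this report (an assumption,
# not a documented format).
ERROR_PATTERN = re.compile(
    r"New receiver '(?P<new_receiver>[^']+)' with higher epoch of '(?P<new_epoch>\d+)'"
    r".*?current receiver '(?P<current_receiver>[^']+)' with epoch '(?P<current_epoch>\d+)'"
    r".*?Timestamp:(?P<timestamp>[^,]+),"
    r".*?PATH: [^/]+/ConsumerGroups/(?P<consumer_group>[^/]+)/Partitions/(?P<partition>\d+)"
)


def summarize_disconnect(message: str):
    """Return the receiver, epoch, timestamp, consumer group and partition
    found in a ReceiverDisconnectedException message, or None if no match."""
    match = ERROR_PATTERN.search(message)
    return match.groupdict() if match else None


# Excerpt of the stream 1 error from this report.
SAMPLE = (
    "New receiver 'spark-driver-87' with higher epoch of '0' is created hence "
    "current receiver 'spark-driver-87' with epoch '0' is getting disconnected. "
    "If you are recreating the receiver, make sure a higher epoch is used. "
    "TrackingId:581a6d040004c849000eef7c64ddd416_G27_B39, "
    "SystemTracker:OUR EVENTHUB:publisher-events-eh~1023|job1, "
    "Timestamp:2023-08-17T08:02:35, "
    "errorContext[NS: OUR EVENTHUB, "
    "PATH: publisher-events-eh/ConsumerGroups/job1/Partitions/0"
)

summary = summarize_disconnect(SAMPLE)
print(summary)
```

Running the same function over both messages makes two things easy to see: the streams really are on different consumer groups (`job1` and `machine2`) yet fail at the same timestamp, and within each message the new and current receiver have the same name and the same epoch '0'.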