Azure/azure-event-hubs-spark

Task distribution uneven

HongyuLi-ms opened this issue · 0 comments


Bug Report:

  • Actual behavior
    Some executors are assigned more tasks than they have cores.

For example, with 8 cores per executor, 120 accepted 10 tasks while 61 accepted only 5. This makes the job take longer.
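To illustrate why this slows the job down, here is a rough back-of-the-envelope sketch in Python. It is not connector code. It assumes every task takes about the same time, that each core runs one task at a time, and that only the two executors from the example are involved; the names `waves`, `assigned`, and `balanced` are made up for the illustration.

```python
import math

CORES_PER_EXECUTOR = 8

def waves(tasks: int, cores: int = CORES_PER_EXECUTOR) -> int:
    """Sequential rounds an executor needs to run `tasks` on `cores` slots."""
    return math.ceil(tasks / cores)

# Observed assignment from the example above: executor label -> task count.
assigned = {"120": 10, "61": 5}
rounds = {executor: waves(n) for executor, n in assigned.items()}

# The stage only finishes when the slowest executor finishes.
stage_rounds = max(rounds.values())

# Hypothetical even split of the same 15 tasks across the two executors.
total = sum(assigned.values())
balanced = {"120": math.ceil(total / 2), "61": total // 2}
balanced_rounds = max(waves(n) for n in balanced.values())

print(rounds, stage_rounds, balanced, balanced_rounds)
```

Under these assumptions the observed assignment needs two rounds on 120 while 61 has idle cores, whereas an even 8/7 split would finish in one round.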

  • Expected behavior
    The number of tasks an executor accepts should not exceed its number of cores.

  • My test results
    I found that the driver assigns tasks evenly when I set preFetchCount=2. I also tested kafka-spark-connector, and it does not have this problem: tasks are distributed evenly.
    So I wonder whether this is caused by the cache: perhaps preFetchCount makes memory usage differ from what is expected, so the driver stops assigning tasks.

  • Spark version
    3.1.2

  • spark-eventhubs artifactId and version
    com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.21