AutoMQ/automq

[Enhancement] Support slow broker detect on AutoBalancer

SCNieh opened this issue · 0 comments

What's the problem

When a broker experiences internal issues and its produce or fetch latency increases, its network throughput is likely to drop. The AutoBalancer balances load across the cluster, so it treats the lower traffic as spare capacity and tries to assign additional partitions to this broker. As a result, more partitions end up affected by the failure.

How to identify slow brokers

Brokers will need to report additional metrics, including append latency, append stream queue size, fast read latency, and fetch queue size. The AutoBalancer will mark a broker as "slow" if any of these metrics shows a sudden increase compared to its historical statistics.
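
A minimal sketch of the detection rule described above, in Python for brevity. This is not the AutoBalancer implementation: the class name, metric identifiers, window size, and thresholds are all hypothetical, and "sudden increase" is interpreted here as a sample exceeding both a standard-deviation band and a ratio over the rolling mean.

```python
from collections import deque
from statistics import mean, pstdev

# The four metrics named in this issue; the identifiers are hypothetical.
METRICS = (
    "append_latency_ms",
    "append_queue_size",
    "fast_read_latency_ms",
    "fetch_queue_size",
)


class SlowBrokerDetector:
    """Tracks one broker and flags metrics that jump above their recent history."""

    def __init__(self, window=30, min_samples=10, sigmas=3.0, min_ratio=1.5, floor=1.0):
        # All defaults are illustrative assumptions, not tuned values.
        self.min_samples = min_samples
        self.sigmas = sigmas
        self.min_ratio = min_ratio
        self.floor = floor  # a real detector would likely use a per-metric floor
        self.history = {m: deque(maxlen=window) for m in METRICS}

    def observe(self, sample):
        """Return the metrics that spiked in this sample; an empty list means healthy."""
        spiked = []
        for metric in METRICS:
            past = self.history[metric]
            value = sample[metric]
            if len(past) >= self.min_samples:
                mu = mean(past)
                sd = pstdev(past)
                # The sample must clear the deviation band, the ratio, and the floor.
                threshold = max(mu + self.sigmas * sd, mu * self.min_ratio, self.floor)
                if value > threshold:
                    spiked.append(metric)
                    continue  # keep spikes out of the baseline so it does not drift up
            past.append(value)
        return spiked
```

A broker would be marked "slow" whenever `observe` returns a non-empty list. Excluding spiked samples from the history is a design choice of this sketch: it keeps a prolonged slowdown from becoming the new baseline, at the cost of never adapting to a legitimate, permanent load change.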

What to do with slow brokers

When a broker is marked as "slow", no additional partitions will be assigned to it. However, moving existing partitions off it is still allowed.
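
The policy above amounts to an asymmetric filter on partition moves: a slow broker may be a source but not a destination. A minimal sketch in Python, assuming a hypothetical `Move` record and a set of slow broker IDs (neither is an existing AutoBalancer type):

```python
from typing import NamedTuple


class Move(NamedTuple):
    """A proposed partition reassignment (hypothetical type for illustration)."""
    partition: str
    source: int       # broker ID currently hosting the partition
    destination: int  # broker ID proposed to receive it


def filter_moves(moves, slow_brokers):
    """Drop moves that target a slow broker; moves away from one are kept."""
    return [m for m in moves if m.destination not in slow_brokers]


proposed = [
    Move("topic-0", source=1, destination=2),  # rejected: broker 2 is slow
    Move("topic-1", source=2, destination=3),  # kept: moves load off broker 2
]
allowed = filter_moves(proposed, slow_brokers={2})
```

Only the destination is checked, so the balancer can still drain a slow broker if its other goals call for it.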