AutoMQ/automq

[Enhancement] Support historical metrics measurements in AutoBalancer

SCNieh opened this issue · 0 comments

Who is this for and what problem do they have today?

The current implementation describes the read/write speed of a partition by its average bytes in/out rate over the last minute, and the controller uses only the latest reported value as evidence when optimizing the bytes in/out balance goals. This simple procedure has worked well so far, but it cannot measure the long-term trend of a metric, which would help in implementing goals that need to take historical data into account when scheduling.

Proposed Solution

Involved Data Structures

Window
Represents a series of values within a certain time window. The capacity of a window is fixed, and no more elements can be added once it is full.
AbstractTimeWindowSamples
Contains a bounded deque of Window objects. When a value is appended to this data structure, it first tries to append the value to the latest window; if that window is full, it rolls a new window and adds it to the deque, evicting the oldest window in FIFO order once the deque is at capacity. The data structure also provides an abstract method that reports the validity of the data in the latest window (that is, whether the data can be trusted), which can be implemented as needed.
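
To make the two structures above concrete, here is a minimal sketch in Java. Only the names Window and AbstractTimeWindowSamples come from this proposal; the method names, the use of double values, and the FullWindowSamples subclass (which trusts the latest window only once it is full) are assumptions made for illustration, not the actual AutoBalancer code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Fixed-capacity series of values; rejects appends once it is full. */
final class Window {
    private final double[] values;
    private int size;

    Window(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.values = new double[capacity];
    }

    boolean isFull() {
        return size == values.length;
    }

    /** Returns false, and stores nothing, when the window is already full. */
    boolean append(double value) {
        if (isFull()) {
            return false;
        }
        values[size++] = value;
        return true;
    }

    /** Average of the stored values, or 0.0 for an empty window. */
    double average() {
        if (size == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < size; i++) {
            sum += values[i];
        }
        return sum / size;
    }
}

/** Bounded FIFO deque of windows; subclasses decide when data is trusted. */
abstract class AbstractTimeWindowSamples {
    private final Deque<Window> windows = new ArrayDeque<>();
    private final int maxWindows;
    private final int windowCapacity;

    AbstractTimeWindowSamples(int maxWindows, int windowCapacity) {
        if (maxWindows <= 0) {
            throw new IllegalArgumentException("maxWindows must be positive");
        }
        this.maxWindows = maxWindows;
        this.windowCapacity = windowCapacity;
    }

    void append(double value) {
        Window latest = windows.peekLast();
        if (latest == null || !latest.append(value)) {
            // Latest window is missing or full: roll a new one.
            Window rolled = new Window(windowCapacity);
            rolled.append(value);
            windows.addLast(rolled);
            if (windows.size() > maxWindows) {
                windows.removeFirst(); // evict the oldest window
            }
        }
    }

    int windowCount() {
        return windows.size();
    }

    /** Latest window, or null if nothing has been appended yet. */
    Window latest() {
        return windows.peekLast();
    }

    boolean isTrusted() {
        Window latest = windows.peekLast();
        return latest != null && isTrusted(latest);
    }

    /** Policy hook: can the data in the latest window be trusted? */
    protected abstract boolean isTrusted(Window latest);
}

/** Hypothetical policy: trust the latest window only once it is full. */
final class FullWindowSamples extends AbstractTimeWindowSamples {
    FullWindowSamples(int maxWindows, int windowCapacity) {
        super(maxWindows, windowCapacity);
    }

    @Override
    protected boolean isTrusted(Window latest) {
        return latest.isFull();
    }
}
```

With new FullWindowSamples(2, 3), the first three appended values fill one window and the samples become trusted; a fourth value rolls a second window, which stays untrusted until it fills up. Other policies, such as requiring a minimum number of samples, would only need a different isTrusted override.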

Optimized Procedure

When a snapshot of the cluster model is taken, the load of each resource is derived from the time-window samples of the relevant raw metrics, together with a flag indicating whether the load value can be trusted. A broker (or partition) with an untrusted resource load is excluded from the optimization of the corresponding goals.
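
The exclusion step could look roughly like the sketch below. All names here (Resource, Load, SnapshotFilter, eligibleBrokers) are hypothetical and chosen only for illustration; the sketch also assumes that a broker with no recorded load for a resource is treated the same as one with an untrusted load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical resource kinds that a balance goal may optimize. */
enum Resource { BYTES_IN, BYTES_OUT }

/** A load value plus a flag saying whether the value can be trusted. */
final class Load {
    final double value;
    final boolean trusted;

    Load(double value, boolean trusted) {
        this.value = value;
        this.trusted = trusted;
    }
}

final class SnapshotFilter {
    private SnapshotFilter() {
    }

    /**
     * Returns the IDs of brokers whose load for the given resource is trusted,
     * in the iteration order of the snapshot map. Brokers with a missing or
     * untrusted load are left out of the goal's optimization.
     */
    static List<Integer> eligibleBrokers(Map<Integer, Map<Resource, Load>> snapshot,
                                         Resource resource) {
        List<Integer> eligible = new ArrayList<>();
        for (Map.Entry<Integer, Map<Resource, Load>> entry : snapshot.entrySet()) {
            Load load = entry.getValue().get(resource);
            if (load != null && load.trusted) {
                eligible.add(entry.getKey());
            }
        }
        return eligible;
    }
}
```

Because trust is tracked per resource, a broker excluded from the bytes-in goal can still take part in the bytes-out goal if its bytes-out load is trusted.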