Make bucky route traffic between the nodes in a distributed manner

Question

grzkv opened this issue 6 years ago · 11 comments

Cluster nodes are routinely rebalanced or repopulated, e.g. after boxes are added, removed, replaced, or die. During these operations, bucky moves metrics between nodes. Currently, all of that traffic goes through a single master node, which makes these operations slow.

Bucky could be optimized to send data directly between the nodes when moving metrics, instead of bridging through a single master node.

E.g. let's say we have a cluster with nodes A, B, and C, where A serves as master, and we add node D. Currently, the data movement looks like this:
A -> D
B -> A -> D
C -> A -> D
and we could make it look like this:
A -> D
B -> D
C -> D

This would reduce the network and disk load on A, making the operation faster.
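
To make the difference concrete, here is a minimal Go sketch that counts network transfers for the A/B/C/D example under both routing schemes. The `Transfer` type and the helper functions are hypothetical names made up for illustration; they are not part of buckytools.

```go
package main

// Transfer describes one metric copy from Src to Dst.
// Illustrative type only, not a buckytools API.
type Transfer struct {
	Src, Dst string
}

// relayedPath returns the nodes the data visits when every copy is bridged
// through the coordinating node, unless that node is already an endpoint.
func relayedPath(t Transfer, coordinator string) []string {
	if t.Src == coordinator || t.Dst == coordinator {
		return []string{t.Src, t.Dst}
	}
	return []string{t.Src, coordinator, t.Dst}
}

// directPath returns the nodes visited when the source sends straight
// to the destination.
func directPath(t Transfer) []string {
	return []string{t.Src, t.Dst}
}

// hops counts the node-to-node transfers needed for all copies,
// given a function that expands each copy into the path it takes.
func hops(ts []Transfer, path func(Transfer) []string) int {
	n := 0
	for _, t := range ts {
		n += len(path(t)) - 1
	}
	return n
}
```

For the three copies A -> D, B -> D, and C -> D with A as master, `hops` with `relayedPath` gives 5 transfers, while `hops` with `directPath` gives 3. The two extra transfers are the ones that land on A before being forwarded.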

Answer 1 · 2018-09-28T17:47:59.000Z

What does "master" node mean in your case? I usually run the buckyd server and client on every node.

Answer 2 · 2018-10-01T09:20:56.000Z

The balancing is orchestrated by a single bucky client on one machine. Even if a bucky client is running on every machine, only one of them performs the orchestration. That machine is the master in my description.

Answer 3 · 2018-10-01T14:04:19.000Z

If you are adding node D in the scenario above, then run the bucky client on node D and it will work exactly as you wish, no?

Answer 4 · 2018-10-01T14:24:58.000Z

@deniszh Yes, you are right. This will cover the above scenario.

But it will not cover scenarios that have several target nodes, e.g. restoring data. Because in that case the data is moved to several different nodes, and we can only optimize for the one on which we run the bucky client.

Answer 5 · 2018-10-01T15:47:22.000Z

> restoring data. Because in that case the data is moved to several different nodes

Sorry, @grzkv, I'm still not getting how that can be the case. Bucky does not support RF>1, so it's always moving a metric from one node to another, right? And the bucky client on the destination node will do the orchestration, so it will always be moving data between 2 nodes. So I don't see how
B -> A -> D
C -> A -> D
can be true.
I'm not against optimizing the bucky workload, I just don't really understand the use case...

Answer 6 · 2018-10-01T16:04:16.000Z

Any scenario that has several different destination nodes during rebalancing will benefit. Also, sometimes when rebalancing we don't know in advance which nodes will be destinations, so we can't pre-select them to run the bucky client.

The operation I am talking about is here https://github.com/go-graphite/buckytools/blob/master/cmd/bucky/rebalance.go

I guess there are more scenarios where this optimization will help. Again, it's any scenario with several destination nodes, or one where the destination node is unknown in advance.
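
As a rough illustration of why picking the "right" machine for the bucky client stops helping once there are several destinations, the Go sketch below counts how many moves still have to be bridged for a given coordinator. `Move`, `relayedMoves`, and `bestCoordinator` are hypothetical names for this example and do not come from rebalance.go.

```go
package main

// Move is one planned metric relocation.
// Illustrative type only, not a buckytools API.
type Move struct {
	Src, Dst string
}

// relayedMoves counts the moves that must be bridged through the coordinator,
// i.e. those where the coordinator is neither the source nor the destination.
func relayedMoves(moves []Move, coordinator string) int {
	n := 0
	for _, m := range moves {
		if m.Src != coordinator && m.Dst != coordinator {
			n++
		}
	}
	return n
}

// bestCoordinator returns the candidate that minimizes relayed moves,
// together with that minimum. It returns ("", -1) for an empty candidate list.
func bestCoordinator(moves []Move, candidates []string) (string, int) {
	best, bestN := "", -1
	for _, c := range candidates {
		if n := relayedMoves(moves, c); bestN < 0 || n < bestN {
			best, bestN = c, n
		}
	}
	return best, bestN
}
```

With a single destination (A -> D, B -> D, C -> D), running the coordinator on D leaves 0 relayed moves. With two destinations (A -> C, A -> D, B -> C, B -> D), every choice among A, B, C, and D still leaves 2 moves that pass through the coordinator.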

Answer 7 · 2018-10-01T16:10:34.000Z

Ah, I finally got what you mean; the terminology was just slightly confusing. In my terminology "several destination nodes" is just not possible, but it will use the bucky client as a transport node when moving data.
OK, makes sense.

Answer 8 · 2018-10-01T19:56:03.000Z

Thanks for the discussion. I highly appreciate the feedback. Sorry for the unclear terminology.

Answer 9 · 2018-10-01T21:09:07.000Z

Nah, it was my slightly rusty mind, especially at the end of the day.
But yes - more advanced routing is definitely an area for improvement. We can make client coordinator only and make server talk to another server directly.
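
A minimal in-memory Go sketch of that idea follows: the client only issues instructions, and each source server pushes data straight to its peer. Everything here (`Server`, `Push`, `Instruction`, `Coordinate`, `NewCluster`) is a hypothetical stand-in, not the buckyd API, and the network is simulated with plain map writes.

```go
package main

import "fmt"

// Server is an in-memory stand-in for a storage daemon; it is not the buckyd API.
type Server struct {
	Name    string
	Metrics map[string][]byte
	Peers   map[string]*Server // servers reachable directly, keyed by name
	Sent    int                // bytes this server has pushed to peers
}

// Push copies a metric straight to a peer and removes the local copy.
// Pushing to itself is a no-op.
func (s *Server) Push(metric, peer string) error {
	if peer == s.Name {
		return nil
	}
	data, ok := s.Metrics[metric]
	if !ok {
		return fmt.Errorf("%s: no metric %q", s.Name, metric)
	}
	dst, ok := s.Peers[peer]
	if !ok {
		return fmt.Errorf("%s: unknown peer %q", s.Name, peer)
	}
	dst.Metrics[metric] = data
	delete(s.Metrics, metric)
	s.Sent += len(data)
	return nil
}

// Instruction tells one server to push one metric to another.
type Instruction struct {
	Src, Metric, Dst string
}

// Coordinate plays the client role: it only issues instructions and
// never handles metric data itself.
func Coordinate(cluster map[string]*Server, plan []Instruction) error {
	for _, in := range plan {
		src, ok := cluster[in.Src]
		if !ok {
			return fmt.Errorf("unknown source %q", in.Src)
		}
		if err := src.Push(in.Metric, in.Dst); err != nil {
			return err
		}
	}
	return nil
}

// NewCluster builds empty servers that all know each other as peers.
func NewCluster(names ...string) map[string]*Server {
	cluster := make(map[string]*Server, len(names))
	for _, n := range names {
		cluster[n] = &Server{Name: n, Metrics: map[string][]byte{}}
	}
	for _, s := range cluster {
		s.Peers = cluster
	}
	return cluster
}
```

In this sketch, a plan that moves one metric from B to D and another from C to D leaves the `Sent` counter of every other node, including whichever machine runs the coordinator, at zero, since no metric data passes through it.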

Answer 10 · 2018-10-01T21:38:59.000Z

> We can make client coordinator only and make server talk to another server directly.

This is exactly my thinking.

Answer 11 · 2023-02-23T13:32:15.000Z

Fixed in #26