BedrockStreaming/Statsd

[Question] What is multi-server purpose?


In Client::getServerKey() the server to send the stat to is "computed" using the CRC32 of the $stat string.
I don't understand why the data is distributed between servers. Shouldn't it be sent to every server?

What was the goal of implementing the "multi-server" feature?

Sorry for the trouble of asking but I cannot find the reason.

From How we use StatsD - TechM6Web:

StatsD is open sourced by Etsy. In our configuration, we use several StatsD daemons and aggregate metrics on Graphite - one point per minute. Many servers allow us to scale, because we don’t sample the data at all.

On client side, we use a simple consistent hashing algorithm to dispatch metrics over StatsD nodes on the same server.
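
To make the dispatching concrete, here is a minimal Python sketch of the idea. It is not the library's PHP code: the function name `pick_server`, the server addresses, and the use of a plain modulo over the CRC32 are assumptions made for illustration, based only on the description of `Client::getServerKey()` above.

```python
import zlib


def pick_server(stat: str, servers: list[str]) -> str:
    """Pick one server for a metric name from the CRC32 of that name.

    Hypothetical helper: the real client may map the hash to a server
    differently. A plain modulo is assumed here.
    """
    if not servers:
        raise ValueError("at least one server is required")
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    index = zlib.crc32(stat.encode("utf-8")) % len(servers)
    return servers[index]


# Hypothetical server addresses and metric names.
servers = ["statsd1:8125", "statsd2:8125", "statsd3:8125"]
for name in ["website.page.home", "website.page.search", "api.login"]:
    # The same name always maps to the same server while the list is unchanged.
    print(name, "->", pick_server(name, servers))
```

Because the server is derived only from the metric name, every increment of a given metric lands on the same daemon, which is what lets that daemon aggregate it correctly. Note that with a plain modulo, changing the number of servers remaps most metric names.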

I think I have my answer... metrics are distributed for scaling purposes. I thought they were sent to multiple servers for multiple "recipients".

You're right, but maybe we need to add more information about that in the README? cc @omansour

I was wondering, would you accept a PR that adds a flag to make the client work in "multicast mode" (as opposed to "dispatching mode")?

The point is to implement consistent hashing. For example, metrics A and B are always sent to server1, metric C is always sent to server2, and so on. If we lose one server, we lose metrics. But if we add servers, we shard the data even more. Clearly this can be better explained in the README.

Behind each StatsD server, the data is aggregated in Graphite. I don't see the point of a multicast mode; the system would not scale at all with it.
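
The scaling argument can be illustrated with a small Python sketch that counts how many packets each server would receive under each mode. The function names `dispatch_load` and `multicast_load`, the metric names, and the modulo-over-CRC32 mapping are all assumptions for illustration, not part of the library.

```python
import zlib
from collections import Counter


def dispatch_load(stats, servers):
    """Packets per server when each stat goes to one hash-selected server."""
    load = Counter({server: 0 for server in servers})
    for stat in stats:
        index = zlib.crc32(stat.encode("utf-8")) % len(servers)
        load[servers[index]] += 1
    return load


def multicast_load(stats, servers):
    """Packets per server when every stat is sent to every server."""
    return Counter({server: len(stats) for server in servers})


# Hypothetical workload: 1000 increments spread over 50 metric names.
stats = [f"app.metric.{i % 50}" for i in range(1000)]
servers = ["statsd1", "statsd2", "statsd3", "statsd4"]

print("dispatch :", dict(dispatch_load(stats, servers)))
print("multicast:", dict(multicast_load(stats, servers)))
```

In dispatching mode the per-server counts add up to the total number of packets, so adding servers spreads the work. In multicast mode every server receives the full stream, so adding servers does not reduce the load on any of them.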

OK, I get it.
You have multiple servers in order to scale while always keeping a given metric on the same server (consistency).
I have multiple servers in order to report to different types of users.