jamiealquiza/polymur

Add REPLICATION_FACTOR support

erez-rabih opened this issue · 10 comments

Hi,

I saw there are two modes: consistent-hashing and broadcast. How can I set the replication factor of a metric so that a single metric consistently arrives at two Graphite backends?

Replication factor doesn't exist, but I could (and probably should) add it. Let me review this.

Thanks for the fast reply.
Also, I have been thinking about using polymur in production. How stable has it been in your experience?

It has routed almost all of FireEye's production metrics traffic for over a year, and I've also heard from some fairly well-known companies that have begun using it (although I didn't find out at what scale). From a stability standpoint, it's production-worthy and has no known or open stability-related bugs. The open items are mostly just features.

Nice.
I would definitely switch my carbon-relays to polymur once replication factor is implemented.
Looks like a great project.

Renamed the issue and will use it for tracking. Notes for development:

With replication, get_nodes is called repeatedly during key lookup until a set of REPLICATION_FACTOR distinct (server, instance) tuples has been gathered. These are the routing targets.
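A minimal sketch of that lookup, assuming a consistent-hash ring with virtual nodes hashed by CRC32. The names `node`, `ring`, `newRing` and `getNodes`, the vnode count, and the choice of hash function are all hypothetical and are not taken from polymur's source. Rather than literally re-calling a lookup function, it walks clockwise from the key's ring position and collects distinct (server, instance) targets until it has REPLICATION_FACTOR of them:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// node is a hypothetical (server, instance) routing target.
type node struct {
	server   string
	instance string
}

// ring is a hypothetical consistent-hash ring with virtual nodes.
type ring struct {
	points []uint32        // sorted hash positions on the ring
	owners map[uint32]node // hash position -> owning node
	nodes  int             // number of real nodes in the pool
}

func newRing(nodes []node, vnodes int) *ring {
	r := &ring{owners: make(map[uint32]node), nodes: len(nodes)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			h := crc32.ChecksumIEEE([]byte(n.server + ":" + n.instance + "-" + strconv.Itoa(i)))
			r.points = append(r.points, h)
			r.owners[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// getNodes returns up to rf distinct targets for key, walking clockwise.
func (r *ring) getNodes(key string, rf int) []node {
	if rf > r.nodes {
		rf = r.nodes // can't have more distinct targets than real nodes
	}
	if rf <= 0 || len(r.points) == 0 {
		return nil
	}
	h := crc32.ChecksumIEEE([]byte(key))
	// First ring position at or after the key's hash; the modulo wraps past the end.
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	seen := make(map[node]bool, rf)
	out := make([]node, 0, rf)
	for i := 0; i < len(r.points) && len(out) < rf; i++ {
		n := r.owners[r.points[(start+i)%len(r.points)]]
		if !seen[n] { // skip further vnodes of a node we already picked
			seen[n] = true
			out = append(out, n)
		}
	}
	return out
}

func main() {
	r := newRing([]node{{"10.0.5.20", "a"}, {"10.0.5.21", "a"}, {"10.0.5.22", "a"}}, 50)
	for _, n := range r.getNodes("servers.web01.cpu.user", 2) {
		fmt.Printf("%s:%s\n", n.server, n.instance)
	}
}
```

Because the walk always starts at the same position, in this sketch the first target for RF=2 is the same one RF=1 would pick, so raising the replication factor adds copies without changing where the primary copy lands.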

Initial idea would be to specify the replication factor in the destination string (e.g. polymur -destinations="10.0.5.20:2003 for a REPLICATION_FACTOR equivalent of 2), but instead it should be a -replication-factor config.* If unspecified, it should default to 1 for backward compatibility with existing configurations.

*Replication factor has to be applied to the whole pool, so per-destination settings don't make sense.
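Treating it as one pool-wide flag could look roughly like the following sketch, which uses Go's standard `flag` package. The `config` struct and `parseFlags` helper are hypothetical, and the space-delimited `-destinations` format is an assumption. The default of 1 is what keeps existing configurations behaving as before:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the (hypothetical) options relevant to replication.
type config struct {
	destinations      string
	replicationFactor int
}

func parseFlags(args []string) (*config, error) {
	fs := flag.NewFlagSet("polymur", flag.ContinueOnError)
	cfg := &config{}
	// Assumed format: space-delimited host:port list.
	fs.StringVar(&cfg.destinations, "destinations", "", "space-delimited list of host:port destinations")
	// One pool-wide value; default 1 matches the current single-copy behavior.
	fs.IntVar(&cfg.replicationFactor, "replication-factor", 1, "number of destinations each metric is sent to")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if cfg.replicationFactor < 1 {
		return nil, fmt.Errorf("-replication-factor must be >= 1, got %d", cfg.replicationFactor)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("destinations=%q replication-factor=%d\n", cfg.destinations, cfg.replicationFactor)
}
```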

I think replication factor should be an independent flag, as it has no relation to a specific destination.
Also, I see no use case for different replication factors on different destinations, so there's no reason to attach an RF (replication factor) to a specific host:port.

Yeah, I just realized what I was doing and updated :)

Also, RF should only be taken into account when consistent hashing is used, since broadcast implicitly means RF = number of destinations.

Or, if we really want to be smart about this, broadcast is just the special case in which RF = number of destinations, but I don't know the project well enough to decide whether that's how you would like to implement it.

It should just be ignored in broadcast mode, since that's essentially what broadcast is (send a copy of every metric to every destination in the list). I'll probably just add a startup note letting users know that if broadcast is in use and a replication-factor is set, it's ignored and has no effect.
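The broadcast rule and the startup note could be sketched as below. The mode names, the `-mode` flag, and the `effectiveRF`/`flagWasSet` helpers are hypothetical. `FlagSet.Visit` is standard library and only visits flags that were explicitly set, which is one way to tell "left at the default of 1" apart from "set by the user":

```go
package main

import (
	"flag"
	"log"
)

// Hypothetical mode names; the issue calls them consistent-hashing and broadcast.
const (
	modeHash      = "consistent-hashing"
	modeBroadcast = "broadcast"
)

// effectiveRF returns how many destinations each metric is written to,
// plus an optional note to log at startup.
func effectiveRF(mode string, rf, numDest int, rfSet bool) (int, string) {
	if mode == modeBroadcast {
		// Broadcast already sends every metric to all destinations (RF == numDest).
		note := ""
		if rfSet {
			note = "broadcast mode: -replication-factor has no effect; every metric goes to all destinations"
		}
		return numDest, note
	}
	// Consistent hashing: each key is routed to rf distinct destinations.
	return rf, ""
}

// flagWasSet reports whether name was given explicitly on the command line.
func flagWasSet(fs *flag.FlagSet, name string) bool {
	set := false
	fs.Visit(func(f *flag.Flag) { // Visit skips flags left at their defaults
		if f.Name == name {
			set = true
		}
	})
	return set
}

func main() {
	fs := flag.NewFlagSet("polymur", flag.ExitOnError)
	mode := fs.String("mode", modeBroadcast, "distribution mode: "+modeHash+" or "+modeBroadcast)
	rf := fs.Int("replication-factor", 1, "pool-wide replication factor")
	if err := fs.Parse([]string{"-replication-factor", "2"}); err != nil {
		log.Fatal(err)
	}

	n, note := effectiveRF(*mode, *rf, 3, flagWasSet(fs, "replication-factor"))
	if note != "" {
		log.Println(note)
	}
	log.Printf("each metric is written to %d destination(s)", n)
}
```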