Polymur with HA
therealsamlin opened this issue · 1 comments
Hi,
I'd like to find out how to handle HA in polymur. I'm thinking of placing two polymur instances behind a load balancer and having all my metric-originating instances send metrics to the load balancer's DNS name. Also, how should we perform health checks in this case?
Another question: what would polymur do if one or more Graphite instances are unavailable, or if all of the Graphite instances are unavailable?
Thanks in advance for your help!
Cheers,
Sam
Hey Sam,
If you haven't seen it yet, this blog post describes the use of a load balancer. Generally speaking, polymur and polymur-gateway both work well behind load balancers. For regular polymur, TCP (layer 4) mode is best; for polymur-gateway, HTTP(S) (layer 7) is best. As for health checks, there isn't a standalone port for them, but I do have an open issue for it. Until then, the load balancer just checks that the instance is up. Personally, this has worked quite well for me.
Regarding what happens if polymur destinations become unavailable: polymur has a dedicated queue for each destination. Its size is set with the -outgoing-queue-cap flag (the default is quite low; if you have lots of memory, don't hesitate to set it to tens of thousands). If a destination becomes unavailable, metrics bound for it are queued up while a connection retry loop continues in the background. If the destination becomes available again, the queued metrics will be sent. If you instead remove the destination via the API, the queued metrics will be sent to the other healthy destinations. If you do neither and both the outbound and retry queues overflow, new data bound for that destination will be dropped.
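The queue-then-drop behavior described above can be illustrated with a small sketch. This is not polymur's actual code: the `destQueue` type and its methods are hypothetical, and it models a single bounded queue per destination rather than the separate outbound and retry queues. It is written for a single goroutine, so the drop counter is not synchronized.

```go
package main

import "fmt"

// destQueue is a hypothetical bounded buffer of metric lines for one
// destination. capacity plays the role of a queue cap setting.
type destQueue struct {
	name    string
	ch      chan string
	dropped int
}

func newDestQueue(name string, capacity int) *destQueue {
	return &destQueue{name: name, ch: make(chan string, capacity)}
}

// enqueue buffers a metric without blocking. If the queue is full
// (the destination has been down too long), the metric is dropped
// and false is returned.
func (q *destQueue) enqueue(metric string) bool {
	select {
	case q.ch <- metric:
		return true
	default:
		q.dropped++
		return false
	}
}

// flush drains everything currently queued through send, in order.
// It stands in for what happens once the connection is re-established.
func (q *destQueue) flush(send func(string)) int {
	n := 0
	for {
		select {
		case m := <-q.ch:
			send(m)
			n++
		default:
			return n
		}
	}
}

func main() {
	q := newDestQueue("graphite-a", 3)
	// Destination is "down": five metrics arrive, only three fit.
	for i := 0; i < 5; i++ {
		q.enqueue(fmt.Sprintf("app.requests %d 1500000000", i))
	}
	fmt.Println("queued:", len(q.ch), "dropped:", q.dropped)
	// Destination is "back": the queued metrics are delivered.
	sent := q.flush(func(m string) { fmt.Println("send:", m) })
	fmt.Println("flushed:", sent)
}
```

The non-blocking `select` with a `default` case is what keeps one dead destination from stalling the others: a full queue costs dropped points for that destination only. A larger cap buys a longer outage window at the price of memory.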
Let me know if that clarifies things!