yaim - yet another ip manager

yaim checks the health of a component (e.g. via an HTTP GET, a PostgreSQL query, or a shell command). If the node is healthy, this information is published to a DCS (distributed consensus store), and all yaim nodes registered in the same service directory race to take over virtual IP addresses. Every IP address specified in the DCS is always assigned to one of the nodes, so this concept can easily be combined with round-robin DNS load balancing. If the health check fails or a node goes down, the remaining nodes will shortly afterwards take over the orphaned IP addresses.
yaim can be used as a replacement for keepalived or other virtual IP management solutions. It provides high availability and redundancy at the network level and therefore requires no special client or server modifications.
A running instance of this program will be referred to as a node.
In the DCS, the IP addresses must be added as keys in the service/ips/ path. Each IP address is represented by a directory in the DCS, with the IP as the directory name.
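For illustration, with a namespace of service, a cluster name of yaim, and two managed addresses, the resulting key layout could look roughly like this (the addresses and the location of the per-node health keys are placeholders, not prescribed by yaim):

```
service/yaim/
├── nodes/                 # hypothetical location of the health keys,
│   └── db1                # named after the OS hostname, TTL-guarded
└── ips/
    ├── 192.0.2.10/
    │   └── marked         # TTL-guarded key set by the owning node
    └── 192.0.2.11/        # unmarked: up for grabs
```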
The design of yaim is inspired by Patroni and vip-manager, so if you know either of those, yaim should seem familiar.
The program runs in a loop:
```
for {
    sleep(interval)

    if node is healthy {
        // Create a key in the DCS that advertises this node as healthy:
        //   - the key is named after the OS hostname
        //   - the key has a TTL for expiry
        // If the key already exists, only refresh its TTL.

        // Check for IP addresses in the DCS and look up which of them
        // are marked and which are unmarked.

        // Considering the number of healthy nodes and the number of IP
        // addresses, to achieve a roughly equal distribution among all
        // nodes:
        //   - Do we need to register more IP addresses on our node's
        //     interface? If so, "mark" the address in the DCS with a
        //     key named "marked" in the service/ips/[address]/
        //     directory, again with a TTL for expiry.
        //   - Or do we have to drop some addresses? Then remove the IP
        //     address from the interface. Key expiry will remove the
        //     "mark", so the address can be taken by another node.

        // Refresh the TTL of all "marked" IP addresses that belong to
        // this node.
    }
}
```
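For readers who prefer real code, here is a minimal, self-contained Go sketch of that loop. All helper functions are hypothetical stubs standing in for the DCS and network-interface operations; they are not yaim's actual API:

```go
package main

import "time"

// Illustrative stubs only: yaim's real implementation talks to the DCS
// (e.g. etcd) and manipulates network interfaces.
func isHealthy() bool                               { return true }
func advertiseNode(ttl time.Duration)               {} // create/refresh hostname key
func healthyNodeCount() int                         { return 1 }
func listAddresses() (mine, others, free []string)  { return nil, nil, nil }
func markAddress(ip string, ttl time.Duration) bool { return true }
func addToInterface(ip string)                      {}
func dropFromInterface(ip string)                   {}
func refreshMarks(mine []string, ttl time.Duration) {}

func main() {
	interval := 1000 * time.Millisecond
	ttl := 5 * time.Second

	for {
		time.Sleep(interval)
		if !isHealthy() {
			// Do nothing: our TTL-guarded keys expire and the
			// other nodes take over our addresses.
			continue
		}
		advertiseNode(ttl)

		mine, others, free := listAddresses()
		total := len(mine) + len(others) + len(free)

		// Aim for a roughly equal share of all managed addresses.
		share := (total + healthyNodeCount() - 1) / healthyNodeCount()

		// Too few addresses: race the other nodes for unmarked ones.
		for _, ip := range free {
			if len(mine) >= share {
				break
			}
			if markAddress(ip, ttl) { // create "marked" key with TTL
				addToInterface(ip)
				mine = append(mine, ip)
			}
		}

		// Too many addresses: drop the surplus; the expiring
		// "marked" keys free them for other nodes.
		for len(mine) > share {
			ip := mine[len(mine)-1]
			dropFromInterface(ip)
			mine = mine[:len(mine)-1]
		}

		refreshMarks(mine, ttl)
	}
}
```

Because every healthy node runs the same loop, the TTL-guarded "marked" keys are what arbitrate the race: whichever node creates the key first owns the address until it stops refreshing the TTL.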
All configuration takes place in the yaim.yml file. By default, yaim looks for yaim.yml in the directory it is launched from; a custom config location can be provided using the config flag.
This is the namespace in the DCS in which all yaim clusters operate. It should be different from the namespaces used by other applications to avoid conflicts.
This is the directory in which this specific yaim cluster operates. It is placed inside the dcs-namespace directory.
This is the main loop interval. After doing everything that is described in the design section, yaim will sleep for this many milliseconds.
The TTL that will be set for various keys. If a key expires, a failover occurs.
The number of times yaim will try to get values from the etcd key-value store or try to ping the PgBouncer or PostgreSQL database.
Time to wait before trying to reach etcd or the database again.
The type of DCS used; currently only etcd is supported.
A list of endpoints that can be used to access the same DCS cluster. The client will randomly try any of these endpoints.
Credentials for a user that may read and write within the dcs-namespace/dcs-clustername directory defined above.
What kind of checker to use to evaluate health. Currently only http is supported; checkers for (PostgreSQL) databases and for shell commands are being considered.
The URL to send the health check (GET request) to.
The HTTP status code that implies health, e.g. 200.
e.g. '"value":"foo"'
To add an address, simply create a directory in the KV store with the IP address as its name, for example with etcd:
```bash
curl -s http://192.168.0.34:2379/v2/keys/service/yaim/ips/123.0.0.1 -XPUT -d dir=true
```
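To verify that the directory exists, you can list all registered addresses with the plain etcd v2 keys API (nothing yaim-specific):

```bash
curl -s http://192.168.0.34:2379/v2/keys/service/yaim/ips/
```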
yaim will then register that a new IP is available and try to mark it.
Removing addresses is just as easy: simply remove the directory from etcd:
```bash
curl -s http://192.168.0.34:2379/v2/keys/service/yaim/ips/123.0.0.1?recursive=true -XDELETE
```