keepalived is based on the VRRP protocol which is used for many redundant and high availability scenarios.
https://www.keepalived.org/
https://github.com/acassen/keepalived
This guide will setup a 2 node cluster with a floating IP (also known as a Virtual IP or VIP) and will watchdog a service via REST and perform an action up on failover via REST as well.
You can setup a 3 node cluster with 1 MASTER and 2 BACKUPs following the same guide. Just add another BACKUP and update the
unicast_peers
. The same goes for a 4+ node cluster.
This guide assumes there is a web server at http://localhost:80 that will be health checked via HTTP GET. Run something like nginx to test this.
keepalived works on many linux versions. In this guide we will be installing on Rocky Linux 9.
- keepalived man page
- DigitalOcean keepalived guide
- Kubernetes HA with keepalived
- tobru check and notify scripts
- RedHat keepalived docs
Install keepalived on all nodes.
yum install keepalived
The scripts for this setup use curl
to check a REST API. If you don't have curl
install it.
yum install curl
There's a keepalived syntax file for vim that might be helpful in development environments. https://github.com/glidenote/keepalived-syntax.vim
- Drop the
keepalived.vim
file into
/usr/share/vim/vim82/syntax/keepalived.vim
- Add the following line to ~/.vimrc:
au BufRead,BufNewFile keepalived.conf setlocal ft=keepalived
- It might also be useful to set your tabs to 4 spaces by adding
set tabstop=4
to
~/.vimrc
Move the default keepalived.conf
mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.default
Add a "keepalived_user" which will execute health checks on monitored services (via vrrp_script). See config for why this is done.
groupadd -r keepalived_script
useradd -r -s /sbin/nologin -g keepalived_script -M keepalived_script
Determine what your servers IPs are and what the floating IPs are and which one is master/primary. For this setup:
- Server A: 192.168.1.46 (master/primary)
- Server B: 192.168.1.47 (backup)
- Floating IP: 192.168.1.50
For this test both servers are using eth0
.
In production you may have a dedicated network for HA with it's own interfaces. This is recommended but not required.
Add the following files to /etc/keepalived/
for all servers.
Make the two scripts executable by running:
chmod +x /etc/keepalived/check_api.sh
chmod +x /etc/keepalived/handle_state_change.sh
chown keepalived_script:keepalived_script /etc/keepalived/check_api.sh
chown keepalived_script:keepalived_script /etc/keepalived/handle_state_change.sh
(Optional) If you want handle_state_change.sh to write to it's log, make that file and allow keepalived_script to write to it. Otherwise, comment out that capability in the script.
touch /var/log/keepalived_state.log
chown keepalived_script:keepalived_script /var/log/keepalived_state.log
Test that the configs are working with:
keepalived --config-test
Keepalived logs to syslog/messages. Tail it before starting.
tail -f /var/log/messages
Enable and Start keepalived by running:
systemctl enable keepalived
systemctl start keepalived
You can test a failover by stopping the webserver on the master to force a failover. You'll see the floating IP swap over and if your webservers are hosting unique pages (e.g. with the server name) you'll see that change too.
If you start the keepalived service on the master again, the floating IP will move back to master after a few seconds. You can disable return to master with nopreempt
in the configs.
Take a look at your interface to see when the floating IP is present, indicating master.
ip a
Systemctl can also tell you the status of keepalived.
systemctl status keepalived
keepalived can also be triggered to write status and statistics to files by sending signals to the keepalived binary.
If you send a USR1
signal to keepalived it will write status data to /tmp/keepalived.data. The safest way to do this is to ask keepalived for the correct signal number, as some kernels change the number.
To get the statistics data run the following:
kill -s $(keepalived --signum=DATA) $(cat /run/keepalived.pid); cat /tmp/keepalived.data
This returns a ton of data, which you can filter/grep across. To get the current state of a server/node run:
kill -s $(keepalived --signum=DATA) $(cat /run/keepalived.pid); cat /tmp/keepalived.data | grep State
Which returns:
State = BACKUP
Sending USR2
signal to keepalived writes statistics to /tmp/keepalived.stats. Again ask keepalived for the signal number.
kill -s $(keepalived --signum=STATS) $(cat /run/keepalived.pid); cat /tmp/k
eepalived.stats
Which returns:
VRRP Instance: VI_1
Advertisements:
Received: 3
Sent: 41291
Became master: 2
Released master: 1
Packet Errors:
Length: 0
TTL: 0
Invalid Type: 0
Advertisement Interval: 0
Address List: 0
Authentication Errors:
Invalid Type: 0
Type Mismatch: 0
Failure: 0
Priority Zero:
Received: 0
Sent: 1
You can reset these stats with:
kill -s $(keepalived --signum=STATS_CLEAR) $(cat /run/keepalived.pid)
Keepalived supports monitoring via SNMP GETs or SNMP Traps as well as notifications to be sent over email/SMTP. You can also hook into the notify
script to do advanced notifications via bash.
Take a look at the global_defs
section of the keepalived man page for more details on how to set this up.
https://www.keepalived.org/manpage.html