last-restore-age check only returns 0
asharpe opened this issue · 6 comments
A cursory look at the check implementation (https://github.com/omniti-labs/omnipitr/blob/master/lib/OmniPITR/Program/Monitor/Check/Last_Restore_Age.pm) makes me think that omnipitr-restore should be writing state somewhere. However, it doesn't accept the --state-dir argument (a separate issue, because the docs say it does).
I'm using streaming replication, and omnipitr-restore doesn't appear to be writing to the log either, though I'm confident it's running since the postgres log contains the usage message if I try to use --state-dir.
Perhaps I'm just doing something wrong?
My restore command:
restore_command = '/opt/omnipitr/bin/omnipitr-restore --config-file /etc/omnipitr/restore.cfg %f %p'
Contents of /etc/omnipitr/restore.cfg:
--log /var/log/omnipitr/omnipitr-^Y^m^d.log
--source /opt/rh/postgresql92/root/var/lib/pgsql/wal_archive
--streaming-replication
My check command:
sudo -u postgres /opt/omnipitr/bin/omnipitr-monitor -l /var/log/omnipitr/omnipitr-^Y^m^d.log --state-dir /var/run/omnipitr -c last-restore-age
If you are using streaming replication, then omnipitr-restore is not called at all, unless immediately after restart of postgresql. Since the last-restore-age check depends on omnipitr-restore having run, it will not work in this setup.
If you're looking for some other methods of monitoring slave lag with streaming replication, please see http://www.keithf4.com/monitoring_streaming_slave_lag/
"then omnipitr-restore is not called at all, unless immediately after restart of postgresql" - I thought it may also be called if SR falls back farther than available WAL on the primary (ie, the configuration is both SR and WAL shipping).
I was hoping to use the last-restore-age check as a general Nagios check to determine the status of the replica.
Thanks for the software :)
It would get called in that case. But if it never falls back to WAL replay (which is the common case, and I hope it is so for you), then it's never called. I guess you could create some sort of check that would watch for that, but I think you're better off creating the checks described in the blog post I linked than trying to use last-restore-age in a way it's not really made for.
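As a rough illustration of the kind of check meant here (a minimal sketch, not the code from the linked blog post): the script below assumes it runs on the standby as a user who can connect with psql, and it uses PostgreSQL's pg_last_xact_replay_timestamp() to estimate replay lag. The function names, the WARN/CRIT thresholds, and the output wording are made up for the example. Keep in mind that on an idle primary this number grows even when the standby is fully caught up, so treat it as a starting point rather than a finished check.

```shell
#!/bin/sh
# Hypothetical Nagios-style replay-lag check for a streaming-replication standby.
# Thresholds are in seconds; the defaults are illustrative assumptions.
WARN=${WARN:-300}
CRIT=${CRIT:-900}

# Map a lag value (whole seconds) to a Nagios status line and return code.
classify_lag() {
    lag=$1
    if [ "$lag" -ge "$CRIT" ]; then
        echo "CRITICAL - replay lag ${lag}s"
        return 2
    elif [ "$lag" -ge "$WARN" ]; then
        echo "WARNING - replay lag ${lag}s"
        return 1
    fi
    echo "OK - replay lag ${lag}s"
    return 0
}

# Ask the standby how many seconds ago the last replayed transaction committed.
# Prints an empty string when the function returns NULL (for example on a primary).
query_lag() {
    psql -X -At -c "SELECT floor(extract(epoch FROM now() - pg_last_xact_replay_timestamp()))::int"
}

# Combine the two: query the standby, then classify the result.
check_standby() {
    lag=$(query_lag) || { echo "UNKNOWN - could not query standby"; return 3; }
    if [ -z "$lag" ]; then
        echo "UNKNOWN - no replay timestamp (not a standby, or nothing replayed yet)"
        return 3
    fi
    classify_lag "$lag"
}
```

Calling check_standby at the end of the script and exiting with its return code gives Nagios the usual 0/1/2/3 status.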