ISISComputingGroup/IBEX

Instrument PCs: Nagios Check on log growth/size


As a developer, I would like a direct Nagios check on log size or growth, rather than relying on the existing disk-usage check as a proxy.

Acceptance Criteria

  • A suitable criterion to check is identified (e.g. the log directory exceeds x in size, or the log directory has grown by more than x in 12 hours)
  • A Nagios check is implemented to monitor this criterion
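A size-based criterion of the kind listed above could look roughly like the sketch below. It only assumes the standard Nagios plugin conventions (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one line of output with optional performdata after a `|`). The function names, the `LOG SIZE` label and the MB thresholds are hypothetical and are not taken from the check that was actually deployed.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def dir_size_bytes(path):
    """Total size in bytes of the files under `path` (recursive, symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # A log may be rotated or deleted between listing and stat.
                continue
    return total


def classify(value, warn, crit):
    """Map a measured value onto a Nagios status; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_log_size(log_dir, warn_mb, crit_mb):
    """Return (exit_code, output_line) for a hypothetical size-based log check."""
    if not os.path.isdir(log_dir):
        return UNKNOWN, f"LOG SIZE UNKNOWN - {log_dir} is not a directory"
    size_mb = dir_size_bytes(log_dir) / (1024 * 1024)
    status = classify(size_mb, warn_mb, crit_mb)
    # Text before '|' is shown on the dashboard; text after it is perfdata.
    line = (f"LOG SIZE {LABELS[status]} - {size_mb:.1f} MB in {log_dir}"
            f" | size={size_mb:.1f}MB;{warn_mb};{crit_mb}")
    return status, line
```

A plugin script wrapping this would print `line` and call `sys.exit(status)`, since Nagios reads the status from the exit code rather than from the text.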

Extra Information

Large logs indicate that something is in an unhappy state. At present we most often notice this through the Nagios check on disk usage: while logs are only moved manually, disk usage acts as a proxy for log growth. #8360 will introduce an automated process for moving logs, which will make the disk-usage check a less sensitive proxy for log growth. We therefore need a direct check on either log size or its rate of growth.
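Once logs are moved automatically, a rate-of-growth check has to remember the previous measurement between runs, because Nagios invokes a plugin as a fresh process each time. The sketch below is one possible approach, assuming a small JSON state file (the file name is hypothetical). It also assumes that a drop in size means logs were moved away, and it counts that interval as zero growth; this under-reports growth during the interval in which a move happens.

```python
import json
import os
import time

# Hypothetical location for the previous sample; not part of any existing setup.
DEFAULT_STATE_FILE = "log_growth_state.json"


def growth_rate_mb_per_hour(prev_bytes, prev_time, now_bytes, now_time):
    """Growth rate between two size samples, in MB per hour.

    A drop in size (assumed to mean logs were moved away) counts as zero
    growth for that interval, so the result is never negative.
    """
    hours = (now_time - prev_time) / 3600.0
    if hours <= 0:
        raise ValueError("samples must be taken at increasing times")
    grown = max(0, now_bytes - prev_bytes)
    return grown / (1024 * 1024) / hours


def sample_and_rate(current_bytes, state_file=DEFAULT_STATE_FILE, now=None):
    """Store the current size and return the rate since the previous sample.

    Returns None on the first run, when no previous sample exists yet.
    """
    now = time.time() if now is None else now
    rate = None
    if os.path.exists(state_file):
        with open(state_file) as f:
            prev = json.load(f)
        rate = growth_rate_mb_per_hour(prev["bytes"], prev["time"], current_bytes, now)
    with open(state_file, "w") as f:
        json.dump({"bytes": current_bytes, "time": now}, f)
    return rate
```

The returned rate can then be compared against warning and critical thresholds in the usual Nagios way. A "grown by more than x in 12 hours" criterion corresponds to a rate threshold of x / 12 per hour.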


A separate task has been created to set appropriate warning and critical levels, but the check is now up and running on Nagios for all EPICS instruments.
There are no PRs because the Nagios configuration is maintained only in a local git repository on the Nagios server.

The "IOC Log File Rate" check appears on the Nagios dashboard for each instrument.