================================================================
NAGIOS AGGREGATE NOTIFICATION SYSTEM: README
================================================================

(c) 2001 - Nicholas Tang (ntang at nachtwache dot org)
Updated - Bob Patterson (bpatterson at i1ops dot net)

Under the GNU General Public License v2.0
http://www.gnu.org/copyleft/gpl.txt

----------------------------------------------------------------
INTRODUCTION

The Nagios Aggregate Notification System (NANS) is designed to
replace the 1-line shell script that currently handles notifications
in Nagios (http://www.nagios.org/). It is a drop-in replacement that
doesn't require any changes to the existing Nagios configuration
other than telling it to use NANS instead of the current system. It
is configurable on a per-contact basis and allows for different
levels of aggregation for epager vs. email notifications.

It was designed to cut back on the flood of notifications that can
occur at a larger Nagios installation. I have personally had as many
as 13 pages sent to me in a one-minute period when we had a problem
on the network. Unfortunately, not every service or host can easily
be assigned a parent, and even if it has one, Nagios will still
report problems with the children until it notices that the parent
is unavailable.

The latest version should always be available from my homepage at
http://www.nachtwache.org/projects/ (under nagios/ utilities/ nans/
most likely).

----------------------------------------------------------------
INSTALLATION

(Note: if upgrading from an old version, skip down to the next
section, cunningly labelled "UPGRADING".)

Installation is a simple matter. First off, put the two scripts
(aggregate-notify.pl and collate-notifications.pl) on your system - I
use Nagios's bin directory since it's mostly empty. They can go
anywhere, as long as the Nagios user has read/execute privs for them.

The next step is to create the configuration file(s).
You can have one per instance, or one shared one, or whatever. I use
one shared one, but some places might find it convenient to have
different settings for different situations.

SAMPLE CONFIG FILE

# This is a comment
foo@bar.com,1,0,1,0
billybob@bar.com,1,1,0,1

The example above will give user foo the count and summary reports,
and give user billybob the count, mini-summary and full-detail
reports.

The config file is simple. Lines with a # at the beginning are
comments (note, the "#" must be the FIRST character on the line). All
other lines are considered config lines. I will likely make it a
little more lenient in the next version, but for now, deal. ;)

Each line should have 5 comma-delimited fields (no spaces). The first
field is the contact email address, and fields 2-5 are simple on/off
flags (0=off, 1=on) that determine which types of reports should be
sent for that contact: a simple count, a mini summary, a more
detailed summary, or the full details. Note that the summary and full
details are mostly identical in terms of the information they
provide; the summary is simply more compact. The mini summary is much
more compact and strips out more information.

Next, you need to add entries in your crontab for the aggregate
mailer. There are four settings for it:

-t: the frequency, in "time units", to send the emails
-f: the full path to the notification log (generated by the other
    tool)
-c: the full path to the config file (discussed above)
-g: turns on grouping by host

I say "time units" rather than minutes because -t is actually the
number of times the script should run before it sends out the
notifications. If you run it every minute from cron, a unit is one
minute. Run it every 5 minutes, and a single unit will be 5 minutes.
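To make the "time units" idea concrete, here is a rough sketch in
Python. It is not taken from aggregate-notify.pl; the function name
and the countdown logic are assumptions made for illustration (a
counter that drops by one on every cron run and triggers a mailing
when it reaches zero).

```python
def runs_that_send(time_units, total_runs):
    """Return the 1-based cron runs on which mail would go out.

    Assumption: a counter starts at time_units, drops by one on every
    run, and is reset to time_units each time it reaches zero.
    """
    sends = []
    remaining = time_units
    for run in range(1, total_runs + 1):
        remaining -= 1
        if remaining == 0:
            sends.append(run)
            remaining = time_units
    return sends


def minutes_between_mailings(cron_interval_minutes, time_units):
    """Wall-clock minutes between mailings for a given cron interval."""
    return cron_interval_minutes * time_units


print(runs_that_send(2, 6))
print(minutes_between_mailings(5, 2))
```

With -t 2, mail goes out on every second run. Whether that means
every 2 minutes or every 10 depends only on how often cron starts
the script.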
Here's an example crontab:

* * * * * /usr/local/nagios/bin/aggregate-notify.pl -t 2 -c '/usr/local/nagios/etc/notification-page.cfg' -f '/usr/local/nagios/var/rw/page-log.txt'
* * * * * /usr/local/nagios/bin/aggregate-notify.pl -t 10 -c '/usr/local/nagios/etc/notification-email.cfg' -f '/usr/local/nagios/var/rw/email-log.txt'

If it's set up correctly, it should create a single-line file where
you specified -f, and that line should be a comment with the current
"ttm" value (time to mail ;) ) and the timer value you set with -t.

Finally, you need to set up the notification collator. To do this,
you need to configure Nagios to use it instead of the existing
script. Replace the existing notify-by-email, notify-by-epager,
host-notify-by-email, and host-notify-by-epager lines with the
following, modifying the paths to match your install, of course:

define command{
        command_name    notify-host-by-email
        command_line    /usr/local/nagios/collate-notifications.pl -t 'HOST' -e '$CONTACTEMAIL$' -d '$LONGDATETIME$' -n '$NOTIFICATIONTYPE$' -I '$HOSTADDRESS$' -H '$HOSTNAME$' -h '$HOSTSTATE$' -o '$HOSTOUTPUT$' -A '$HOSTNAME$' -a '$HOSTNAME$' -f '/usr/local/nagios/var/rw/email-log.txt'
        }

define command{
        command_name    notify-service-by-email
        command_line    /usr/local/nagios/collate-notifications.pl -t 'SERVICE' -e '$CONTACTEMAIL$' -d '$LONGDATETIME$' -n '$NOTIFICATIONTYPE$' -I '$HOSTADDRESS$' -H '$HOSTNAME$' -S '$SERVICEDESC$' -s '$SERVICESTATE$' -o '$SERVICEOUTPUT$' -A '$HOSTNAME$' -a '$SERVICEDESC$' -f '/usr/local/nagios/var/rw/email-log.txt'
        }

The only setting you need to adjust is -f, which is the path to the
same file you specified with -f for the aggregator.

Optional:

- In both scripts you can set the $loglevel variable. This variable
  determines how verbose the logging should be, from 0 (none) to 4
  (ridiculous). If you turn on logging, make sure to also set the
  $logfile variable. Of course, if $loglevel is set to 0, that isn't
  needed.
- aggregate-notify needs to have the path to sendmail set; if your
  sendmail (or whatever mailer) is in a different location, make sure
  you fix that.
- If you're having problems with the script, set $debug to 1, which
  will output all of the errors and such to STDOUT in addition to the
  log file. Makes it a bit easier to, well, debug.
- You can now easily define your own output format... well,
  semi-easily. No docs are being provided for this part, but if you
  look at how %fmt works you should see it. If not, well, ask. :)

----------------------------------------------------------------
UPGRADING

NANS 0.5 -> 0.6

Just as easy as 0.4 to 0.5. There are some optional variables you can
set, but otherwise it's just a drop-in replacement.

One note: it now supports grouping notifications by host. To do that,
simply slap a "-g" on aggregate-notify. It removes some of the
aggregation, so I wouldn't use it everywhere, but in some situations
it can be very useful.

As there's now the potential to easily redefine or define new output
formats, that could potentially change your config file - but that
only affects you if you actually define a new format, which is really
more of an advanced option.

NANS 0.4 -> 0.5

Cakewalk.

- Replace the scripts.
- Optionally, set the $loglevel and $logfile. (See the "Optional"
  list under INSTALLATION.)

NANS 0.3 -> 0.4

Upgrading from NANS 0.3 is a pretty simple task. The only "gotcha" is
making sure you update your config file correctly.

The config file format has changed slightly. The entries in version
0.3 looked like this:

email@address.com,count,summary,full

An entry in 0.4's config file looks like this:

email@address.com,count,minisum,summary,full

I know, you're probably annoyed that I'm changing the order of the
fields, right? I am too, but I figured in the long run it made more
sense for the reports to go from "smallest" to "largest" rather than
"first one I wrote" to "last one I wrote".
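If you have a lot of contacts, the field shuffle can be scripted.
What follows is only a sketch in Python, not something shipped with
NANS. The function names are made up for illustration, and it
assumes you want the new mini summary switched off (0) for everyone
until you decide otherwise. Comment lines and blank lines are passed
through untouched.

```python
def upgrade_line(line, minisum="0"):
    """Convert one 0.3 config line to the 0.4 layout.

    0.3: email,count,summary,full
    0.4: email,count,minisum,summary,full
    The minisum default of "0" is an assumption, not a NANS default.
    """
    stripped = line.rstrip("\n")
    # Comments ("#" as the first character) and blank lines are kept.
    if stripped == "" or stripped.startswith("#"):
        return stripped
    fields = stripped.split(",")
    if len(fields) != 4:
        raise ValueError("not a 0.3 config line: %r" % stripped)
    email, count, summary, full = fields
    return ",".join([email, count, minisum, summary, full])


def upgrade_config(text, minisum="0"):
    """Convert a whole 0.3 config file, given as one string."""
    return "\n".join(upgrade_line(l, minisum) for l in text.splitlines())


old = "# pager contacts\nfoo@bar.com,1,1,0\nbillybob@bar.com,1,0,1"
print(upgrade_config(old))
```

Check the result by eye before pointing aggregate-notify at it; a
line that does not have exactly four fields makes the sketch stop
with an error rather than guess.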
I'd recommend making a new config file, say, nans-0.4.conf, and then
editing that.

Put the new versions of collate-notifications and aggregate-notify in
place, and that's it. There's been no syntax change in them, so they
can just be dropped in to replace the old ones.

Enjoy!

Nicholas Tang
ntang -at- nachtwache -dot- org

Bob Patterson
bpatterson -at- i1ops -dot- net