================================================================
         NAGIOS AGGREGATE NOTIFICATION SYSTEM: README
================================================================
     (c) 2001 - Nicholas Tang (ntang at nachtwache dot org)
      Updated - Bob Patterson (bpatterson at i1ops dot net)
           Under the GNU General Public License v2.0
              http://www.gnu.org/copyleft/gpl.txt
----------------------------------------------------------------

INTRODUCTION

The Nagios Aggregate Notification System (NANS) is designed to 
replace the existing 1-line shell script that currently handles 
notifications in Nagios (http://www.nagios.org/).  It is a 
drop-in replacement that doesn't require any changes to the 
existing Nagios configuration other than telling it to use NANS
instead of the current system.  It is configurable on a per-contact 
basis and allows for different levels of aggregation for epager 
vs. email notifications.

It was designed to cut back on the flood of notifications that can 
occur at a larger Nagios installation; I have personally had as 
many as 13 pages sent to me in a one-minute period when we had 
a problem on the network, and unfortunately not every service or 
host can easily be assigned a parent - and even if it has one, until 
Nagios notices the parent is unavailable, it will still report a 
problem with the children.

The latest version should always be available from my homepage at 
http://www.nachtwache.org/projects/ (under nagios/ utilities/ 
nans/ most likely).

----------------------------------------------------------------

INSTALLATION

(Note: if upgrading from an old version, skip down to the next 
section, cunningly labelled "UPGRADING".)

Installation is a simple matter.  First off, put the two scripts 
(aggregate-notify.pl and collate-notifications.pl) on your system - 
I use Nagios's bin directory since it's mostly empty.  They can 
go anywhere, as long as the Nagios user has read/execute privs 
for them.

The next step is to create the configuration file(s).  You can 
have one per instance, or one shared one, or whatever.  I use one 
shared one, but some places might find it convenient to have 
different settings for different situations.

SAMPLE CONFIG FILE
# This is a comment
foo@bar.com,1,0,1,0
billybob@bar.com,1,1,0,1

The example above gives user foo the count and summary reports, and
gives user billybob the count, mini summary, and full details reports.

The config file is simple.  Lines with a # at the beginning are 
comments (note, the "#" must be the FIRST character on the line).
All other lines are considered config lines.  I will likely make it 
a little more lenient in the next version, but for now, deal.  ;)

Each line should have 5 comma-delimited fields (no spaces).  The 
first field is the contact email address, and fields 2-5 are simple 
on/off flags (0=off, 1=on) that determine which types of reports 
should be sent for that contact: a simple count, a mini summary, 
a more detailed summary, or the full details.  Note that the summary
and full details are mostly identical in terms of the information 
they provide; the summary is simply more compact.  The mini summary 
is much more compact and strips out more information.
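To make the format concrete, here is a minimal Python sketch of how a
config file like the sample above could be read.  It is only an
illustration (the real parsing is done in Perl inside the scripts);
the function name parse_nans_config is hypothetical, and skipping
blank lines is an assumption, since the rules above don't mention them.

```python
# Hypothetical sketch of the NANS config format described above; this is
# not the actual Perl parser used by aggregate-notify.pl.
REPORT_TYPES = ("count", "minisum", "summary", "full")


def parse_nans_config(text):
    """Return {email: set of enabled report types} for each config line."""
    contacts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Only a '#' in the very first column marks a comment.
        if line.startswith("#"):
            continue
        # Assumption: blank lines are ignored (the README doesn't say).
        if not line.strip():
            continue
        fields = line.split(",")
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        email, flags = fields[0], fields[1:]
        # Flags must be exactly "0" or "1" (no spaces allowed).
        if any(flag not in ("0", "1") for flag in flags):
            raise ValueError(f"line {lineno}: flags must be 0 or 1")
        contacts[email] = {name for name, flag in zip(REPORT_TYPES, flags) if flag == "1"}
    return contacts


sample = """# This is a comment
foo@bar.com,1,0,1,0
billybob@bar.com,1,1,0,1
"""
print(parse_nans_config(sample))
```

The flags map positionally onto count, mini summary, summary, and full
details, which is why the order of fields 2-5 matters so much (see the
0.3 -> 0.4 upgrade notes).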



Next, you need to add entries in your crontab for the aggregate 
mailer.  There are four settings for it:
-t: the frequency, in "time units", to send the emails
-f: the full path to the notification log (generated by the other tool)
-c: the full path to the config file (discussed above)
-g: turns on grouping by host

I say "time units" rather than minutes because -t actually stands 
for the number of times the script runs before it sends out the 
notifications.  If you run it every minute from cron, 
it will be in minutes.  Run it every 5 minutes, and a single unit 
will be 5 minutes.

Here's an example:

* * * * * /usr/local/nagios/bin/aggregate-notify.pl -t 2 -c '/usr/local/nagios/etc/notification-page.cfg' -f '/usr/local/nagios/var/rw/page-log.txt'
* * * * * /usr/local/nagios/bin/aggregate-notify.pl -t 10 -c '/usr/local/nagios/etc/notification-email.cfg' -f '/usr/local/nagios/var/rw/email-log.txt'

If it's set up correctly, it should create a single-line file 
where you specified -f, and that line should be a comment with the 
current "ttm" value (time to mail ;) ) and the timer value you set 
with -t.
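The "time units" idea boils down to a countdown.  The Python sketch
below models it under the assumption that ttm is decremented once per
cron run and reset to the -t value after each send; the real Perl
bookkeeping may differ in detail (for example, in exactly which run the
first mail goes out).  The function names are hypothetical.

```python
# Hypothetical model of the "time units" countdown; not the actual
# logic in aggregate-notify.pl.


def runs_that_send(t, runs):
    """Return the 1-based cron runs on which mail would be sent."""
    ttm = t  # "time to mail", reset to the -t value after every send
    sent = []
    for run in range(1, runs + 1):
        ttm -= 1
        if ttm <= 0:
            sent.append(run)
            ttm = t
    return sent


def effective_interval(t, cron_minutes):
    """Wall-clock minutes between sends for a given -t and cron period."""
    return t * cron_minutes


print(runs_that_send(2, 6))       # [2, 4, 6]
print(effective_interval(2, 1))   # 2
print(effective_interval(2, 5))   # 10
```

So with the crontab above (cron every minute), the pager instance with
-t 2 mails every 2 minutes and the email instance with -t 10 every 10
minutes; moving the pager entry to a 5-minute cron schedule would stretch
it to 10 minutes without touching -t.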

Finally, you need to set up the notification collator.  To do this, 
you need to configure Nagios to use it instead of the existing script.
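Replace the existing notification command definitions (named 
notify-by-email, notify-by-epager, host-notify-by-email, and 
host-notify-by-epager in older configs, or notify-host-by-email and 
notify-service-by-email in newer ones, as below) with the following, 
modifying the paths to match your install, of course.  The examples 
cover email; for the epager commands, point -f at the pager log 
instead (page-log.txt in the crontab example above):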

define command{
    command_name    notify-host-by-email
    command_line    /usr/local/nagios/bin/collate-notifications.pl -t 'HOST' -e '$CONTACTEMAIL$' -d '$LONGDATETIME$' -n '$NOTIFICATIONTYPE$' -I '$HOSTADDRESS$' -H '$HOSTNAME$' -h '$HOSTSTATE$' -o '$HOSTOUTPUT$' -A '$HOSTNAME$' -a '$HOSTNAME$' -f '/usr/local/nagios/var/rw/email-log.txt'
}
define command{
    command_name    notify-service-by-email
    command_line    /usr/local/nagios/bin/collate-notifications.pl -t 'SERVICE' -e '$CONTACTEMAIL$' -d '$LONGDATETIME$' -n '$NOTIFICATIONTYPE$' -I '$HOSTADDRESS$' -H '$HOSTNAME$' -S '$SERVICEDESC$' -s '$SERVICESTATE$' -o '$SERVICEOUTPUT$' -A '$HOSTNAME$' -a '$SERVICEDESC$' -f '/usr/local/nagios/var/rw/email-log.txt'
}


The only setting you need to adjust (besides the script path) is -f, 
which must be the same file you specified with -f for the matching 
aggregator instance.
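For reference, here is a rough Python sketch of the collator's side of
the deal: parse the flags from the command definitions above and append
one record to the -f log.  The flag letters come from the examples; the
dest names, the tab-separated record layout, and the function names are
assumptions for illustration only, not the real script's on-disk format.
The meaning of -A and -a isn't covered in this README, so the sketch just
carries them along.

```python
import argparse

# Hypothetical sketch of collate-notifications.pl's command line; the
# record written to the -f log is an assumed format, not the real one.


def build_parser():
    # add_help=False frees up -h, which is used for the host state here.
    p = argparse.ArgumentParser(add_help=False)
    p.add_argument("-t", dest="kind", choices=["HOST", "SERVICE"], required=True)
    p.add_argument("-e", dest="email", required=True)
    p.add_argument("-d", dest="datetime", default="")
    p.add_argument("-n", dest="notification_type", default="")
    p.add_argument("-I", dest="address", default="")
    p.add_argument("-H", dest="host", default="")
    p.add_argument("-S", dest="service", default="")
    p.add_argument("-h", dest="host_state", default="")
    p.add_argument("-s", dest="service_state", default="")
    p.add_argument("-o", dest="output", default="")
    p.add_argument("-A", dest="A", default="")  # meaning not covered in the README
    p.add_argument("-a", dest="a", default="")  # meaning not covered in the README
    p.add_argument("-f", dest="logfile", required=True)
    return p


def build_record(args):
    """Flatten one notification into a single (assumed) log line."""
    state = args.host_state if args.kind == "HOST" else args.service_state
    fields = [args.kind, args.email, args.datetime, args.notification_type,
              args.host, args.service, state, args.output]
    return "\t".join(fields)


def append_notification(argv):
    """Parse Nagios's arguments and append one record to the -f log."""
    args = build_parser().parse_args(argv)
    with open(args.logfile, "a", encoding="utf-8") as fh:
        fh.write(build_record(args) + "\n")
```

Keeping the collator this simple (append a line and exit) means Nagios
can call it once per notification cheaply; the batching and mailing are
left to the aggregator run from cron.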

Optional:

- In both scripts you can set the $loglevel variable.  This variable 
determines how verbose the logging should be, from 0 (none) to 4 
(ridiculous).  If you turn on logging, make sure to also set the 
$logfile variable.  Of course, if $loglevel is set to 0, that isn't 
needed.
- aggregate-notify needs to have the path to sendmail set; if your 
sendmail (or whatever mailer) is in a different location make sure 
you fix that.
- If you're having problems with the script, set $debug to 1, which 
will output all of the errors and such to STDOUT in addition to the 
log file.  Makes it a bit easier to, well, debug.
- You can now easily define your own output format... well, semi-
easily.  No docs are being provided for this part, but if you look
at how %fmt works you should see it.  If not, well, ask.  :)
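The $loglevel / $logfile / $debug behaviour above amounts to a simple
verbosity gate.  A minimal Python sketch of that idea follows; it is
not the scripts' Perl code, the class name NansLogger is hypothetical,
and the assumption that messages are tagged with levels 1 through 4 is
ours.

```python
import sys

# Hypothetical rendering of the $loglevel / $logfile / $debug settings;
# the real scripts are Perl and their message format may differ.


class NansLogger:
    def __init__(self, loglevel=0, logfile=None, debug=False, stdout=sys.stdout):
        self.loglevel = loglevel  # 0 = none ... 4 = ridiculous
        self.logfile = logfile    # only needed when loglevel > 0
        self.debug = debug        # also echo messages to STDOUT
        self.stdout = stdout

    def log(self, level, message):
        """Record a message of verbosity `level` (1-4); return True if kept."""
        if self.loglevel == 0 or level > self.loglevel:
            return False
        if self.logfile:
            with open(self.logfile, "a", encoding="utf-8") as fh:
                fh.write(message + "\n")
        if self.debug:
            print(message, file=self.stdout)
        return True
```

With loglevel set to 0 nothing is written at all, which is why $logfile
doesn't matter in that case.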

----------------------------------------------------------------

UPGRADING

NANS 0.5 -> 0.6

Just as easy as 0.4 to 0.5.  There are some optional variables 
you can set, but otherwise it's just a drop-in replacement.

One note: it now supports grouping notifications by host.  To 
do that, simply slap a "-g" on aggregate-notify.  It removes 
some of the aggregation, so I wouldn't use it everywhere, but 
in some situations it can be very useful.
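Conceptually, -g just changes how pending notifications are bucketed
before they're mailed.  The Python sketch below illustrates grouping
by host, assuming each pending item is a (host, service, state) tuple
and that -g produces one batch per host rather than one combined
digest; both the tuple layout and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical illustration of host grouping; not NANS's internal data
# structure.


def group_by_host(notifications):
    """Bucket (host, service, state) tuples by host, keeping arrival order."""
    groups = defaultdict(list)
    for host, service, state in notifications:
        groups[host].append((service, state))
    return dict(groups)


pending = [
    ("web1", "HTTP", "CRITICAL"),
    ("db1", "MySQL", "WARNING"),
    ("web1", "SSH", "CRITICAL"),
]
for host, items in group_by_host(pending).items():
    print(host, items)
```

Three notifications become two batches here, which shows why grouping
"removes some of the aggregation": one digest turns into one per host.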

Because output formats can now be redefined or added easily, your 
config file could change, but that only affects you if you actually 
define a new format, which is really more of an advanced option.

NANS 0.4 -> 0.5

Cakewalk.

- Replace the scripts.
- Optionally, set the $loglevel and $logfile.  (See the "Optional:"
  notes under INSTALLATION.)

NANS 0.3 -> 0.4

Upgrading from NANS 0.3 is a pretty simple task.  The only "gotcha" is 
making sure you update your config file correctly.

The config file format has changed slightly.  The entries in version 
0.3 looked like this:

email@address.com,count,summary,full

An entry in 0.4's config file looks like this:

email@address.com,count,minisum,summary,full

I know, you're probably annoyed that I slotted the new field into the 
middle instead of tacking it onto the end, so the old fields shift 
position, right?  I am too, but I figured in the long run it made more 
sense for the reports to go from "smallest" to "largest" rather than 
"first one I wrote" to "last one I wrote".

I'd recommend making a new config file, say, nans-0.4.conf, and then 
editing that.  Put the new versions of collate-notifications and
aggregate-notify in place, and that's it.  There's been no syntax change 
in them, so they can just be dropped in to replace the old ones.
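If you have a lot of contacts, the config conversion can be scripted.
Here's a minimal Python sketch (hypothetical helper names, not part of
NANS) that turns each 0.3 line into a 0.4 line by inserting a minisum
flag in the second flag position.  It defaults that flag to 0, so every
contact keeps exactly the reports it got before; that default is a
choice, not something NANS requires.

```python
# Hypothetical 0.3 -> 0.4 config converter; not shipped with NANS.


def convert_line_03_to_04(line, minisum="0"):
    """Rewrite email,count,summary,full as email,count,minisum,summary,full."""
    if line.startswith("#") or not line.strip():
        return line  # comments and blank lines pass through unchanged
    fields = line.split(",")
    if len(fields) != 4:
        raise ValueError(f"not a 0.3 config line: {line!r}")
    email, count, summary, full = fields
    return ",".join([email, count, minisum, summary, full])


def convert_config(old_path, new_path):
    """Write a 0.4-format copy of a 0.3 config, leaving the original alone."""
    with open(old_path, encoding="utf-8") as src, \
            open(new_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(convert_line_03_to_04(line.rstrip("\n")) + "\n")


print(convert_line_03_to_04("email@address.com,1,0,1"))  # email@address.com,1,0,0,1
```

Something like convert_config("nans.conf", "nans-0.4.conf") (file names
are placeholders) gives you the new file to review and edit, as
suggested above.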

Enjoy!

Nicholas Tang
ntang -at- nachtwache -dot- org

Bob Patterson
bpatterson -at- i1ops -dot- net