I have a love/hate relationship with Nagios. On the one hand, I love how much power there is in the object-oriented approach to the configuration files. On the other hand, when something goes wrong it takes a long time to figure out what's going on and how to fix it.

I wish there were a Nagios command that would print out exactly what each service, host, etc. is inheriting, much like the postconf command, which spits out all of your currently active configuration variables, including the default ones.

In any event...

The goal of my latest change was to stop Nagios from bothering us every time a packet was delayed in our backup data center. Because we run a bunch of high-volume backups at the same time at night, we get lots of delayed packets.

All of our servers use a generic template with the check_command set to check-host-alive, which will notify us if the host is outright dead.
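
As a rough sketch of what that looks like (the template name generic-server, the host details, and max_check_attempts are placeholders for illustration, not our actual config), a host template and a server that inherits from it might be:

```cfg
# Hypothetical host template; "register 0" makes it a template only,
# not a real host.
define host {
    name                generic-server      ; placeholder template name
    check_command       check-host-alive    ; is the host up at all?
    max_check_attempts  5                   ; assumed value
    register            0
}

# Hypothetical server that inherits everything above via "use".
define host {
    use        generic-server
    host_name  backup01                     ; placeholder
    alias      Backup server 1              ; placeholder
    address    192.0.2.10                   ; documentation-range address
}
```

Anything set directly on the server would override the value inherited from the template.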

In addition, we have a service called ping-servers that applies to all of our servers. This service checks for ping latency. I've now turned notifications off for this service, so I don't get woken up in the middle of the night, yet we still get to see a report on latency when logging into Nagios.
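
A minimal sketch of that kind of service definition follows. The hostgroup name, the generic-service template, and the latency thresholds are assumptions for illustration; the part that matters is notifications_enabled, which is the standard directive for switching notifications off on a service while the checks keep running.

```cfg
# Hypothetical latency service applied to every server through a hostgroup.
define service {
    use                    generic-service   ; assumed service template
    hostgroup_name         all-servers       ; placeholder hostgroup
    service_description    ping-servers
    # check_ping as defined in the sample commands.cfg:
    # warn at 100 ms / 20% loss, critical at 500 ms / 60% loss (assumed values)
    check_command          check_ping!100.0,20%!500.0,60%
    notifications_enabled  0                 ; keep checking, stop notifying
}
```

With this in place the service still changes state in the web interface and the logs, but no notifications go out for it.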

This now applies to all of our servers. I think we'll be in a stronger position if we only get notifications for real emergencies rather than a high volume of notifications for delayed packets. Delayed packets are good to know about (and Nagios will keep track of them for us), but they do not always require immediate action on our part.