Here is a collection of various tools that I wrote or adapted to my needs to ease the task of Nagios monitoring of various services mostly on Linux servers.
SNMP daemon version 5.0 and above from the NetSNMP
project provides a way to access the output of user-supplied scripts via
the SNMP protocol. In other words: an SNMP client on one machine can invoke
a script on another machine just by sending an SNMP query. After the remote
script finishes, its standard/error output, return code and some other
values are sent back to the client in an SNMP response.
(NOTE: See SNMP exec section below if you run older SNMP daemon than NetSNMP 5.0)
For example, suppose you want to query the current date and time
of a server. Indeed, there are some standard OIDs in the System MIB, but
you can just as well run /bin/date every time and pass its output
through SNMP back to the client. Here is how to do it:
Configure a date extension in /etc/snmp/snmpd.conf. Simply add this single line at the end of the config file and reload snmpd:
extend datecheck /bin/date
~$ snmpwalk -v2c -c public remote.server NET-SNMP-EXTEND-MIB::nsExtendOutputFull
NET-SNMP-EXTEND-MIB::nsExtendOutputFull."datecheck" = STRING: Wed Oct 18 00:01:44 NZDT 2006
That's about it. Easy way to run programs and scripts remotely, isn't it?
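If you prefer to query a single extension by numeric OID instead of walking the whole table, the quoted name has to be encoded as a string index: its length first, then one number per character. Here is a minimal Python sketch of that encoding. The base OID for nsExtendOutputFull is written from memory and the helper name extend_oid is mine, so treat both as assumptions and verify with snmptranslate on your system.

```python
# Assumed numeric OID of NET-SNMP-EXTEND-MIB::nsExtendOutputFull
# (check with: snmptranslate -On NET-SNMP-EXTEND-MIB::nsExtendOutputFull).
NS_EXTEND_OUTPUT_FULL = "1.3.6.1.4.1.8072.1.3.2.3.1.2"


def extend_oid(name, base=NS_EXTEND_OUTPUT_FULL):
    """Return the numeric OID holding the full output of the named extension."""
    encoded = name.encode("ascii")
    # String index: the length comes first, then one sub-identifier per byte.
    parts = [str(len(encoded))] + [str(byte) for byte in encoded]
    return ".".join([base] + parts)


print(extend_oid("datecheck"))
```

The printed OID can then be passed to snmpget in place of the symbolic name.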
For integration with Nagios save
this script: check_snmp_extend.sh
to /usr/local/nagios/libexec.local
and put the following lines into Nagios' config:
$USER10$=/usr/local/nagios/libexec.local

define command{
    command_name    check_snmp_extend
    command_line    $USER10$/check_snmp_extend.sh $HOSTADDRESS$ $ARG1$
}

define service{
    ## This is an example service configured as
    ##   extend servicename /path/to/service-check.sh
    ## on remote.server in /etc/snmp/snmpd.conf
    use                     generic-service
    host_name               remote.server
    service_description     SomeService status
    check_command           check_snmp_extend!servicename
}
Have you found this script useful? Please support author by PayPal donation.
SNMP exec provides functionality similar to extend, but it is less flexible and slightly slower to work with. On the other hand, it is supported by many older SNMP daemon implementations, including UCD-SNMP and NetSNMP 4.x, which are still found on many servers.
The configuration and operation of exec is very similar to the above
described extend. I won't repeat myself - simply replace all strings
extend with exec in the above Nagios config file and
download this script to Nagios' libexec.local
directory:
check_snmp_exec.sh
Your Nagios is now ready to query status from remote scripts via SNMP. The conventions for these scripts are fairly trivial - the first word on the first line must be either OK or WARNING or FAIL or UNKNOWN. This word is translated to the appropriate return code by check_snmp_extend.sh or check_snmp_exec.sh and returned back to Nagios. That's all.
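To make the convention concrete, here is a minimal Python sketch of the translation step. It is not the code of check_snmp_extend.sh, only an illustration. The exit codes 0 to 3 are the standard Nagios plugin codes; treating FAIL as an alias for CRITICAL and the function name exit_code_for are my assumptions.

```python
# Standard Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}
# Assumption: FAIL is reported to Nagios as CRITICAL.
ALIASES = {"FAIL": "CRITICAL"}


def exit_code_for(output):
    """Map the first word on the first line of script output to an exit code."""
    lines = output.splitlines()
    words = lines[0].split() if lines else []
    if not words:
        return EXIT_CODES["UNKNOWN"]
    # Tolerate trailing punctuation such as "WARNING:".
    word = words[0].rstrip(":-").upper()
    word = ALIASES.get(word, word)
    # Anything unrecognised is reported as UNKNOWN.
    return EXIT_CODES.get(word, EXIT_CODES["UNKNOWN"])
```

A wrapper would print the script output unchanged and exit with the returned code, which is all Nagios needs to set the service state.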
BTW from now on I will talk about extend only but everything entirely applies to exec on older daemons as well.
Now it's a good time to introduce some scripts that can be used on the remote servers with extend or exec...
Most low-end servers rely on Linux SW-RAID for their data storage. Monitoring such an array and getting an alert as soon as a problem appears is an essential part of maintaining decent availability of your services.
The core part of this monitor is a nagios-linux-swraid.pl script
that parses RAID status information from /proc/mdstat and
reports a single line similar to the following on its standard output:
OK - md0 [UU] has 2 of 2 devices active (active=sda2,sdb2 failed=none spare=none)
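For illustration, here is a rough Python sketch of how such a line can be derived from /proc/mdstat. It is not the code of nagios-linux-swraid.pl. It assumes the usual mdstat layout (a "md0 : active raid1 sdb2[1] sda2[0]" line followed by a line containing "[2/2] [UU]") and it reports a degraded array with the word FAIL. The function names are mine.

```python
import re


def _names(devices):
    """Format a device list as a comma-separated string, or 'none' if empty."""
    return ",".join(sorted(devices)) or "none"


def raid_status(mdstat_text, device):
    """Build a one-line status for one md device from /proc/mdstat text."""
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(device + " :"):
            continue
        active, failed, spare = [], [], []
        # Members look like sda2[0], sdb2[1](F) when failed, sdc2[2](S) when spare.
        for name, flag in re.findall(r"(\S+?)\[\d+\](?:\((\w)\))?", line):
            {"F": failed, "S": spare}.get(flag, active).append(name)
        # The next line carries the [total/working] counters and the [UU] map.
        detail = lines[i + 1] if i + 1 < len(lines) else ""
        m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", detail)
        if not m:
            return f"UNKNOWN - {device}: cannot parse status line"
        total, working, umap = int(m.group(1)), int(m.group(2)), m.group(3)
        # Assumption: a degraded array is reported with the word FAIL.
        state = "OK" if working == total and not failed else "FAIL"
        return (f"{state} - {device} [{umap}] has {working} of {total} devices active "
                f"(active={_names(active)} failed={_names(failed)} spare={_names(spare)})")
    return f"UNKNOWN - {device} not found"
```

On the monitored host you would call it as raid_status(open("/proc/mdstat").read(), "md0") and print the result.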
This script is hooked to SNMP daemon process running on the
host with SW-RAID using a simple line in /etc/snmp/snmpd.conf
config file:
extend raid-md0 /usr/local/bin/nagios-linux-swraid.pl --device=md0
On the Nagios side I assume you have check_snmp_extend correctly configured as described earlier on this page. Then simply add the following service description to your Nagios' config:
define service{
    use                     generic-service
    host_name               remote.server
    service_description     RAID status
    check_command           check_snmp_extend!raid-md0
}
Keeping the operating system up to date with the latest patches is essential in most environments. The following two scripts check whether new patches are available for download. The first one works with APT-based systems (e.g. Debian, Ubuntu or even OpenSolaris Nexenta, ...) and the second one is for YUM-based systems (e.g. RedHat, Fedora or CentOS).
You can make the script itself run apt or yum to download information
about the newest updates from the internet, but that usually takes too
long and Nagios times out in the middle. Instead, on my systems I run
apt or yum every few hours from cron and let the Nagios script do only
the quick tasks. More specifically, on APT systems I run apt-get update
from cron, because that downloads data from the net, and then run
apt-get -q -s upgrade from the SNMP script, because that only reads the
local database and parses the output. On YUM systems I run
yum check-update from cron and store its output in a file. The SNMP
script then only reads and parses that file. Keep reading for example usage.
First of all tell cron to run apt or yum every 6 hours:
## /etc/crontab
## Every 6 hours download new data from the net
# For APT-based systems (Debian, Ubuntu, ...)
50 */6 * * * root /usr/bin/apt-get -qq update
# For YUM-based systems (RedHat, CentOS, Fedora)
50 */6 * * * root /usr/bin/yum check-update > /var/run/yum.check-update
Download the Nagios scripts (check-apt-upgrade.pl or check-yum-update.pl) and tell the SNMP daemon about them:
## /etc/snmp/snmpd.conf
# For APT-based systems (Debian, Ubuntu, ...)
extend sw-updates /usr/local/bin/check-apt-upgrade.pl --run
# For YUM-based systems (RedHat, CentOS, Fedora)
extend sw-updates /usr/local/bin/check-yum-update.pl --file /var/run/yum.check-update
The last step is to configure Nagios to poll these services (same for both APT and YUM):
define service{
    use                     generic-service
    host_name               remote.server
    service_description     Software updates
    check_command           check_snmp_extend!sw-updates
}
That's all. You will get "OK" result if there are no new updates and "WARNING" with a list of packages to update if there are any. On Ubuntu you'll even get "CRITICAL" result if there are any security updates, and "WARNING" when there are only non-security ones.
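The quick part of the APT check boils down to reading the simulation output and counting the packages it would install. Here is a minimal Python sketch of that idea. It is not the code of check-apt-upgrade.pl. It assumes that apt-get -q -s upgrade reports each pending package on a line starting with "Inst", and that security updates can be recognised by "-security" in the origin, as on Ubuntu. The function name is mine.

```python
def classify_updates(simulated_output):
    """Turn the output of `apt-get -q -s upgrade` into a one-line status."""
    packages, security = [], []
    for line in simulated_output.splitlines():
        fields = line.split()
        # Pending installs are listed as: Inst <package> [old] (new origin ...)
        if len(fields) >= 2 and fields[0] == "Inst":
            packages.append(fields[1])
            # Assumption: security updates come from an origin containing "-security".
            if "-security" in " ".join(fields[2:]).lower():
                security.append(fields[1])
    if security:
        return f"CRITICAL - {len(security)} security update(s): {' '.join(security)}"
    if packages:
        return f"WARNING - {len(packages)} update(s): {' '.join(packages)}"
    return "OK - no updates available"
```

Because nothing is downloaded here, the check returns well within the Nagios timeout.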
Reporting system uptime and generating alert on system reboot is often useful. Download check_snmp_uptime.pl that does just that - reads system uptime, records last reading and alerts when current uptime reading is lower than the last recorded one. Easy, eh?
Two important things:
1. On servers running NetSNMP, DISMAN-EVENT-MIB::sysUpTimeInstance (.1.3.6.1.2.1.1.3.0) reports how long the SNMP daemon has been running, not the system itself. The real system uptime is available as HOST-RESOURCES-MIB::hrSystemUptime.0 (.1.3.6.1.2.1.25.1.1.0). Many other devices like switches provide only the former OID. Use --sysUpTime or --hrSystemUptime to select the appropriate OID for each device.
2. If the --dbfile parameter is not used then the script will only check and report the uptime and return OK. No alerts will be generated at all.
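The reboot detection itself is just a comparison against the previous reading. Here is a minimal Python sketch of the logic. It is not the code of check_snmp_uptime.pl. The plain-text state file format, the WARNING status on reboot and the function name are my assumptions.

```python
import os


def check_uptime(current, dbfile=None):
    """Compare an uptime reading with the last one stored in dbfile.

    Returns a (status_word, message) pair. Without dbfile the result is always OK.
    """
    previous = None
    if dbfile and os.path.exists(dbfile):
        with open(dbfile) as fh:
            text = fh.read().strip()
        previous = int(text) if text.isdigit() else None
    if dbfile:
        # Record the current reading for the next run.
        with open(dbfile, "w") as fh:
            fh.write(str(current))
    # A reading lower than the recorded one means the counter restarted.
    if previous is not None and current < previous:
        return "WARNING", f"uptime went from {previous} down to {current}: rebooted?"
    return "OK", f"uptime is {current}"
```

Note that the alert fires only once: the lower reading is stored immediately, so the next check compares against it and returns OK again.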
MySQL provides a relatively easy way to run online mirrors of the master database. The mirror is called slave in MySQL terminology and the mirroring process is called replication. With the following script it is easy to add your MySQL slaves to Nagios monitoring and receive an alert whenever replication drops for some reason.
The script needs to connect as a MySQL user (say user monitor) with privilege REPLICATION CLIENT. Use this GRANT command to create such a user:
mysql> GRANT REPLICATION CLIENT ON *.* TO monitor@localhost IDENTIFIED BY 'PassWord';
Then download the script check-mysql-slave.pl,
save it to the /usr/local/bin/ directory and tell the SNMP daemon
to run it on request:
extend mysql-slave /usr/local/bin/check-mysql-slave.pl --user monitor --pass PassWord --sock /tmp/mysql.sock
Last step, similar to all other monitors, is to configure Nagios to poll the monitor status every now and then:
define service{
    use                     generic-service
    host_name               remote.server
    service_description     MySQL replication
    check_command           check_snmp_extend!mysql-slave
}
The script returns "CRITICAL" whenever replication breaks for some reason, e.g. the Slave IO or Slave SQL process doesn't run, or the replication is too far behind the master (2 minutes by default). "WARNING" is returned when replication is more than 1 minute but less than 2 minutes behind the master, and "OK" is returned when everything is all right. You can also get an "UNKNOWN" return code, usually on misconfiguration or when the script can't connect to the slave server. That's all ;-)
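The decision rules above can be written down in a few lines. Here is a minimal Python sketch, not the code of check-mysql-slave.pl. It assumes one row of SHOW SLAVE STATUS has already been fetched into a dict keyed by column name (Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master). The function name and the warn and crit parameters are mine.

```python
def classify_replication(status, warn=60, crit=120):
    """Map one row of SHOW SLAVE STATUS (as a dict) to a one-line status."""
    if not status:
        # No row at all: not a slave, or the connection failed.
        return "UNKNOWN - no slave status available"
    if status.get("Slave_IO_Running") != "Yes":
        return "CRITICAL - Slave IO thread is not running"
    if status.get("Slave_SQL_Running") != "Yes":
        return "CRITICAL - Slave SQL thread is not running"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return "UNKNOWN - replication lag is not reported"
    lag = int(lag)
    if lag >= crit:
        return f"CRITICAL - slave is {lag}s behind master"
    if lag > warn:
        return f"WARNING - slave is {lag}s behind master"
    return f"OK - slave is {lag}s behind master"
```

For example, a row with both threads running and Seconds_Behind_Master set to 90 yields a WARNING line.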
Slony is a popular PostgreSQL replication system. It is a good idea to monitor the status of your Slony cluster and trigger an alert whenever any node gets out of sync. The following script will help you do just that.
Save the check-slony-cluster.pl script to the libexec directory on your Nagios server. The example below connects to the database as the user nagios.

define command{
    command_name    check_slony
    command_line    $USER1$/check-slony-cluster.pl --host $HOSTADDRESS$ --user $ARG1$ --dbname $ARG2$ --cluster $ARG3$ --node $ARG4$
}

define service{
    use                     generic-service
    host_name               dbmaster
    service_description     Slony - node 2 dbslave
    check_command           check_slony!nagios!mydb!mydbcluster!2
}

Update the hostname, username, database name and cluster name to suit your setup.
Many Nagios scripts exist for checking core system values on Linux systems. These are the ones I use...
Note that I'm not the author of these four scripts below...