Building Technical Operations: My list of favorite Nagios check scripts

Nagios is a great monitoring tool. I have used it to monitor networks with hundreds of hosts and thousands of service checks. One of Nagios' biggest advantages is the wide set of service check scripts available for public use, and the following two sites provide a great collection of the scripts:

What is confusing about Nagios is that there are a lot of different scripts from different authors doing more or less the same thing (checking the same stuff), so a system administrator deploying a new Nagios instance has to invest a lot of time selecting the best scripts from the wide variety of available solutions.

The goal of this post is to share my list of favorite Nagios check scripts, so the next time you need to deploy a Nagios instance you can use this page as a reference for the required check modules.
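For readers comparing scripts from different authors, it helps to remember that they all follow the same plugin convention: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output carries the status text, optionally followed by performance data after a "|". The following is only a minimal sketch of that convention; the function names and the "load1" metric are made up for illustration and do not come from any of the scripts listed here.

```python
# Nagios plugin convention: the exit code carries the state, and the
# first line of stdout carries the status text plus optional perfdata.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios state (higher values are worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(label, value, warn, crit):
    """Return (exit_code, status_line) for a hypothetical metric."""
    state = evaluate(value, warn, crit)
    # Everything after the "|" is performance data: label=value;warn;crit
    line = f"{STATE_NAMES[state]} - {label} is {value} | {label}={value};{warn};{crit}"
    return state, line
```

A real check script would print the status line and then call sys.exit() with the returned code, which is all Nagios needs to schedule notifications and, with a tool such as Nagiosgraph, to graph the performance data.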

Basic server parameters
HTTP service - Nagios' standard check module "check_http"

ICMP reachability - Nagios' standard check module "check_ping"

CPU Usage and Load Average for many platforms (Linux/Cisco/HP/Fortigate/etc) - check_snmp_load.pl

Disk Space Usage - check_snmp_storage.pl

RAM and Swap Usage - check_snmp_mem.pl

Remote monitoring of NTP service - Nagios' standard "check_ntp_peer" check module

MySQL service monitoring - Nagios' standard plug-in "check_mysql"

Graphing of collected monitoring data - Nagiosgraph

Running a Nagios check script on remote servers using the SNMP extend feature - http://www.logix.cz/michal/devel/nagios/

Monitoring of Linux bonding interfaces (using SNMP extend feature) - check_linux_bonding

Monitoring of Linux software MD RAID volumes (using SNMP extend feature) - check_md_raid.sh

Monitoring of Linux IPMI status - check_ipmi_sdr_ok.pl

Check power/fans/temperature for many types of hardware (Cisco/Linux/HP/etc.) - check_snmp_environment.pl

Tomcat server memory and thread usage monitoring - check_tomcat.pl

Redis server status monitoring - check_redis.pl

DRBD devices status - check_drbd

LSI MegaRAID card SNMP monitoring
Use this script if you were able to find and deploy an SNMP package suitable for your LSI MegaRAID card - http://wleibzon.bol.ucla.edu/nagios/plugins/check_sasraid_megaraid.pl.

Network interface status monitoring
I have tested the following script, and it works great with both Cisco and Net-SNMP - http://wleibzon.bol.ucla.edu/nagios/plugins/check_snmp_netint.pl. You can easily monitor interface up/down status, current bandwidth usage, and error/discard counters.
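Bandwidth usage is not something SNMP reports directly: interface octet counters such as ifInOctets only ever increase, so a check has to sample twice and divide the difference by the elapsed time. The sketch below shows that arithmetic, including the wrap of a 32-bit counter. It is an illustration of the general idea, not the code of check_snmp_netint.pl, and the function name and parameters are my own.

```python
def rate_bps(prev_octets, curr_octets, interval_s, counter_bits=32):
    """Bits per second between two octet-counter samples.

    Handles a single counter wrap: if the counter went "backwards",
    assume it rolled over once at 2**counter_bits.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        delta += 2 ** counter_bits
    return delta * 8 / interval_s
```

On fast links a 32-bit counter can wrap more than once between two checks, which this calculation cannot detect; polling the 64-bit counters (counter_bits=64) avoids the problem where the device supports them.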

Dell server hardware status monitoring
You can use Dell OpenManage Server Administrator and the check_openmanage plugin to easily monitor many Dell server hardware parameters.

Ganglia
The Python script "/contrib/check_ganglia.py" included in Ganglia's source package provides a very simple way to monitor any Ganglia metric, which makes it a great way to monitor many Hadoop parameters.
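To give an idea of why this is so simple: Ganglia daemons publish their state as an XML document in which every host carries METRIC elements with NAME and VAL attributes, so a check only has to find one element and compare its value against thresholds. The sketch below assumes the XML has already been fetched as a string. It is not the code of check_ganglia.py, and the function names, host name and thresholds are hypothetical.

```python
import xml.etree.ElementTree as ET


def metric_value(xml_text, host, metric):
    """Return one metric's VAL for one host as a float, or None if absent."""
    root = ET.fromstring(xml_text)
    for host_el in root.iter("HOST"):
        if host_el.get("NAME") != host:
            continue
        for metric_el in host_el.iter("METRIC"):
            if metric_el.get("NAME") == metric:
                return float(metric_el.get("VAL"))
    return None


def check_metric(xml_text, host, metric, warn, crit):
    """Return (exit_code, message) in Nagios plugin style (higher is worse)."""
    value = metric_value(xml_text, host, metric)
    if value is None:
        return 3, f"UNKNOWN - {metric} not found for {host}"
    if value >= crit:
        return 2, f"CRITICAL - {metric} on {host} is {value}"
    if value >= warn:
        return 1, f"WARNING - {metric} on {host} is {value}"
    return 0, f"OK - {metric} on {host} is {value}"
```

This only works for numeric metrics; string-valued metrics would make float() raise an error and need separate handling.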

I'll keep updating the article with my new Nagios check script findings, so don't forget to bookmark the page.


Monday, September 12, 2011
