Sunday, June 16, 2013

What to monitor on a Linux box

This article is a reminder for me (and for anyone else managing a monitoring system) of which metrics should or can be monitored for the different kinds of hardware and software components in a modern production system. The listed metrics primarily assume the use of the Nagios, Graphite, collectl and logcheck monitoring tools.

Metrics, metrics, metrics...

A lot has been said about the importance of system and application metrics. I won't repeat it here; instead, I will concentrate on the off-the-shelf options available for implementing a robust, usable and scalable metrics collection and monitoring system.

We will talk about three main areas:
  1. How to generate/create metrics
  2. How to collect and represent the metrics
  3. How to monitor the collected metrics