Friday, September 30, 2011

Handling of external monitoring alerts

If you have an Internet-facing production system, it is always wise to use an external web availability and/or performance monitoring service such as Pingdom or Gomez/Keynote/Catchpoint to monitor the exposed resources. All of these services support email alerting, and some also provide SMS alerting. As I already explained here, I recommend using SMS alerting for critical monitoring events, but managing a separate SMS alerting configuration in each external monitoring system you use does not scale.


I have had good experience with the following solution, which is based on a ticket management system with an email interface (I'm a big fan of RT, the Request Tracker software by Best Practical):
  • Create a new RT queue named, for example, "Network_Alerts", and configure email access to the queue (you may use an email address like "network_alerts@rt.company.net")
  • Create a simple Perl script that regularly checks the status of the queue and sends an SMS alert to a configured list of recipients if a new ticket in the queue is not handled within a defined time frame (for example, 1-2 minutes). RT provides a quite flexible Perl API, and you can easily write the described SMS escalation script using a script like rt-reminder as an example
  • Configure all your external monitoring systems to send email alerts to the "network_alerts@rt.company.net" address
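
The escalation check in the second step can be sketched as follows. This is only an illustration of the logic, written in Python rather than Perl, and it does not use the RT API: the `Ticket` class, the `overdue_tickets` and `escalate` functions, and the `send_sms` callback are all hypothetical names. It also assumes that "not handled" means the ticket is still in RT's initial "new" status; your own definition (for example, "has no owner") may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass
class Ticket:
    """Hypothetical, simplified view of a ticket fetched from the alert queue."""
    id: int
    subject: str
    status: str        # assumption: an untouched ticket still has status "new"
    created: datetime


def overdue_tickets(tickets: Iterable[Ticket], now: datetime,
                    timeout: timedelta) -> List[Ticket]:
    """Return tickets that are still "new" and older than the escalation timeout."""
    return [t for t in tickets
            if t.status == "new" and now - t.created >= timeout]


def escalate(tickets: Iterable[Ticket], recipients: List[str],
             send_sms: Callable[[str, str], None], now: datetime,
             timeout: timedelta = timedelta(minutes=2)) -> int:
    """Send one SMS per recipient for every overdue ticket; return the SMS count.

    send_sms(number, text) is a placeholder for whatever SMS gateway you use.
    """
    sent = 0
    for ticket in overdue_tickets(tickets, now, timeout):
        text = f"Unhandled alert #{ticket.id}: {ticket.subject}"
        for number in recipients:
            send_sms(number, text)
            sent += 1
    return sent
```

Running `escalate` from cron every minute, with the ticket list loaded from the queue, gives the repeated SMS behaviour: a ticket keeps triggering messages on every run until someone changes its status.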

Once an external monitoring system detects a problem in the service, it sends an email alert to the configured RT email address, and a new ticket is automatically opened in the "Network_Alerts" queue. If no one from your Operations/NOC team takes the ticket within the defined escalation timeout, the script starts sending SMS messages to the configured escalation list and keeps sending them until the ticket is handled.

Should you need to change the escalation procedure or contact details, you only need to modify the configuration of the SMS escalation script.
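
One way to keep that configuration in a single place is a small file that the script reads on every run. The sketch below is an assumption about how such a file might look, not part of RT: the JSON layout, the field names (`queue`, `timeout_seconds`, `repeat_seconds`, `recipients`), the `EscalationConfig` class and the `load_config` function are all hypothetical, and the phone numbers are placeholders.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationConfig:
    """Hypothetical settings for the SMS escalation script."""
    queue: str              # RT queue that receives the alert emails
    timeout_seconds: int    # how long a ticket may stay unhandled
    repeat_seconds: int     # pause between repeated SMS rounds
    recipients: List[str]   # escalation list (phone numbers)


def load_config(text: str) -> EscalationConfig:
    """Parse and validate the JSON configuration text."""
    raw = json.loads(text)
    cfg = EscalationConfig(
        queue=raw["queue"],
        timeout_seconds=int(raw["timeout_seconds"]),
        repeat_seconds=int(raw["repeat_seconds"]),
        recipients=list(raw["recipients"]),
    )
    if not cfg.recipients:
        raise ValueError("escalation list must not be empty")
    if cfg.timeout_seconds <= 0 or cfg.repeat_seconds <= 0:
        raise ValueError("timeouts must be positive")
    return cfg


# Example file content; the numbers below are placeholders, not real contacts.
EXAMPLE_CONFIG = """
{
  "queue": "Network_Alerts",
  "timeout_seconds": 120,
  "repeat_seconds": 60,
  "recipients": ["+15550100", "+15550101"]
}
"""

config = load_config(EXAMPLE_CONFIG)
```

With this layout, adding a person to the on-call rotation or changing the timeout is a one-line edit to the file, and none of the external monitoring services need to be touched.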

2 comments:

  2. What software could we use for monitoring networks on Linux and Windows systems? Maybe Total Network Monitor would be a good choice? We are already using Total Network Inventory for our PC management.