Thursday, October 13, 2011

Remote Control for Your Production Site

Have you ever found yourself rushing to the office data center in the middle of the night just because a critical server is down and you have no way to remotely access its console or reset its power? How much time and money have you spent on the phone with data center technicians, trying to recover critical pieces of your production infrastructure at a remote colocation facility?

This post explains which remote management options are available to you and how to organize remote management of your critical equipment.

Lights-Out Management interfaces
Most server vendors provide a so-called Lights-Out Management (LOM) facility, which is well described in the Wikipedia article on the subject. For example, HP provides HP Integrated Lights-Out (iLO), Dell provides iDRAC, IBM System x servers provide the Integrated Management Module 2 (IMM2), and Supermicro provides Intelligent Management.

LOM facilities provide two important services:
  1. Remote access to the server's graphical console
  2. Remote management of the server's power status (power off/on/reset)
Access to a LOM interface is normally provided through a dedicated RJ45 network port on the server.
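
Besides the web interface, the power management service can usually be scripted. A minimal sketch, assuming the LOM has IPMI over LAN enabled (the implementations mentioned above generally support it, though it may need to be switched on in the LOM settings) and that the ipmitool utility is installed on the management host. The host address and password file path in the usage line are hypothetical:

```python
import subprocess
from typing import List

# Actions accepted by `ipmitool chassis power <action>`.
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmi_power_cmd(host: str, user: str, password_file: str, action: str) -> List[str]:
    """Build an ipmitool command line for a LOM power action over IPMI-over-LAN."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool",
        "-I", "lanplus",       # IPMI v2.0 LAN interface
        "-H", host,            # address of the LOM port, not of the server's OS
        "-U", user,
        "-f", password_file,   # read the password from a file, not from argv
        "chassis", "power", action,
    ]


def ipmi_power(host: str, user: str, password_file: str, action: str) -> str:
    """Run the command and return ipmitool's output (raises on failure)."""
    cmd = ipmi_power_cmd(host, user, password_file, action)
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=True, timeout=30)
    return result.stdout.strip()
```

For example, `ipmi_power("192.0.2.10", "admin", "/root/.ipmi_pass", "cycle")` would power-cycle the server behind that LOM port. The same ipmitool pattern also reaches hardware health data, e.g. `sel list` for the event log and `sdr list` for sensor readings.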

Graphical LOM interfaces are very convenient to use and also provide access to the server's hardware health reports, such as the event log and the status of power supplies, fans and temperature sensors.

The disadvantage of a graphical LOM console is that to access it you need a stable Internet connection and a browser with Java enabled to run a custom console applet. While this is convenient for managing servers in a closet next to your office, it may significantly increase server access and recovery time when you are on the go with a weak cellular connection or using a mobile device such as an iPad. Also, to connect the LOM network interfaces, you will need enough ports on a dedicated Ethernet switch (a cheap 100 Mbps switch works just fine).

In some cases it is possible to order servers without a LOM facility (card), which may reduce the hardware cost by 50-150 USD.

Serial console ports
Almost all servers have a DB9 serial console port (COM1) and a BIOS option to redirect the console to that port. With a serial terminal connected to the console port, you get text-only access to the server's BIOS settings, can control how the server boots (from disk, CD-ROM or network) and, when the OS is configured to run a system console on the serial port, can access the OS itself.
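
On a Linux server, running the system console on the serial port is mostly a matter of kernel command-line arguments. A minimal sketch, assuming Linux (other operating systems are configured differently) and that COM1 appears as ttyS0; the helper name is hypothetical. The baud rate should match the one set in the BIOS console redirection option:

```python
def serial_console_args(com_port: int = 1, baud: int = 115200, parity: str = "n",
                        bits: int = 8, keep_vga: bool = True) -> str:
    """Return Linux kernel arguments that put the system console on a COM port.

    COM1 maps to ttyS0, COM2 to ttyS1, and so on. The kernel writes console
    messages to every listed device, but the last console= entry becomes
    /dev/console, so the serial port is listed last.
    """
    if com_port < 1:
        raise ValueError("COM ports are numbered from 1")
    if parity not in ("n", "o", "e"):
        raise ValueError("parity must be 'n', 'o' or 'e'")
    serial = f"console=ttyS{com_port - 1},{baud}{parity}{bits}"
    return f"console=tty0 {serial}" if keep_vga else serial
```

With the defaults this yields `console=tty0 console=ttyS0,115200n8`, which would be appended to the kernel line in the boot loader configuration. Keeping `tty0` means boot messages still show up on a monitor or in the LOM graphical console.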

To access your servers using a text-based serial console you will need to deploy a dedicated Console Port Server (CPS). The devices are manufactured by many vendors, and the following are some examples of such equipment:

Network equipment such as switches and routers is normally equipped with serial console ports but does not provide any remote power management features. Simple network peripherals such as cable/ADSL modems and media converters usually have neither console ports nor remote power management (and these boxes get stuck from time to time and need a power reset to work again).
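
Once servers and network devices are cabled to a console server, it pays to record which device sits on which serial port. Many console servers expose each serial port as its own TCP port (reachable via telnet or SSH), but the numbering scheme is vendor-specific. A minimal sketch, assuming a hypothetical scheme of base port 7000 plus the serial port number; the inventory entries and host names are hypothetical too:

```python
from typing import Dict, List, Tuple

# Hypothetical numbering: serial port N is reachable on TCP port 7000 + N.
# Check your console server's documentation for its actual scheme.
BASE_TCP_PORT = 7000

# Hypothetical inventory: device name -> (console server host, serial port).
INVENTORY: Dict[str, Tuple[str, int]] = {
    "core-switch-1": ("cps1.example.net", 1),
    "border-router": ("cps1.example.net", 2),
    "db-server-01": ("cps1.example.net", 3),
}


def console_target(device: str, inventory: Dict[str, Tuple[str, int]] = INVENTORY,
                   base: int = BASE_TCP_PORT) -> Tuple[str, int]:
    """Return (console server host, TCP port) for a device's serial console."""
    try:
        cps_host, serial_port = inventory[device]
    except KeyError:
        raise KeyError(f"no console port recorded for {device!r}") from None
    return cps_host, base + serial_port


def telnet_command(device: str) -> List[str]:
    """Command line that opens the device's console through the console server."""
    host, port = console_target(device)
    return ["telnet", host, str(port)]
```

Keeping this mapping in a file (or in your inventory system) saves guessing which cable goes where in the middle of an outage.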

Managed PDUs
Remote power management of network equipment and servers without a LOM interface is performed using managed (also called "switched") PDUs. The following are some vendors of switched PDU equipment:

Managed PDUs usually have a serial console port (in addition to a network port) so that you can reach the PDU when the network management port is down or not configured.
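
The typical PDU operation during an incident is a power cycle of one or more outlets. The control protocol differs by vendor (SNMP, web interface or serial CLI), so this minimal sketch takes a hypothetical `set_outlet(outlet, on)` driver function that you would implement for your PDU; the timing defaults are assumptions, not vendor recommendations:

```python
import time
from typing import Callable, List

# set_outlet(outlet_number, on) is a hypothetical driver for your PDU model.
SetOutlet = Callable[[int, bool], None]


def power_cycle(set_outlet: SetOutlet, outlets: List[int],
                off_seconds: float = 10.0, stagger_seconds: float = 2.0) -> None:
    """Turn outlets off, wait, then turn them back on one at a time.

    Staggering the power-on step limits the inrush current of several
    devices starting at the same moment.
    """
    for outlet in outlets:
        set_outlet(outlet, False)
    time.sleep(off_seconds)      # give the device time to fully power down
    for i, outlet in enumerate(outlets):
        if i:
            time.sleep(stagger_seconds)
        set_outlet(outlet, True)
```

A device with redundant power supplies is usually plugged into two outlets (often on two PDUs), so both outlets must be listed; otherwise the device never actually loses power.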

Even where you don't need a managed PDU for remote power control (for example, with servers that have LOM interfaces), you should still use metered PDUs to watch the power draw and avoid tripping a circuit breaker through overload.
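
To turn a metered PDU reading into a decision (can I add another server to this circuit?), a quick calculation helps. A minimal sketch, assuming a single-phase circuit, a power factor close to 1, and the common North American practice of loading a breaker to at most 80% of its rating for continuous loads; check local electrical code and your data center's contract for the actual limit:

```python
def usable_amps(breaker_amps: float, derating: float = 0.8) -> float:
    """Continuous load allowed on a circuit under the assumed derating factor."""
    return breaker_amps * derating


def headroom_watts(breaker_amps: float, volts: float, measured_amps: float,
                   derating: float = 0.8) -> float:
    """Remaining power budget, in watts, before the derated limit is reached.

    A negative result means the circuit is already over its derated limit.
    """
    return (usable_amps(breaker_amps, derating) - measured_amps) * volts
```

For example, a 20 A breaker at 208 V allows 16 A of continuous load (3,328 W); with 12.5 A already measured on the PDU, about 728 W of headroom remains.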

Some data centers provide managed PDUs as a part of their colocation services, and this is quite convenient for small deployments.

Out-Of-Band management
In network administration, the term out-of-band (OOB) management describes a method of accessing a remote system when primary network access is unavailable for some reason (for example, a failure of the primary Internet provider or an outage of the border network equipment).

I know of two methods to organize OOB access to a remote site:
  1. An additional low-bandwidth Internet link connected directly to the console server, or to a dedicated OOB router attached directly to the LOM network. The redundant link should preferably come from a provider other than your primary Internet provider (otherwise there is no redundancy at the provider level). Some data center facilities provide a 5-10 Mbps Internet link as part of the colocation service bundle.
  2. A regular POTS phone line connected through a modem to the console server. Some data centers can deliver a POTS line directly to your cabinets; in other facilities you will need to order a POTS line separately from a local phone provider, plus a Cat5 cross-connect from the data center to bring the line to your equipment. You may also want to ask the telephone company to disable outbound calls on the line (or at least block international calls).
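
With a second Internet link, the same device is often reachable at two addresses: one over the primary network and one through the OOB router. A minimal sketch that tries each path in order and reports the first one that accepts a TCP connection; the addresses and port in the usage line are hypothetical:

```python
import socket
from typing import Iterable, Optional, Tuple

Endpoint = Tuple[str, int]


def first_reachable(endpoints: Iterable[Endpoint],
                    timeout: float = 3.0) -> Optional[Endpoint]:
    """Return the first (host, port) that accepts a TCP connection, or None."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            continue    # refused, unreachable or timed out: try the next path
    return None
```

For example, `first_reachable([("192.0.2.10", 443), ("198.51.100.10", 443)])` would check a LOM web interface first through the primary link and then through the OOB link. A TCP connect only proves the path is up, not that the service behind it is healthy.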

Which remote management facility to use?
When selecting a method for remote management of your equipment, you usually have the following options:

For deployments with LOM-enabled servers and a few network devices:
  • Use small managed PDUs for network devices
  • Use metered PDUs for servers
  • Connect servers' LOM interfaces to a dedicated LOM switch, and use the interfaces for remote console access and power management of the servers
  • Connect network devices' serial console ports to serial ports on a few servers, and use the servers as console servers to access the connected network equipment
  • Use a separate Internet connection for OOB access to the equipment (via a dedicated router connected directly to the LOM network)

For deployments with a lot of network devices or servers without LOM interfaces:
  • Use managed PDUs for remote power management of all deployed equipment
  • Use a serial console server for remote console access to both server and network equipment
  • For OOB access, use a separate Internet link or POTS line connected directly to the console server

In some situations it may be more appropriate to use a mix of the described approaches.
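
The two recommendation lists above can be condensed into a small decision helper. This is only a sketch: the text does not define "a few" network devices, so FEW_NETWORK_DEVICES = 4 is a hypothetical cutoff, and mixed deployments still call for judgment:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff for "a few" network devices.
FEW_NETWORK_DEVICES = 4


@dataclass
class Site:
    servers_with_lom: int
    servers_without_lom: int
    network_devices: int


def recommend(site: Site) -> List[str]:
    """Map a deployment profile to one of the two setups described above."""
    if site.servers_without_lom == 0 and site.network_devices <= FEW_NETWORK_DEVICES:
        return [
            "small managed PDUs for network devices",
            "metered PDUs for servers",
            "LOM ports on a dedicated LOM switch for console and power",
            "network device consoles cabled to serial ports on a few servers",
            "separate Internet link with an OOB router on the LOM network",
        ]
    return [
        "managed PDUs for all equipment",
        "serial console server for servers and network devices",
        "separate Internet link or POTS line on the console server",
    ]
```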

Monitoring
All LOM network connections and out-of-band access paths should be monitored, and detected problems should be fixed promptly (though with lower priority than in-band issues): you don't want to discover that your modem connection is down at the very moment you urgently need to access your production site! To test a phone OOB connection, you can connect a modem on its own phone line to an office server and write a simple script that periodically (once a day is enough) dials the OOB number and verifies that the modem can connect to the remote modem.
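
Such a daily dial test can be sketched with standard Hayes AT commands. A minimal sketch, assuming a Hayes-compatible modem on the office server and a `port` object with `write(bytes)` and `readline() -> bytes`, such as a pyserial `Serial` (pyserial is a third-party package, not part of the standard library). The phone number is a placeholder:

```python
import time

# Final result codes a Hayes-compatible modem reports for a failed dial.
FAILURE_CODES = ("NO CARRIER", "BUSY", "NO DIALTONE", "NO ANSWER", "ERROR")


def _read_result(port, max_lines: int) -> str:
    """Return the first line that looks like a final result code ('' if none)."""
    for _ in range(max_lines):
        line = port.readline().decode("ascii", errors="replace").strip()
        if line == "OK" or line.startswith("CONNECT") or line in FAILURE_CODES:
            return line
    return ""


def dial_check(port, number: str, max_lines: int = 20,
               guard_seconds: float = 1.0) -> bool:
    """Dial the OOB modem and return True if the call reached CONNECT."""
    port.write(b"ATZ\r")                          # reset to the stored profile
    if _read_result(port, max_lines) != "OK":
        return False
    port.write(f"ATDT{number}\r".encode("ascii"))  # tone-dial the OOB line
    connected = _read_result(port, max_lines).startswith("CONNECT")
    if connected:
        time.sleep(guard_seconds)   # "+++" needs silence before and after it
        port.write(b"+++")          # escape back to command mode
        time.sleep(guard_seconds)
        _read_result(port, max_lines)
        port.write(b"ATH0\r")       # hang up
        _read_result(port, max_lines)
    return connected
```

For example, run `dial_check(serial.Serial("/dev/ttyS0", 9600, timeout=60), "OOB_NUMBER")` from cron once a day and raise an alert when it returns False. The long read timeout covers dialing and modem training, which can take tens of seconds.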

I hope that the provided information will help you build more reliable and manageable systems.
