Data Center Health Management

As winter approaches, I’m sure you’ve already heard more sneezes and coughs around the office. Regular checkups are as important to your data center’s health as they are to your own, and preventative measures are critical to avoiding outages and downtime.

Just how critical is Data Center Health Management?

Consider the Delta Air Lines data center outage this past August, which grounded more than 2,000 flights over three days and cost the company $150 million. Or consider the data center outage at Southwest Airlines, which also lasted three days and is estimated to have caused at least $177 million in lost passenger revenue.

That, my friends, is nothing to sneeze about.

Given that financial prognosis, how can companies afford not to take a preventative health management approach in their data centers, catching issues before they spiral out of control and are deemed untreatable?

Yes, the numbers cited above are extraordinarily high, and they show that major airlines have far more at stake in designing and managing critical infrastructure than most other data center operators. But outage risks do not discriminate: data center facilities in every industry sector run similar risks when they are not protected by a sound health management approach. According to a study by the Ponemon Institute, the average cost of a single data center outage today is about $730,000. Among the 60-plus data center operators surveyed for the study, the costliest outage reported cost its operator approximately $2.4 million.

To be sure, today’s data center operators face significant long-term challenges and daily uncertainties.

Among these: How can operators know when a server component fails? Does someone have to check the LEDs manually? How far in advance can a data center manager anticipate that the facility’s fans will fail? Moreover, with thousands of heterogeneous servers in a typical data center, operators need a tool to access and control these servers to maintain full availability.

Add to that the exorbitant cost of hardware KVM switches, plus the need to receive failure reports and to know with certainty when a service call to a remote data center is necessary, and maintaining data center health can become a Sisyphean task.

Intel® Virtual Gateway provides a remote control for your data center: a cross-platform, virtual keyboard-video-mouse (KVM) for maintaining the health of data center hardware. Because its capability is firmware-based and embedded directly in the server, Intel® Virtual Gateway eliminates the need for complicated and expensive KVM infrastructure.

Health management in the data center has four main pillars: monitoring, analytics, diagnostics and remediation. Let’s take a closer look at the capabilities specific to each of these requirements (all of which are supported by Intel® Virtual Gateway).

Monitoring

  • Identifies the root cause of failures, with health details down to the component level
  • Creates a device failure report with severity and failure details

Analytics

  • Uses hardware failure trending to better predict when components will need to be replaced
  • Provides failure rate and mean time to repair (MTTR) analysis per server model, component, etc.
  • Predicts future server failures
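To make the failure rate and MTTR analysis concrete, here is a minimal Python sketch of the arithmetic involved. It is not Intel® Virtual Gateway’s API or data format: the `Incident` record, the model names and the numbers are hypothetical. MTTR is computed as total repair hours divided by the number of repairs, and failure rate is assumed here to mean failures per server-year.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    """One hardware failure, as a hypothetical monitoring tool might record it."""
    server_model: str
    component: str
    hours_to_repair: float  # from failure detection until the server is back in service


def mttr_by_model(incidents):
    """Mean time to repair (hours) per server model: total repair hours / number of repairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for inc in incidents:
        totals[inc.server_model] += inc.hours_to_repair
        counts[inc.server_model] += 1
    return {model: totals[model] / counts[model] for model in totals}


def failure_rate_by_model(incidents, fleet_size, years_observed):
    """Failures per server-year for every model in the fleet (assumed definition)."""
    failures = defaultdict(int)
    for inc in incidents:
        failures[inc.server_model] += 1
    return {
        model: failures[model] / (count * years_observed)
        for model, count in fleet_size.items()
    }


if __name__ == "__main__":
    # Hypothetical incident history and fleet sizes, for illustration only.
    incidents = [
        Incident("model-A", "fan", 4.0),
        Incident("model-A", "psu", 8.0),
        Incident("model-B", "dimm", 2.0),
    ]
    fleet = {"model-A": 100, "model-B": 50}
    print(mttr_by_model(incidents))                      # {'model-A': 6.0, 'model-B': 2.0}
    print(failure_rate_by_model(incidents, fleet, 1.0))  # {'model-A': 0.02, 'model-B': 0.02}
```

Grouping by `component` instead of `server_model` gives the per-component view in the same way.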

Diagnostics

  • Performs server diagnostics and troubleshooting
  • Checks BIOS settings and configuration
  • Analyzes server logs
  • Makes or verifies configuration changes
  • Uses both out-of-band (OOB, via KVM) and in-band (IB, via SSH, RDP or VNC) access
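Checking BIOS settings and verifying configuration changes boils down to comparing what a server reports against an approved baseline. The Python sketch below assumes the settings have already been retrieved as name/value pairs (for example, through a vendor’s out-of-band management interface); the function name and the setting names are hypothetical and do not reflect Intel® Virtual Gateway’s actual interface.

```python
def find_bios_drift(baseline, reported):
    """Compare a server's reported BIOS settings against a baseline.

    Returns {setting: (expected, found)} for every baseline setting whose
    reported value differs; a setting the server did not report has found=None.
    Settings reported by the server but absent from the baseline are ignored.
    """
    drift = {}
    for setting, expected in baseline.items():
        found = reported.get(setting)
        if found != expected:
            drift[setting] = (expected, found)
    return drift


if __name__ == "__main__":
    # Hypothetical setting names and values, for illustration only.
    baseline = {"BootMode": "UEFI", "HyperThreading": "Enabled", "PowerProfile": "Performance"}
    reported = {"BootMode": "Legacy", "HyperThreading": "Enabled"}
    print(find_bios_drift(baseline, reported))
    # {'BootMode': ('UEFI', 'Legacy'), 'PowerProfile': ('Performance', None)}
```

Running the same comparison after a configuration change verifies that the change took effect: an empty result means the server matches the baseline.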

Remediation

  • Can remotely power servers on and off
  • Can create groups of servers and assign power tasks to them
  • Can stagger power-on to avoid overloading racks
  • Can schedule and automate individual or group power tasks
  • Provides virtual media (vMedia) for remote OS provisioning and installation
  • Links server failures to IT workload and workflow management systems
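Staggered power-on exists because servers drawing boot-time power all at once can overload a rack’s power distribution. The Python sketch below shows one simple way to plan a staggered group power task: split the group into fixed-size batches and delay each batch by a fixed interval. The function, server names, batch size and interval are all hypothetical assumptions, not Intel® Virtual Gateway’s scheduling logic.

```python
from dataclasses import dataclass


@dataclass
class PowerOnStep:
    server: str
    delay_s: float  # seconds after the group power task starts


def stagger_power_on(servers, batch_size, interval_s):
    """Split a server group into batches; each batch powers on interval_s after the previous one."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    schedule = []
    for index, server in enumerate(servers):
        batch = index // batch_size  # servers 0..batch_size-1 form batch 0, and so on
        schedule.append(PowerOnStep(server, batch * interval_s))
    return schedule


if __name__ == "__main__":
    group = ["rack1-s1", "rack1-s2", "rack1-s3", "rack1-s4", "rack1-s5"]
    # rack1-s1 and rack1-s2 start at 0 s, s3 and s4 at 30 s, s5 at 60 s.
    for step in stagger_power_on(group, batch_size=2, interval_s=30.0):
        print(f"{step.delay_s:>5.0f}s  power on {step.server}")
```

In practice, the batch size would be derived from the rack’s power budget and the servers’ peak boot draw; a fixed batch size keeps the sketch simple.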

Through ongoing monitoring, analytics, diagnostics and remediation, data center operators can employ a health management approach to addressing the risk of costly downtime and outages. Think of Intel® Virtual Gateway as “an apple a day” for the health and well-being of your facility.

Jeff Klaus

About Jeff Klaus

General Manager of Data Center Solutions at Intel. An internationally respected software executive, Jeff has experience building data center software licensing, API management and software solution businesses. He has extensive global experience in software engineering, product development, marketing, licensing and deployment across a variety of industry verticals. Jeff has distributed solutions to the top 10 global hardware OEMs, to leading global software solution providers, and directly to the largest telco and Internet portal data centers around the world. He has built global sales and distribution teams and has experience orchestrating solution selling through indirect solution partners in addition to direct go-to-market (GTM) strategies. Jeff is a graduate of Boston College and also holds an MBA from Boston University.