Recently in our test lab, we experienced a cooling failure... and I wasn't even in the lab when it happened. In fact, I wasn't even in the same state!
With the recent launch of the Xeon 5500 Series servers, I was testing some use cases against four of our servers in the lab when I noticed that the temperature in there was rising pretty drastically. How did I see this? Intel® Intelligent Power Node Manager, embedded in our Xeon servers, reports the readings, and our Intel Data Center Manager (DCM) SDK software interface presents that data in a visual format.
In the graph above, the dark line is the "front panel inlet" temperature: in a matter of minutes, the temperature in the lab rose from 71F to 87F, a jump of 16 degrees! What I didn't have set up in this scenario was a power policy that activates on a thermal trip. Here is how you would set up this policy in Data Center Manager under the Policies section for this rack:
If a thermal event caused the room to heat up to 78F (as shown above), Intel DCM would send IPMI commands to the platform, which in turn would tell the Node Manager firmware to throttle the Xeon CPUs back to their lowest possible P-state. This reduces the energy consumed across the systems in the policy group and lowers the heat output of each server, which in turn reduces the load placed on an already overheated lab or datacenter.
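The trigger logic behind a policy like this can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DCM SDK API: the `ThermalPolicy` class, its `evaluate` method, and the 3F release margin (hysteresis, so the power cap doesn't flap on and off right at the threshold) are all hypothetical. Only the 78F trigger comes from the policy described above, and in a real deployment DCM, not your own code, would issue the IPMI commands to Node Manager.

```python
from dataclasses import dataclass

TRIGGER_F = 78.0  # trigger temperature from the policy described above


@dataclass
class ThermalPolicy:
    """Hypothetical sketch of a group-wide thermal trigger; not the Intel DCM SDK API."""

    trigger_f: float = TRIGGER_F
    release_f: float = TRIGGER_F - 3.0  # assumed hysteresis margin, not a DCM setting
    throttled: bool = False

    def evaluate(self, inlet_temps_f: dict[str, float]) -> bool:
        """Return True while the policy group should be held at its lowest P-state."""
        hottest = max(inlet_temps_f.values())
        if not self.throttled and hottest >= self.trigger_f:
            # In DCM, this is where IPMI commands would go out to Node Manager.
            self.throttled = True
        elif self.throttled and hottest < self.release_f:
            # Cooling restored: lift the power cap.
            self.throttled = False
        return self.throttled


if __name__ == "__main__":
    policy = ThermalPolicy()
    # Illustrative inlet readings, loosely following the 71F-to-87F rise in the graph.
    for temp in (71.0, 75.0, 78.0, 82.0, 87.0):
        state = policy.evaluate({"server1": temp, "server2": temp - 1.0})
        print(f"{temp:.0f}F -> throttled={state}")
```

Keying the decision on the hottest inlet reading in the group is one reasonable choice for a shared-room failure like ours; averaging the readings instead would react later.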
This gives server managers more time to gracefully shut down systems and/or move workloads to cooler sections of the datacenter. If you have ever experienced a cooling failure in the datacenter, you know it's usually a frenzy to shut down machines to minimize heat and power usage. This thermal policy can buy you time before you reach a critical temperature where you start losing components and servers, and ultimately data and productivity.
Using the standard IPMI interface, the Data Center Manager SDK and Node Manager on the Xeon 5500 Series platform enable power monitoring, power management, and front-panel inlet temperature monitoring. This gives a server or datacenter manager the capability to measure power usage per server, which previously required more expensive power measurement tools. External power meters cost anywhere from a cheap $15 to a spendy $1,000, but now the technology is embedded in the machine's firmware.
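To give a sense of what a per-server power reading looks like at the IPMI level, here is a hedged Python sketch that decodes the reply to Node Manager's Get Node Manager Statistics command (NetFn 0x2E, command 0xC8) in global power mode. The byte layout assumed here (Intel manufacturer ID, then 16-bit current/minimum/maximum/average readings in watts, then a 32-bit timestamp and a 32-bit reporting period, all little-endian) is our reading of Intel's Node Manager interface specification and should be checked against the spec for your firmware version. The helper name `parse_nm_power_stats` is hypothetical, and the exact `ipmitool raw` invocation, including any bridging needed to reach the Node Manager controller, varies by platform.

```python
import struct

INTEL_MFG_ID = bytes([0x57, 0x01, 0x00])  # Intel IANA number 0x000157, LS byte first

# Assumed request for global power statistics, domain 0 / policy 0:
#   ipmitool raw 0x2e 0xc8 0x57 0x01 0x00 0x01 0x00 0x00
# Some platforms also need ipmitool bridging options to reach Node Manager.


def parse_nm_power_stats(raw_hex: str) -> dict[str, int]:
    """Decode a Get Node Manager Statistics reply (completion code already stripped).

    Hypothetical helper; the layout follows our reading of the Node Manager spec.
    """
    data = bytes.fromhex(raw_hex)  # whitespace between hex pairs is ignored
    if len(data) < 19:
        raise ValueError(f"reply too short: {len(data)} bytes")
    if data[:3] != INTEL_MFG_ID:
        raise ValueError("unexpected manufacturer ID")
    # Four little-endian 16-bit readings (watts), then two 32-bit fields.
    current, minimum, maximum, average, timestamp, period = struct.unpack_from(
        "<4H2I", data, 3
    )
    return {
        "current_w": current,
        "min_w": minimum,
        "max_w": maximum,
        "avg_w": average,
        "timestamp": timestamp,
        "reporting_period": period,
    }
```

For example, passing the made-up reply `"57 01 00 a0 00 64 00 f0 00 b4 00 00 00 00 00 10 0e 00 00 00"` yields a current reading of 160 W and an average of 180 W. Those bytes are illustrative, not captured from our lab.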
You can learn more about the Xeon 5500 Series Processors on the Intel Xeon website.