I would like to elaborate on the topic of energy vs. power management that I introduced in my previous entry.
Upgrading the electrical power infrastructure to accommodate additional servers is not an option in most data centers today. Landing additional servers at a facility that is already at the limit of its thermal capacity leads to hot spots, assuming the facility does not first run out of electrical capacity, with no room left in certain branch circuits.
There are two types of potentially useful figures of merit, one for power management and one for energy management. A metric for power management allows us to track operational "goodness", making sure that power draw never exceeds limits imposed by the infrastructure. The second metric tracks power saved over time, which is energy saved. Energy not consumed goes directly to the bottom line of the data center operator.
To understand the dynamic between power and energy management, look at the graph below and imagine a server without any power management mechanisms whatsoever. The power consumed by that server would be P(unmanaged) regardless of operating conditions. Most servers today have a number of mechanisms operating concurrently, and hence the actual power consumed at any given time t is P(actual)(t). The difference P(unmanaged) - P(actual)(t) is the power saved at time t. Integrating the power saved over the interval from t(1) to t(2) yields the energy saved.
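The energy saved is the integral of P(unmanaged) - P(actual)(t) from t(1) to t(2). In practice P(actual) is only available as periodic samples, so a minimal sketch, assuming timestamped power readings in watts, approximates the integral with the trapezoidal rule. The function name and sample values are illustrative, not part of any Intel tool.

```python
from typing import Sequence


def energy_saved_wh(
    times_s: Sequence[float],
    p_actual_w: Sequence[float],
    p_unmanaged_w: float,
) -> float:
    """Approximate the integral of P(unmanaged) - P(actual)(t) over [t(1), t(2)].

    times_s    -- sample timestamps in seconds, ascending (first = t(1), last = t(2))
    p_actual_w -- measured power in watts at each timestamp
    Returns the energy saved in watt-hours (trapezoidal rule).
    """
    if len(times_s) != len(p_actual_w) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    joules = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        saved_prev = p_unmanaged_w - p_actual_w[i - 1]  # power saved at start of step
        saved_curr = p_unmanaged_w - p_actual_w[i]      # power saved at end of step
        joules += 0.5 * (saved_prev + saved_curr) * dt
    return joules / 3600.0  # 1 Wh = 3600 J


# Hypothetical trace: a 300 W unmanaged server held at 250 W for one hour.
print(energy_saved_wh([0, 900, 1800, 2700, 3600], [250] * 5, 300.0))
```

A large instantaneous difference only becomes a large energy figure if it persists across the interval, which is the distinction the next paragraph turns on.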
Please note that a mechanism that yields significant power savings does not necessarily yield high energy savings. For instance, applying Intel(r) Dynamic Power Node Manager (DPNM) can potentially bring power consumption down by 100 watts, from 300 watts at full load to 200 watts, in a dual-socket 2U Nehalem server that we tested in our lab. However, if DPNM is used as a guard rail, limiting power consumption only when a certain threshold is exceeded, it may never kick in, and energy savings will then be zero for practical purposes. We use DPNM this way because it works best only under certain operating conditions, namely high loading factors, and because it works through frequency and voltage scaling, which carries a performance tradeoff.
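The guard-rail effect can be shown with an idealized model. This is a sketch under stated assumptions, not how DPNM is implemented: the cap is assumed to hold power exactly at the threshold while demand exceeds it, with no controller lag, and the performance cost is ignored. The function names and the 280 W threshold in the usage example are hypothetical.

```python
from typing import Sequence


def guard_rail_energy_saved_wh(
    demand_w: Sequence[float], cap_w: float, interval_s: float
) -> float:
    """Energy saved by an idealized cap that engages only above cap_w.

    Each demand sample is the power the server would draw unmanaged during
    one sampling interval. At or below the cap, the cap does nothing.
    """
    saved_j = sum(max(0.0, p - cap_w) * interval_s for p in demand_w)
    return saved_j / 3600.0


def peak_power_reduction_w(demand_w: Sequence[float], cap_w: float) -> float:
    """How far the idealized cap lowers the observed peak, in watts."""
    return max(0.0, max(demand_w) - cap_w)


# Hypothetical hour of 15-minute samples that never crosses a 280 W guard rail:
quiet_hour = [180, 220, 260, 240]
print(guard_rail_energy_saved_wh(quiet_hour, 280, 900))   # the cap never engages
print(guard_rail_energy_saved_wh([300] * 4, 200, 900))    # cap active all hour
```

The same mechanism that can shave 100 watts off the peak contributes nothing to energy savings when the threshold is never crossed.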
Another useful figure of merit for power management is the dynamic range for power proportional computing. Power consumption in servers today is a function of workload as depicted below:
The relationship is not always linear, but the figure illustrates the concept. The x-axis shows the workload, which ranges from 0 to 1, that is, 0 to 100 percent. P(baseline) is the power consumption at idle, and P(spread) is the dynamic range for power proportional computing: the difference between P(baseline) and the power consumption at 100 percent workload. A low P(baseline) is better because it means low power consumption at idle. For a Nehalem-based server, P(baseline) is roughly 50 percent of the power consumption at full utilization, which is remarkable considering that it represents a 20 percent improvement over the number we observed for the prior generation of Bensley-based servers. The 50 percent figure is what we observed in our lab for a whole server, not just the CPU alone.
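As a rough sketch, assuming the linear relationship the figure approximates (real servers deviate from it), power at utilization u is P(baseline) + P(spread) * u. The function names are illustrative, and pairing the 300 W full-load figure from the DPNM example with a 50 percent baseline is only for illustration.

```python
def server_power_w(utilization: float, p_full_w: float, baseline_fraction: float) -> float:
    """Linear model: P(u) = P(baseline) + P(spread) * u, for u in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    p_baseline = baseline_fraction * p_full_w  # power at idle
    p_spread = p_full_w - p_baseline           # dynamic range above idle
    return p_baseline + p_spread * utilization


def dynamic_range(baseline_fraction: float) -> float:
    """Ratio of full-load power to idle power for a single server."""
    return 1.0 / baseline_fraction


for u in (0.0, 0.5, 1.0):
    print(u, server_power_w(u, 300.0, 0.5))
print(dynamic_range(0.5))  # a 50 percent baseline gives a 2:1 range
```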
If a 50 percent P(baseline) looks outstanding, we can do even better for certain application environments, such as load-balanced front-end Web server pools and cloud services implemented on clustered, virtualized servers. We can achieve this effect through platooning. For instance, consider a pool of 16 servers. If the pool is idle, all the servers except one can be put to sleep. The single idle server consumes only half the power of a fully loaded server, that is, one half of one sixteenth of the cluster power. The dormant servers still draw about 2 percent of full power each. Doing the math, (0.5 + 15 × 0.02) / 16 = 0.05, so the cluster at idle consumes about 5 percent of full cluster power. Hence, for a clustered deployment, the power dynamic range increases from 2:1 for a single server to about 20:1 for the cluster as a whole.
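The platooning arithmetic can be captured in a small function so that other pool sizes or sleep-state power levels can be tried. This is a sketch that assumes every server has the same full-load power and that the idle and dormant draws are fixed fractions of it; the function and parameter names are hypothetical.

```python
def cluster_idle_fraction(
    n_servers: int, idle_fraction: float, sleep_fraction: float, n_awake: int = 1
) -> float:
    """Cluster power at idle as a fraction of full cluster power.

    n_awake servers stay on at idle_fraction of full power; the rest sleep
    at sleep_fraction of full power.
    """
    if not 1 <= n_awake <= n_servers:
        raise ValueError("n_awake must be between 1 and n_servers")
    n_asleep = n_servers - n_awake
    return (n_awake * idle_fraction + n_asleep * sleep_fraction) / n_servers


# Figures from the text: 16 servers, idle server at 50 percent, dormant at 2 percent.
frac = cluster_idle_fraction(16, 0.5, 0.02)
print(f"idle cluster draws {frac:.1%} of full power; range {1 / frac:.0f}:1")
```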
In the figure below, note that each platoon is defined by the application of a specific technology, or of a specific state within a technology. This makes it possible to optimize system behavior around the particular operational limitations of each technology. The graph below is a generalization of the platooning graph in the prior article. For instance, a power-capped server imposes certain performance limitations on workloads, and hence we assign non-time-critical workloads to that platoon. By definition, an idling server cannot have any workloads; the moment a workload lands on it, it is no longer idle, and its power consumption rises.
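The assignment rule can be expressed as a tiny dispatcher. This is a hypothetical sketch: the platoon list is an assumption drawn from the states named in this post, and the exact platoons in the figure may differ. Time-critical work goes to unconstrained servers, work that tolerates a cap goes to the power-capped platoon, and landing work on an idle server moves it out of the idle platoon.

```python
from dataclasses import dataclass
from enum import Enum


class Platoon(Enum):
    """Hypothetical platoon set based on the states discussed in the text."""
    UNCONSTRAINED = "active, unconstrained"
    POWER_CAPPED = "active, power capped"
    IDLE = "idle (S0, no workload)"
    SLEEPING = "sleeping (S-state other than S0)"
    OFF = "off (G3)"


@dataclass
class Workload:
    name: str
    time_critical: bool


def assign_platoon(w: Workload) -> Platoon:
    """Time-critical work runs unconstrained; other work can tolerate a cap."""
    return Platoon.UNCONSTRAINED if w.time_critical else Platoon.POWER_CAPPED


def platoon_after_landing(current: Platoon, w: Workload) -> Platoon:
    """A server that receives work can no longer be idle by definition."""
    if current in (Platoon.SLEEPING, Platoon.OFF):
        raise ValueError("the server must be brought online before it can host work")
    if current is Platoon.IDLE:
        return assign_platoon(w)
    return current


print(platoon_after_landing(Platoon.IDLE, Workload("nightly-batch", time_critical=False)))
```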
The CPU is not running in any S-state other than S0. The selection of a specific state depends on how quickly that particular server must be brought back online; it takes longer to bring a server online from the lower-energy states. Servers in G3 may actually be unracked and put in storage for seasonal equipment allocation.
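The selection rule amounts to picking the deepest state whose wake-up time still meets the deadline. In the sketch below the latency table is entirely hypothetical (real figures depend on the platform, firmware, OS, and application start-up), deeper states are assumed to draw less power, and G3 is left out because recovering from it requires manual work such as re-racking.

```python
from typing import Dict

# Hypothetical wake-up latencies in seconds, for illustration only.
WAKE_LATENCY_S: Dict[str, float] = {
    "S0 (idle)": 0.0,
    "S3 (suspend to RAM)": 10.0,
    "S4 (hibernate)": 60.0,
    "S5 (soft off)": 300.0,
}


def deepest_state_within(
    deadline_s: float, latencies: Dict[str, float] = WAKE_LATENCY_S
) -> str:
    """Return the lowest-energy state that can come online within deadline_s.

    Assumes a longer wake latency always means a lower-power state.
    """
    candidates = [(lat, state) for state, lat in latencies.items() if lat <= deadline_s]
    if not candidates:
        raise ValueError("no state can come online within the deadline")
    return max(candidates)[1]  # largest latency that still fits


print(deepest_state_within(30.0))
```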
A virtualized environment makes it easier to rebalance workloads across active (unconstrained and power-capped) servers. If servers are being used as CPU cycle engines, it may be sufficient to idle, or put to sleep, the subset of servers not needed.
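One way to decide which servers are "not needed" is to pack the virtual machine loads onto as few hosts as possible. The sketch below uses first-fit decreasing, a standard bin-packing heuristic swapped in here for illustration rather than a method described in this post. It assumes CPU load, expressed in percent of one server, is the only resource that matters, and it ignores migration cost and headroom; the function names and loads are hypothetical.

```python
from typing import List, Sequence


def consolidate(vm_loads: Sequence[int], server_capacity: int) -> List[List[int]]:
    """First-fit decreasing: place each VM, largest first, on the first host with room."""
    servers: List[List[int]] = []  # VM loads placed on each active host
    free: List[int] = []           # remaining capacity per active host
    for load in sorted(vm_loads, reverse=True):
        if load > server_capacity:
            raise ValueError(f"VM load {load} exceeds server capacity")
        for i, room in enumerate(free):
            if load <= room:
                servers[i].append(load)
                free[i] -= load
                break
        else:  # no existing host had room: activate another one
            servers.append([load])
            free.append(server_capacity - load)
    return servers


def servers_to_sleep(pool_size: int, vm_loads: Sequence[int], server_capacity: int) -> int:
    """Hosts left over after consolidation, candidates for idle or sleep."""
    return max(0, pool_size - len(consolidate(vm_loads, server_capacity)))


print(consolidate([60, 50, 40, 30, 20], 100))
print(servers_to_sleep(16, [60, 50, 40, 30, 20], 100))
```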
The extra power dynamic range comes at the expense of additional processes and operational complexity. However, please note that a simple equipment refresh brings immediate benefits in power and energy management. IBM reports an 11X performance gain for Nehalem-based HS22 blade servers versus the HS20 model, which is only three years old. Network World reports a similar figure: a ten-fold increase in performance, not just ten percent.
I will be elaborating on some of these ideas in the PDCS003 Cloud Power Management with the Intel(r) Nehalem Platform class at the upcoming Intel Developer Forum in San Francisco during the week of September 20th. Please consider yourself invited to join me if you are planning to attend this conference.