In a previous article we explored the implementation mechanisms for monitoring and controlling the power consumed by data center servers. In this article we'll see that the ability to trim the power consumed by servers at convenient times is a valuable tool for reducing stranded power and taking maximum advantage of the power available under the existing infrastructure. Let's start with a small example and figure out how to optimize power utilization in a single rack.
Forecasting the power requirements for a server over the product's lifetime is not an easy exercise. Server power consumption is a function of the server's hardware specifications and of the software and workloads running on it. Also, the server's configuration may change over time: the machine may be retrofitted with additional memory, new processors and hard drives. This challenge is compounded by more aggressive implementations of power proportional computing: servers of a few years ago exhibited little variability between power consumption at idle and power consumption at full load.
While power proportional computing has brought down average power consumption, it has also increased its variance significantly; that is, data center administrators can expect wide swings in power consumption during normal operation.
Under-sizing the power infrastructure can lead to operational problems during the equipment's lifetime: it may become impossible to fully load racks due to supply power limitations or because hot spots start developing. To guard against this, extra data center power capacity is allocated for the rare occasions when it might be needed, but in practice it cannot be used because it is held in reserve, leading to the term "stranded power."
One possible strategy is to forecast power consumption using an upper bound. The most obvious upper bound is the plate power, that is, the power stated in the electrical specifications of the server. This is a number guaranteed never to be exceeded. Throwing power at the problem is not unlike throwing bandwidth at the problem in network design to compensate for the lack of bandwidth allocation capability and QoS mechanisms. This approach is overly conservative because the power infrastructure is sized by summing the assumed peak power of every server, as if all of them drew peak power simultaneously at some point over the equipment's lifetime, an exceedingly unlikely event.
The picture is even worse when we realize that IT equipment represents only 30 to 40 percent of the power consumption in the data center as depicted in the figure below. This means that the power forecasting in the data center must not only include the power consumed by the servers proper, but also the power consumed by the ancillary equipment, including cooling, heating and lighting, which can be over twice the power allocated to servers.
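To make this overhead concrete, here is a small illustrative calculation. The figures are hypothetical, chosen only to fall in the 30 to 40 percent range cited above; they are not measured data from any particular facility.

```python
# Illustrative calculation: facility power implied by a given IT load,
# assuming IT equipment draws 35% of total facility power (a hypothetical
# figure within the 30-40% range discussed above).
it_power_kw = 100.0     # hypothetical IT load for one room, in kW
it_fraction = 0.35      # assumed IT share of total facility power

total_kw = it_power_kw / it_fraction
overhead_kw = total_kw - it_power_kw   # cooling, heating, lighting, losses

print(f"total facility power:       {total_kw:.0f} kW")
print(f"cooling/ancillary overhead: {overhead_kw:.0f} kW")
```

Under these assumptions the ancillary overhead comes to nearly twice the IT load itself, which is why a forecast limited to server power alone badly understates the demand on the facility.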
Establishing a power forecast and sizing a data center based on nameplate power will lead to gross overestimation of the actual power needed and to unnecessary capital expenses. The over-sizing of the power infrastructure is carried as insurance for the future because of the large uncertainty in the power consumption forecast. It does not reflect actual need.
Figure: Power allocation in the data center.
A more realistic approach is to de-rate the plate power to a percentage determined by the practices at a particular site. Typical numbers range between 40 percent and 70 percent. Unfortunately, these numbers are still a guess averaged over a server's lifetime, and they remain overly conservative.
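The practical effect of the chosen bound shows up directly in how many servers a rack can host. The sketch below compares three budgeting rules; all of the wattages and the rack budget are hypothetical illustrations, not vendor or site data.

```python
# Sketch: how the power-forecasting rule affects rack density.
# All numbers below are hypothetical, for illustration only.
nameplate_w = 450        # per-server plate power from the spec sheet
derate_factor = 0.6      # site practice: de-rate to 60% of plate power
measured_peak_w = 220    # observed per-server peak from monitoring history

rack_budget_w = 8000     # hypothetical power available to the rack

for label, per_server_w in [
    ("nameplate", nameplate_w),
    ("de-rated (60%)", nameplate_w * derate_factor),
    ("measured peak", measured_peak_w),
]:
    servers = int(rack_budget_w // per_server_w)
    print(f"{label:>15}: {servers} servers per rack")
```

With these illustrative numbers, budgeting on measured peaks roughly doubles the density achievable under the nameplate rule, which is the gap the de-rating guess tries, imperfectly, to close.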
Intel® Data Center Manager provides a one-year history of power consumption that allows a much tighter bound for power consumption forecasting. At the same time, it is possible to limit power consumption to ensure that group power consumption does not exceed thresholds imposed by the utility power and the power supply infrastructure.
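One way such a history tightens the bound is by taking a high percentile of observed draw rather than the plate power. The sketch below uses synthetic telemetry; a real deployment would read the samples from the management interface instead, and the per-server wattages are hypothetical.

```python
import random

# Sketch: deriving a tighter power bound from a year of consumption
# history. The hourly samples here are synthetic (Gaussian around a
# hypothetical 180 W mean); real data would come from monitoring.
random.seed(1)
samples_w = [random.gauss(180, 25) for _ in range(8760)]  # 1 year, hourly

p99_w = sorted(samples_w)[int(0.99 * len(samples_w))]  # 99th percentile
nameplate_w = 450                                      # hypothetical plate power

print(f"P99 of observed draw:  {p99_w:.0f} W")
print(f"headroom vs nameplate: {nameplate_w - p99_w:.0f} W")
```

Budgeting to a high percentile only makes sense when, as described above, consumption can also be capped, so the rare excursion beyond the bound is clipped rather than tripping the branch circuit.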
Initial testing performed with Baidu and China Telecom indicates that it is possible to increase rack density by 40 to 60 percent using a pre-existing data center infrastructure.
We will explore other uses in subsequent articles, such as managing servers that are overheating and dynamically allocating power to server sub-groups depending on the priority of the applications they run.