Optimizing Power Performance in Virtualized Cloud Server Pools

Two approaches are commonly applied to reduce lighting power consumption in residential or commercial buildings: turning lights off and using dimming mechanisms.

Turning lights off yields the greatest power savings, assuming the room is not to be used.  There is still a small amount of residual power being drawn to power pilot lights or motion sensors to turn on the illumination if someone enters the room.

Dimming the lights reduces power consumption while a room is in use: the illumination level can be lowered while still allowing people to occupy the room for its intended purpose.  For instance, full illumination in certain areas may not be needed because daylight is mixed in, because zonal lighting on work areas is sufficient, or because the application calls for reduced lighting, such as in a restaurant or dining room.  The power saved through dimming will be less than that saved by turning lights off.

Similar mechanisms are available in servers deployed in data centers. Servers can be shut down and restarted under application control when not needed.  We call the action of shutting down servers for power management purposes server parking.  This is the equivalent of turning lights off in a room.  The capability for “dimming lights” in a server is embodied by the Intel® Enhanced SpeedStep® technology or EIST and Intel® Intelligent Power Node Manager technology or Node Manager.  EIST reduces power consumption during periods of low workload and Node Manager can cap power, that is, reduce power consumption at high workload levels under application control.

In tests performed at our lab, a 2-socket white box Urbanna server provisioned with Intel® Xeon® 5500 Series processors, 6 DIMMs and one hard drive consumes at idle about 50 percent of its power consumption at full load, about 150 watts out of 300; this is the effect of EIST.  If the server is working under full load, power capping can reduce the 300 watts consumed at full power by about 30 percent, down to 210 watts or so.
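The arithmetic behind these figures can be written out as a short sketch. It uses only the approximate numbers quoted above; the function and constant names are illustrative, not part of any Intel tool.

```python
# Approximate lab figures quoted in the text for the 2-socket test server.
FULL_LOAD_WATTS = 300.0   # consumption at full load
LOW_LOAD_WATTS = 150.0    # about 50 percent of full load, with EIST in effect
CAP_REDUCTION = 0.30      # power capping trims about 30 percent at full load


def capped_watts(full_load_watts: float, reduction: float) -> float:
    """Power drawn at full load once a cap with the given fractional reduction applies."""
    return full_load_watts * (1.0 - reduction)


def savings_fraction(baseline_watts: float, actual_watts: float) -> float:
    """Fraction of the baseline power that is saved."""
    return (baseline_watts - actual_watts) / baseline_watts


capped = capped_watts(FULL_LOAD_WATTS, CAP_REDUCTION)
eist_savings = savings_fraction(FULL_LOAD_WATTS, LOW_LOAD_WATTS)

print(f"Capped full-load power: about {capped:.0f} W")
print(f"EIST savings relative to full load: about {eist_savings:.0%}")
```

Both figures are "dimming" savings: the server stays available for work in either case.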

There is a “dimming” effect from power capping due to the voltage and frequency scaling mechanism used to implement power capping.  However, the tradeoff between performance and power consumption is more complex than the relationships in the lighting example. If the server is not working at full load, there may be enough cycles left in the server to continue running the workload without an apparent impact on performance.   In this case, the penalty is in the amount of performance headroom available should the workload pick up.  The solution to this problem is simple.  If the extra headroom is called for, the management application setting the cap can remove it and the full performance of the server becomes available in a fraction of a second.

There is also a richer set of options for turning off servers than there is for turning lights off.  The ACPI standard defines at least three states suitable for server parking: S3 (sleep to memory), S4 (hibernation, where the server state is saved in a file) and S5 (soft off, where the server is powered down except for the circuitry needed to turn it on remotely under application control).  The specific choice depends on hardware support; not every implementation supports all states.  It also depends on application requirements.  A restart from S3, if supported by the hardware, can take place much faster than a restart from S5.  The tradeoff is that S3 draws somewhat more power than S5 because the DIMMs must be kept powered to retain their contents.
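One way to picture this choice is as a small selection routine: among the states the hardware supports, pick the one with the lowest residual power that still resumes fast enough for the application. The sketch below is only an illustration. The ranks are ordinal, not measurements; the S3 versus S5 ordering comes from the text, while placing S4 between the two is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class ParkState:
    name: str
    resume_rank: int  # 1 = fastest resume (ordinal only, not seconds)
    power_rank: int   # 1 = lowest residual power (ordinal only, not watts)


# S3 resumes faster than S5 but draws more power (from the text).
# Ranking S4 between the two is an assumption made for illustration.
STATES = [
    ParkState("S3", resume_rank=1, power_rank=3),
    ParkState("S4", resume_rank=2, power_rank=2),
    ParkState("S5", resume_rank=3, power_rank=1),
]


def choose_park_state(supported: Set[str], max_resume_rank: int) -> Optional[ParkState]:
    """Return the lowest-power supported state that resumes fast enough, or None."""
    candidates = [
        s for s in STATES
        if s.name in supported and s.resume_rank <= max_resume_rank
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.power_rank)


# A host without S4 support, serving an application that tolerates a slow restart.
choice = choose_park_state({"S3", "S5"}, max_resume_rank=3)
print(choice.name if choice else "no suitable parking state")
```

Returning None when nothing qualifies leaves the decision to the caller, which might then keep the host active rather than park it.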

Widespread use of server parking is not feasible in traditional deployments, where a hard binding exists between the application components and the hardware host, because bringing any of the hosts offline could cripple the application.  This binding is relaxed in virtualized cloud environments that support dynamic consolidation of virtual machines into a subset of active hosts.  The sub-pool of active hosts is grown or shrunk to maintain optimal utilization levels.  Vacated hosts are parked, the equivalent of turning lights off in a room; as in the lighting example, once a server is in the parked state it can’t run applications.
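The consolidation step can be sketched as a simple packing problem. The example below uses a first-fit decreasing heuristic, which is a stand-in chosen for brevity and not necessarily what any particular cloud manager does. Loads are expressed as integer percentages of one host's capacity, and the VM and host names are hypothetical.

```python
from typing import Dict, List, Tuple


def consolidate(
    vm_loads: Dict[str, int], hosts: List[str], host_capacity: int = 100
) -> Tuple[Dict[str, str], List[str]]:
    """Pack VMs onto as few hosts as first-fit decreasing allows.

    Returns (placement, parked): a VM-to-host map and the list of
    vacated hosts that are candidates for parking.
    """
    remaining: Dict[str, int] = {}  # active host -> unused capacity
    placement: Dict[str, str] = {}
    free_hosts = list(hosts)        # hosts not yet activated

    # Place the largest VMs first.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any single host")
        # First active host with enough room, if any.
        target = next((h for h, cap in remaining.items() if cap >= load), None)
        if target is None:
            if not free_hosts:
                raise RuntimeError("pool too small for the current workload")
            target = free_hosts.pop(0)  # grow the active sub-pool
            remaining[target] = host_capacity
        remaining[target] -= load
        placement[vm] = target

    return placement, free_hosts


placement, parked = consolidate(
    {"vm1": 50, "vm2": 30, "vm3": 20, "vm4": 40},
    ["h1", "h2", "h3", "h4"],
)
print("placement:", placement)
print("hosts to park:", parked)
```

A production scheduler would also weigh memory, migration cost and spare headroom; the sketch considers a single load dimension only.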

Unlike branch circuits used for lighting, where the load is sized to never exceed the circuit’s capacity, branch circuits feeding servers may be provisioned close to capacity.  One possible application for Node Manager is to establish a guard rail, with power capping kicking in if the power consumption gets close to the limit.
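A guard rail of this kind might look like the sketch below. It only computes the caps; the call that would actually apply them through Node Manager is deliberately left out, since that interface is not described here. The 90 percent guard threshold and the proportional scaling of every server are assumptions made for illustration.

```python
from typing import Dict, Optional


def guard_rail_caps(
    server_watts: Dict[str, float],
    circuit_limit_watts: float,
    guard_fraction: float = 0.9,  # assumed threshold, not a recommended value
) -> Optional[Dict[str, float]]:
    """Return per-server power caps if the circuit is close to its limit.

    Returns None while total draw is at or below the guard threshold.
    Otherwise every server is scaled down proportionally so that the
    total equals the threshold (one simple policy among many).
    """
    threshold = circuit_limit_watts * guard_fraction
    total = sum(server_watts.values())
    if total <= threshold:
        return None
    scale = threshold / total
    return {name: watts * scale for name, watts in server_watts.items()}


# Four servers at full load on a hypothetical 1000 W branch circuit.
caps = guard_rail_caps(
    {"s1": 300.0, "s2": 300.0, "s3": 300.0, "s4": 300.0},
    circuit_limit_watts=1000.0,
)
print(caps)
```

Once the draw falls back under the threshold the function returns None, which a management application could treat as the signal to remove the caps and restore full performance.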