Exascalar and Cost Effective HPC

I know… you’re probably thinking, “what the *?” The phrases “Cost effective” and “HPC” are seen together about as rarely as a double bill of Ferris Bueller’s Day Off and Rocky Horror Picture Show. But, in fact, with the efficiency and performance of supercomputing systems improving so rapidly, electricity costs in large-scale machines may warrant deeper scrutiny of the need for newer hardware. What got me thinking about this was a startling realization about systems near the lower left “Corner of Inefficiency” of the familiar Exascalar plot below.

The systems near that lower left corner are almost a factor of one hundred less efficient than the most efficient systems of comparable performance. In other words, they consume about one hundred times the energy for comparable work. This can be a big deal: it is the difference, for instance, between a 20 kW system and a 2.0 MW system. If you think about the cost of electricity, there could be some real ROI there.

So the question is how to visualize that difference in cost. The point of what I discuss below is not to provide an accurate cost analysis for every application, but to show how this general framework can be put to use.

Costs of supercomputers, especially those at the forefront of innovation, are difficult to estimate. For the purposes here I chose to use a published cost of the Lawrence Livermore Labs Sequoia computer as the anchor point for this analysis. For comparison read about the ORNL supercomputer here. Assuming a constant $/flops, one can easily scale capital cost in proportion to performance. This scaling is shown as the horizontal lines in the Figure below.

Electricity costs also vary widely from location to location. Industrial electricity costs are actually falling in the US, but for the sake of simplicity I have assumed $0.07/kWh, which is about the average industrial electricity rate in the US, with an assumed facility PUE of 1.6. One watt drawn continuously for a year is 8.76 kWh, so this translates, conveniently, to a total energy cost of about $1 per watt-year ($0.07 × 1.6 × 8.76 ≈ $0.98). You can see system-level annualized energy costs in the Figure.
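To make that arithmetic easy to redo with your own numbers, here is a minimal Python sketch. The function names are my own, and the defaults are simply the assumptions stated above ($0.07/kWh, PUE of 1.6); swap in your local rate and facility PUE.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_cost_per_watt(rate_per_kwh=0.07, pue=1.6):
    """Dollars per year for each watt of IT load, including facility overhead.

    rate_per_kwh and pue default to the assumptions used in this post.
    """
    kwh_per_watt_year = HOURS_PER_YEAR / 1000.0  # 1 W for a year = 8.76 kWh
    return rate_per_kwh * pue * kwh_per_watt_year


def annual_energy_cost(it_power_watts, rate_per_kwh=0.07, pue=1.6):
    """Annual electricity bill in dollars for a system drawing it_power_watts."""
    return it_power_watts * annual_cost_per_watt(rate_per_kwh, pue)


if __name__ == "__main__":
    print(f"cost per watt-year: ${annual_cost_per_watt():.2f}")
    # The 20 kW and 2.0 MW systems used as an example earlier
    for watts in (20_000, 2_000_000):
        print(f"{watts / 1000:>7.0f} kW -> ${annual_energy_cost(watts):,.0f} per year")
```

Under these assumptions the 20 kW system costs roughly twenty thousand dollars a year to power, and the 2.0 MW system roughly two million.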

From this point it is pretty straightforward to calculate a payback time for replacing inefficient servers. It’s interesting that the times for return on investment show up as vertical lines: since both capital cost and energy cost scale with performance in this model, performance cancels out and the payback time depends only on efficiency. It’s astounding that they are so short. In several cases, less than a year!
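Here is a rough sketch of that payback calculation in Python, under the same simplifications: constant $/flops for capital cost, about $1 per watt-year for energy, and a replacement system with the same performance as the old one. The function and its parameters are my own naming, and the numbers in the example are placeholders chosen only for illustration; in particular the $/GFLOPS value is not the Sequoia-derived figure.

```python
def payback_years(perf_gflops, capex_per_gflops,
                  old_gflops_per_watt, new_gflops_per_watt,
                  cost_per_watt_year=1.0):
    """Years for energy savings to repay the capital cost of a replacement.

    Assumes the new system delivers the same performance as the old one,
    capital cost scales linearly with performance, and energy costs
    cost_per_watt_year dollars per watt per year.
    """
    capex = capex_per_gflops * perf_gflops
    old_power_watts = perf_gflops / old_gflops_per_watt
    new_power_watts = perf_gflops / new_gflops_per_watt
    annual_savings = cost_per_watt_year * (old_power_watts - new_power_watts)
    if annual_savings <= 0:
        raise ValueError("replacement must be more efficient than the old system")
    return capex / annual_savings


if __name__ == "__main__":
    # Placeholder inputs: a hypothetical $10/GFLOPS capital cost and a
    # factor-of-100 efficiency gap (0.02 vs 2.0 GFLOPS/W).
    for perf in (1e4, 1e5, 1e6):  # system performance in GFLOPS
        years = payback_years(perf, 10.0, 0.02, 2.0)
        print(f"{perf:>10.0f} GFLOPS -> payback {years:.2f} years")
```

The loop prints the same payback time at every performance level, which is the vertical-line behavior in the Figure: for a fixed pair of efficiencies, how big the machine is does not matter.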

Again, this is not intended to be a definitive analysis of return on investment or total cost of supercomputer ownership. But I think this initial estimate is provocative enough to warrant further investigation. To me it looks like millions are on the table.

So, what are you waiting for?