Data Mining the Green 500 and Top 500: Fun with Exascalar

I published a blog late last year on an idea for bringing insights from the Green500 and Top500 together in a way that helps visualize the changing landscape of supercomputing leadership in the context of efficient performance. Since then I have started to refer to that analysis by the shorthand term “Exascalar.”

Recall, Exascalar is a logarithmic scale for supercomputing which looks at performance and efficiency normalized to Intel’s Exascale Goal of delivering one Exaflops in a power envelope of 20 MegaWatts.
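To make that concrete, here is a minimal sketch of how such a score could be computed from a system’s Rmax and power draw. The specific formula (a simple average of the log10-normalized performance and efficiency) and the function name are illustrative assumptions on my part, not necessarily the exact definition behind the published numbers.

```python
import math

# Exascale goal: 1 Exaflops (1e18 flops) in a 20 MW power envelope,
# which corresponds to an efficiency of 50 Gflops/W.
EXA_PERF_GFLOPS = 1.0e9      # 1 Exaflops expressed in Gflops
EXA_EFF_GFLOPS_PER_W = 50.0  # 1e9 Gflops / 20e6 W

def exascalar(perf_gflops, power_kw):
    """Illustrative Exascalar-style score (0 = exascale goal, negative below it).

    Assumes the score is the mean of the log10-normalized performance and
    efficiency; the published analysis may weight the two terms differently.
    """
    eff = perf_gflops / (power_kw * 1000.0)               # Gflops per watt
    perf_term = math.log10(perf_gflops / EXA_PERF_GFLOPS)
    eff_term = math.log10(eff / EXA_EFF_GFLOPS_PER_W)
    return 0.5 * (perf_term + eff_term)

# Example: a system at roughly 10.5 Pflops and 12.7 MW (close to the
# November 2011 #1 machine) scores about -1.9 on this illustrative scale.
print(exascalar(10.51e6, 12_660))
```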

Of the emails I received on the topic, one of the most interesting was from Barry Rountree at LLNL. Barry has done a similar analysis looking at the time evolution of the Green500 data. So I thought, “why the heck not for Exascalar?”

And then I had some fun.

Building from Barry’s idea, I plotted the data for the Green500 and Top500 from November 2007 to November 2011 in one-year increments (with the addition of the June 2011 data for resolution) as an animated .gif file, shown below. The dark grey line is the trend of the "Exascalar-median." To highlight innovation, in each successive graph the new points are shown in red while older systems are in blue. The unconventional-looking grid lines are lines of constant power and constant Exascalar.


Please click image to animate
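If you want to reproduce this kind of plot yourself, the sketch below shows the basic idea: plot performance against efficiency on log-log axes and overlay lines of constant power, which come out as straight diagonals. The data points here are placeholders, not the real lists; in practice the Top500 and Green500 entries have to be joined on system identity first.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical merged Top500/Green500 rows: (performance in Gflops, power in kW).
systems = np.array([
    (10.51e6, 12_660),   # roughly the Nov 2011 #1 system
    (2.57e6,   3_945),
    (1.27e6,   2_580),
    (8.2e4,      450),
])
perf_gflops = systems[:, 0]
eff_gflops_per_w = systems[:, 0] / (systems[:, 1] * 1000.0)

fig, ax = plt.subplots()
# Finite opacity gives a sense of point density when many systems overlap.
ax.loglog(eff_gflops_per_w, perf_gflops, "o", alpha=0.6)

# Lines of constant power: performance = efficiency * power, which are
# straight diagonals on a log-log plot of performance vs. efficiency.
eff_range = np.logspace(-2, 2, 50)                  # Gflops/W
for power_mw in (0.2, 2, 20):
    ax.loglog(eff_range, eff_range * power_mw * 1e6, "--", color="grey")

ax.set_xlabel("Efficiency (Gflops/W)")
ax.set_ylabel("Performance (Gflops)")
plt.show()
```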

There’s a lot going on here. One notices some obvious “power pushes,” where occasionally a system pushes right up against the 20 MW line to achieve very high performance. Invariably these achievements are eclipsed by systems with higher efficiency.

Another thing that’s striking is the huge range of efficiencies between systems: over a factor of one hundred for some contemporary systems with similar performance. That’s pretty astounding when you think about it - a factor of one hundred in energy cost for the same work output.

But the macroscopic picture, of course, is that the overall (and inevitable) trend is the scaling of performance with efficiency.

So how is the trend to Exascale going? Well, one way to understand that is to plot the data as a time series. The graph below shows the Exascalar values of the Top, Top 10, and Median systems over time. Superimposed is the extrapolation of a linear fit, which shows why such a huge breakthrough in efficiency is needed to meet the Exascale goal by 2018.

Exascalar trend over time
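For anyone who wants to redo the extrapolation, the fit itself is just an ordinary least-squares line through the Exascalar values versus list date, projected out to 2018. The values below are illustrative placeholders rather than the actual data behind the figure.

```python
import numpy as np

# Placeholder Exascalar values of the top system by list date
# (illustrative only, not the fitted data from the figure above).
dates = np.array([2007.9, 2008.9, 2009.9, 2010.9, 2011.45, 2011.9])
top_exascalar = np.array([-3.5, -3.2, -3.0, -2.7, -2.5, -2.3])

# Least-squares linear fit and extrapolation to 2018, the exascale target date.
slope, intercept = np.polyfit(dates, top_exascalar, 1)
projected_2018 = slope * 2018 + intercept
print(f"slope = {slope:.2f} per year, projected 2018 value = {projected_2018:.1f}")

# With these placeholder numbers the projection still falls short of 0
# (the exascale goal) - the gap that a breakthrough in efficiency must close.
```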

It’s remarkable that the Top10 and Top Exascalar trends have essentially the same slopes (differing by about 7%), whereas the slope of the Median trend is about 20% lower.

But these simplified trends belie more complexity “under the covers.” To look at this, I plotted the Top 10 Exascalar points from 2007 and 2011 and then superimposed trend lines from the data of the intervening years. Whereas the trend line of the “Top” system has moved mostly up in power while zigging and zagging in efficiency, the trend of the “Top10” (computed as an average) initially depends mostly on power, but then bends to follow an efficiency trend. Note that the data points are plotted with a finite opacity to give a sense of "density." (Can you tell I'm a fan of "ET"?)

Exascalar trend analysis

This is another manifestation of the “inflection point” I wrote about in my last blog, where more innovation in efficiency will drive higher performance as time goes forward, whether in emerging Sandy Bridge, MIC, or other systems which have focused on high efficiency to achieve high performance. This analysis highlights what I think is the biggest trend in supercomputing, efficiency, while capturing the important and desired outcome, which is high performance. As my colleague and friend here at Intel, John Hengeveld, writes: “Work on Efficiency is really work on efficient performance.”

What are your thoughts? A weather report, or an analysis that provides some insight?

Feel free to comment or contact me at @WinstonOnEnergy.