Nehalem-EX for High Performance Computing

Nehalem-EX: Big Memory for Big Science

I was at SuperComputing’09 last week in Portland, Oregon. I talked with some brilliant people, and saw some fantastic stuff.

It was good timing on my part because last week Intel also announced that it will offer a 6-core, frequency-optimized version of its Nehalem-EX product due out next year. This part is intended to tackle some of the high performance computing (HPC) workloads prominently on display at SC'09.

Most people know that the majority of HPC workloads today run on clusters of relatively small-memory, 2-socket systems. That is because most HPC workloads can be broken into smaller, discrete units of work that such clusters process efficiently. For these workloads, the primary hardware selection criterion is typically a balance of memory bandwidth and compute FLOPs (floating point operations per second).
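To give a rough sense of that model, here is a minimal sketch (my own illustration, not anything from Intel) of how such a workload is commonly split across a cluster with MPI: each node works on a contiguous slice of independent items, and only the small partial results are combined at the end.

```c
/* Minimal MPI sketch: split N independent work items evenly across ranks,
 * process the local slice, then combine the partial results.
 * Illustrative only -- assumes an MPI implementation such as MPICH or Open MPI. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* total work items; each slice easily fits in a node's memory */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a contiguous slice of the problem. */
    long begin = (long)N * rank / size;
    long end   = (long)N * (rank + 1) / size;

    double local = 0.0;
    for (long i = begin; i < end; i++)
        local += (double)i * 0.5;          /* stand-in for real per-item compute */

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f (computed on %d ranks)\n", total, size);

    MPI_Finalize();
    return 0;
}
```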

But there are other types of HPC workloads: specifically, those that deal with very large datasets (some as large as a terabyte) or with non-sequential memory access. These workloads either can't easily be divided into the relatively small memory footprints used in traditional clustered 2-socket HPC solutions, or it is inefficient to do so. Examples of these bigger-memory applications can be found in fields such as weather prediction, structural analysis in manufacturing, and financial services.

The high-speed processing requirements and size of these workloads put a greater premium on system memory capacity/bandwidth than on compute FLOPs.
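To make the "non-sequential memory access" point concrete, here is a small, illustrative gather loop (again, my own sketch rather than a benchmark): the scattered reads mean runtime is set by how quickly the memory system can serve data, not by the single floating-point add per element.

```c
/* Illustrative gather loop: the index array makes memory accesses
 * effectively random, so runtime is dominated by memory latency and
 * bandwidth rather than by arithmetic. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1 << 26;                      /* ~64M doubles (~512 MB), plus ~512 MB of indices */
    double *data = malloc(n * sizeof *data);
    size_t *idx  = malloc(n * sizeof *idx);
    if (!data || !idx) { fprintf(stderr, "allocation failed\n"); return 1; }

    for (size_t i = 0; i < n; i++) {
        data[i] = 1.0;
        idx[i]  = (size_t)rand() % n;        /* scattered, non-sequential indices */
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];                 /* one FLOP, one random memory read */

    printf("sum = %f\n", sum);
    free(data);
    free(idx);
    return 0;
}
```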

If the dataset won't fit into available memory, and it cannot easily be divided up and spread across multiple nodes, then data has to be moved back and forth between memory and hard disk. And because hard disk drives are many times slower than RAM, that data movement can drastically impair performance.
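Here is a rough sketch of what that looks like in practice: out-of-core processing, where the dataset is streamed through memory in fixed-size chunks because it won't fit all at once. The file name and chunk size below are placeholders; the point is that every pass over the data runs at disk speed rather than RAM speed.

```c
/* Out-of-core sketch: when a dataset is larger than RAM, it has to be
 * streamed from disk in chunks, so each pass over the data is limited
 * by disk transfer rates rather than memory speed.
 * "dataset.bin" and the 256 MB chunk size are placeholders. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t chunk_bytes = 256UL * 1024 * 1024;        /* one chunk fits in RAM */
    size_t chunk_elems = chunk_bytes / sizeof(double);
    double *buf = malloc(chunk_bytes);
    if (!buf) { fprintf(stderr, "allocation failed\n"); return 1; }

    FILE *f = fopen("dataset.bin", "rb");                  /* placeholder file */
    if (!f) { fprintf(stderr, "cannot open dataset\n"); free(buf); return 1; }

    double sum = 0.0;
    size_t got;
    while ((got = fread(buf, sizeof(double), chunk_elems, f)) > 0) {
        for (size_t i = 0; i < got; i++)
            sum += buf[i];                                 /* per-chunk compute */
    }

    printf("sum = %f\n", sum);
    fclose(f);
    free(buf);
    return 0;
}
```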

There are now two better alternatives to hard drives: solid state drives (SSDs) and a larger memory footprint. SSDs offer higher data density than RAM and much faster access than hard disk drives, though they are still markedly slower than RAM. The other option is simply to have more of the faster RAM, and that is what the Nehalem-EX HPC part is aimed at.

Nehalem-EX is the Expandable Class of Nehalem. The Expandable Class brings all the goodness of the Nehalem architecture (Xeon 5500 product line) to the HPC market, but in the form of a “super node” that has greater:

  • Core/thread count

  • Socket scaling

  • I/O and memory capacity (up to 1 terabyte in a 4-socket system)

  • Bandwidth at capacity

  • Reliability features

  • Other features

The 6-core, frequency-optimized Nehalem-EX part has been tuned to offer the highest core frequency possible for this chip. In creating it, Intel is meeting the needs of HPC users who want higher scalar performance along with large memory capacity and bandwidth per core.

Of course, the 8-core version of NHM-EX is still an option for HPC workloads that scale well with more cores and still need the high memory capacity of the Expandable Class.

Having both 8-core and frequency-optimized 6-core versions of the NHM-EX class of processors means HPC researchers have greater choice in selecting the processor best suited to their specific workloads.

After talking with some of the researchers at SC’09 last week I’m really excited to see how the Nehalem-EX “super node” will deliver the necessary compute and memory capabilities to help those researchers solve some of their biggest challenges.