NEHALEM-EX… Monster Chips and Big Boxes

A MONSTER CHIP IS COMING. The next generation of MP processor is targeted for production later this year, and by all accounts it is going to be a monster. Nehalem-EX is part of the Nehalem family of processors, but compared to its siblings it has the highest core/thread count, largest shared cache, highest CPU-to-CPU bandwidth, highest I/O bandwidth, highest memory capacity, highest memory bandwidth, greatest scalability, and highest level of Reliability/Availability/Serviceability (RAS). It’s expected to bring a gargantuan, unprecedented leap in capabilities and performance--the biggest leap in all of Xeon product history.

IT’S TARGETED AT “BIG BOXES.” Big box servers are multiprocessor systems using the most capable processors and platform components. These systems are targeted at applications and usages that require the largest memory footprints, the highest amounts of single-box processing power (for workloads that don’t decompose well into lots of independent threads) and/or advanced levels of RAS. Such systems are typically the best choice for large databases, ERP apps, Business Intelligence apps, large-scale server consolidation and business-critical virtualization, mission-critical applications, and large-scale high-performance computing.

IT USES THE SAME PROCESSING TECHNOLOGY AS THE SUCCESSFUL XEON 5500, BUT MORE OF IT. Just as with Xeon 5500, the Nehalem micro-architecture brings improved single-threaded performance via IPC (Instructions per Clock) enhancements and Intel’s Hi-k 45nm manufacturing process. Greater multi-threaded performance comes via Hyper-Threading and more cores. But while the Xeon 5500 has up to 4 cores/8 threads per socket, the Nehalem-EX monster doubles that to 8 cores/16 threads.

HAS BEEFIER MEMORY AND INTERCHIP COMMUNICATION SUBSYSTEMS. Monster thread-processing capabilities require monster-size feeding to bring out the best performance. Nehalem-EX’s raw processing potential is made viable by a heavy-duty memory subsystem and inter-chip communication system.

Nehalem-EX has 24MB of shared level 3 cache--that’s 50% more than the current Xeon 7400 and 200% more than Xeon 5500. The memory channel bandwidth was increased to nine times that of Xeon 7400. And it’s all attached to up to 16 DIMM slots per socket (that’s 64 DIMM slots for 4 sockets)--double the current generation of Xeon 7400.
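For readers who want to check the arithmetic, here is a minimal Python sketch that works backward from the percentages quoted above to the cache sizes they imply for the older parts, and multiplies out the DIMM slot count. The function and variable names are illustrative only, and the baseline figures are derived from the stated percentages rather than read off a spec sheet.

```python
def baseline_from_gain(new_value, percent_more):
    """Return the older value implied by 'new_value is percent_more% more'."""
    return new_value / (1 + percent_more / 100)

# Figures quoted in the text above.
NEHALEM_EX_L3_MB = 24
DIMM_SLOTS_PER_SOCKET = 16
SOCKETS = 4

# Implied L3 cache sizes of the older parts (derived, not quoted).
xeon_7400_l3_mb = baseline_from_gain(NEHALEM_EX_L3_MB, 50)
xeon_5500_l3_mb = baseline_from_gain(NEHALEM_EX_L3_MB, 200)

# Total DIMM slots in a 4-socket box, and the implied prior-generation
# total if the new figure is "double" the old one.
total_dimm_slots = DIMM_SLOTS_PER_SOCKET * SOCKETS
implied_prior_dimm_slots = total_dimm_slots // 2

print(f"Implied Xeon 7400 L3: {xeon_7400_l3_mb:.0f} MB")
print(f"Implied Xeon 5500 L3: {xeon_5500_l3_mb:.0f} MB")
print(f"DIMM slots in a {SOCKETS}-socket system: {total_dimm_slots}")
print(f"Implied prior-generation DIMM slots: {implied_prior_dimm_slots}")
```

Note that "200% more" means three times the baseline, not two, which is why the second call divides by 3.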

In a multi-socket system, processors need to communicate with each other in order to most efficiently coordinate their shared workload. They also need lots of I/O bandwidth. Nehalem-EX has four QuickPath Interconnect (QPI) links on every socket--double that of Xeon 5500. The four QPI links enable Nehalem-EX processors to be directly connected to each other in a 4-socket system. This offers a significant performance advantage over a so-called ring architecture, wherein some processor-to-processor communication must go through an intermediary processor. The extra QPI links also mean that there’s plenty of CPU-to-I/O bandwidth.
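To make the topology point concrete, the Python sketch below models a 4-socket system two ways, fully connected and as a ring, and counts the worst-case number of link hops between any two processors. It is a simplified graph model, not a description of any real platform: the helper names are hypothetical, and it assumes every QPI link on a socket goes either to a peer processor or to I/O.

```python
from collections import deque
from itertools import combinations

def full_mesh(n):
    """Every socket has a direct link to every other socket."""
    return {s: [t for t in range(n) if t != s] for s in range(n)}

def ring(n):
    """Every socket links only to its two neighbours."""
    return {s: [(s - 1) % n, (s + 1) % n] for s in range(n)}

def hops(topology, src, dst):
    """Fewest links to traverse from src to dst (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for neighbour in topology[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    raise ValueError("sockets are not connected")

def max_hops(topology):
    """Worst-case hop count over all pairs of sockets."""
    return max(hops(topology, a, b) for a, b in combinations(topology, 2))

SOCKETS = 4
QPI_LINKS_PER_SOCKET = 4  # figure quoted in the text above

mesh = full_mesh(SOCKETS)
mesh_max = max_hops(mesh)
ring_max = max_hops(ring(SOCKETS))

# Links not used for peer connections in the fully connected layout
# (assumption: these are the ones available for I/O).
links_left_for_io = QPI_LINKS_PER_SOCKET - len(mesh[0])

print(f"Fully connected: worst case {mesh_max} hop(s)")
print(f"Ring:            worst case {ring_max} hop(s)")
print(f"QPI links per socket left over for I/O: {links_left_for_io}")
```

In this model the fully connected layout uses three of the four links per socket for peers, so every processor is one hop from every other, while the ring forces traffic between opposite sockets through an intermediary.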

EXPECTED TO BRING THE GREATEST LEAP FORWARD IN XEON PERFORMANCE EVER. On key server performance benchmarks (e.g. SPECint_rate, SPECfp_rate, TPC-C, etc.), Xeon 5500 using Nehalem technology brought gains of 100-200% over the prior generation. Generational gains of this magnitude come along just about once a decade. Nehalem-EX’s generation-to-generation performance gains are expected to be substantially higher than those of Xeon 5500. We’ve already seen measured memory bandwidth of 9X vs. the prior generation. That’s an early indication of the level by which new performance records will be set when this monster chip comes to market.

Related Topics:

NHM-EX Press Fact Sheet

NHM-EX May 26th Press Briefing Video – condensed version

IBM 8Socket Demo Video

NHM-EX--A New Standard