Considerations when tuning your Intel Xeon Processor 5500 series based server

This blog post discusses some of the considerations for performance tuning your Intel® Xeon® Processor 5500 (“Nehalem-EP”) series based server. I’d like to do this by walking through the unboxing process.

Step 1. Place the box on the floor.

Step 2. Open the box.

Step 3. Carefully remove the server.

Step 4. Connect a keyboard, mouse, and monitor to the server.

Step 5. Plug the server into the wall socket.

Step 6. Power on the server.

There. You are done tuning your Nehalem-EP based server for performance. “Really?” you ask? Well, mostly. There are some considerations, and I’ll discuss them below. I can speak to this subject because I was asked to tune this class of system using the TPC-C and TPC-E benchmarks.

BIOS / Firmware / Drivers

It is very important to update your system's BIOS, firmware, and OS drivers before you do any deep performance tuning. I cannot overstate the importance of this step. Your system's manufacturer should be able to provide the latest BIOS and firmware for your server. OS drivers are available through many sources these days; typically they can be downloaded from OS vendors, hardware vendors, the Linux open source community, or the platform's manufacturer.

A good example of this is the SATA driver associated with the ICH10, which is part of the chipset that supports Nehalem-EP. I recommend going to Intel’s website and using the Intel Matrix Storage Manager driver for the SATA controller.
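Before you start tuning, it helps to record exactly what firmware you are starting from. Here is a minimal sketch, assuming a Linux system that exposes DMI data through sysfs (the /sys/class/dmi/id paths), that prints the BIOS vendor, version, and date so you can compare against your manufacturer's latest release:

```python
# Minimal sketch: read the firmware identification that Linux exposes
# through sysfs. These paths exist on most modern Linux distributions.
from pathlib import Path

DMI = Path("/sys/class/dmi/id")

for field in ("bios_vendor", "bios_version", "bios_date", "product_name"):
    node = DMI / field
    if node.exists():
        print(f"{field}: {node.read_text().strip()}")
```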

Understand your system

Last year, Nehalem launched for the desktop market segment. Now it is time for the server market. The Nehalem-EP processor is meant to be used in dual-processor (DP) socket systems. Nehalem-EP is the follow-on to the Intel Xeon Processor 5400 (“Harpertown”) series; however, it is very different from Harpertown. Nehalem-EP is based on the same microarchitecture as the Intel Core i7 and inherits the same architectural features. Once you understand these features, you can better tune your system for performance.

L3 Cache:

Nehalem-EP uses a level 3 cache. Depending on which SKU you are using, it can be 4 MB or 8 MB in size. If you are interested in performance, I would encourage you to pick the larger cache size SKU.
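If you want to confirm which cache sizes your SKU actually has, here is a minimal sketch, assuming a Linux kernel that exposes the CPU cache hierarchy through sysfs:

```python
# Minimal sketch: enumerate the cache hierarchy that Linux reports for CPU 0.
# Assumes the sysfs CPU cache interface found on modern kernels.
from pathlib import Path

cache_dir = Path("/sys/devices/system/cpu/cpu0/cache")

for index in sorted(cache_dir.glob("index*")):
    level = (index / "level").read_text().strip()
    ctype = (index / "type").read_text().strip()
    size = (index / "size").read_text().strip()
    print(f"L{level} {ctype}: {size}")
```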

Hyper Threading Technology:

If some threads are good, then more threads are better. This is where Hyper-Threading Technology comes into play. Nehalem-EP provides this technology out of the box, so on a typical DP server this will give your system 16 threads of processing goodness.
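To verify that Hyper-Threading is enabled, you can compare the physical core count against the hardware thread count. Here is a minimal sketch that parses /proc/cpuinfo on Linux; on a DP Nehalem-EP system with Hyper-Threading on, you should see 8 cores and 16 threads:

```python
# Minimal sketch: count physical cores and logical processors on Linux.
def cpu_topology(path="/proc/cpuinfo"):
    physical = set()   # unique (physical package id, core id) pairs
    logical = 0
    pkg = core = None
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            key = key.strip()
            if key == "processor":
                logical += 1
            elif key == "physical id":
                pkg = value.strip()
            elif key == "core id":
                core = value.strip()
                physical.add((pkg, core))
    return len(physical), logical

cores, threads = cpu_topology()
print(f"{cores} physical cores, {threads} hardware threads")
```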

Intel Quick Path Interconnect:

Nehalem-EP supports a CPU interconnect known as Intel QuickPath Interconnect (QPI). This interconnect replaces the Front Side Bus of old. QPI provides a point-to-point link to each of the processors and the Intel X58 chipset. Nehalem-EP supports QPI speeds of up to 6.4 GT/s, which yields a theoretical bandwidth of 25.6 GB/s per link. This is a welcome architectural shift for Intel's future designs.
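For the curious, the 25.6 GB/s figure falls out of simple arithmetic: each QPI link carries a 2-byte data payload per transfer in each of two directions. A quick worked example:

```python
# Worked example of the QPI bandwidth figure quoted above.
transfers_per_sec = 6.4e9   # 6.4 GT/s
bytes_per_transfer = 2      # 16 data lanes = 2 bytes per transfer per direction
directions = 2              # full-duplex link

bandwidth = transfers_per_sec * bytes_per_transfer * directions
print(f"{bandwidth / 1e9:.1f} GB/s")   # -> 25.6 GB/s per link
```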

Turbo Boost Technology:

As with the desktop SKUs of Nehalem, Nehalem-EP supports Turbo Boost Technology. This technology runs the CPU at a higher frequency than its rated speed, increasing the frequency in 133 MHz steps until it reaches the processor's thermal and electrical design limits. Turbo Boost Technology is dynamic: the processor will decrease its core frequency if, for example, the temperature gets too high. If your application is sensitive to core frequency and does not fully utilize all cores, it may benefit from this technology.
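One simple way to watch Turbo Boost in action is to sample the per-core clocks while a lightly threaded workload runs. A minimal sketch, assuming a Linux kernel that reports a "cpu MHz" field in /proc/cpuinfo:

```python
# Minimal sketch: sample the current core clocks so you can see a core
# running above its rated frequency under a single-threaded load.
def core_frequencies(path="/proc/cpuinfo"):
    freqs = []
    with open(path) as f:
        for line in f:
            if line.startswith("cpu MHz"):
                freqs.append(float(line.split(":")[1]))
    return freqs

for cpu, mhz in enumerate(core_frequencies()):
    print(f"cpu{cpu}: {mhz:.0f} MHz")
```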

Integrated Memory Controller:

Another key feature of Nehalem-based processors is that the memory controller is integrated into the processor, which allows for much lower memory latencies. Nehalem-EP supports three channels of DDR3 memory at speeds of 800, 1067, and 1333 MT/s, so it is important to talk about memory population on Nehalem-EP based servers. The speed you get depends on how many DIMMs populate each channel (the speeds below assume dual-ranked DIMMs):

- 1333 MT/s is supported only with a single DIMM per channel.
- 1067 MT/s is supported with one or two DIMMs per channel.
- 800 MT/s is supported in all configurations.

If you fill every memory slot with as many DIMMs as possible, you will end up running at 800 MT/s. So here is the consideration: does your application need all that memory, or could it use less memory running at a higher speed? If the answer is the latter, then perhaps two DIMMs per channel at 1067 MT/s is the best configuration.
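To make the trade-off concrete, here is a small sketch that computes total capacity and peak theoretical bandwidth (channels × transfer rate × 8 bytes per channel) for each population option on a DP system; the 4 GB DIMM size is just an assumption for illustration:

```python
# Worked comparison of the capacity-versus-speed trade-off described above.
GB = 1e9
DIMM_GB = 4              # assumed DIMM size, for illustration only
CHANNELS_PER_SOCKET = 3
SOCKETS = 2

configs = {              # DIMMs per channel -> supported speed (MT/s)
    1: 1333,
    2: 1067,
    3: 800,
}

for dpc, mts in configs.items():
    bw = CHANNELS_PER_SOCKET * SOCKETS * mts * 1e6 * 8 / GB
    capacity = dpc * CHANNELS_PER_SOCKET * SOCKETS * DIMM_GB
    print(f"{dpc} DIMM(s)/channel: {capacity:3d} GB total at {mts} MT/s "
          f"-> {bw:5.1f} GB/s peak")
```

Running this shows the shape of the decision: each extra DIMM per channel adds capacity but steps the whole memory subsystem down to a lower transfer rate.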

To wrap things up, we have looked at the new Nehalem architecture, the importance of BIOS, firmware, and OS drivers, and memory population. Your application's performance will vary, but I hope I have given you some things to narrow down your performance testing. Thanks for taking the time to read this blog post. For more great performance methodology tips, please check out Shannon Cepeda's blog posts on performance tuning.