The following are some considerations prior to tuning your MP Xeon 7400 series server. I can speak to this subject as I was asked to tune this system using the TPC-C and TPC-E benchmarks for internal measurements at Intel. While you may not be setting up thousands of hard disk spindles for your performance work, this blog post attempts to capture some of the key tuning considerations of this Xeon-based server.
Understand your system
The key to tuning any system, whether it is a Formula One race car (I promise to stay away from silly car performance analogies) or a server, is to understand it. Identify which components have an effect on performance and which don't. This will narrow down your tuning efforts.
Like all of Intel's platforms, an MP Xeon 7400 series server is made of several ingredients. Of course, since I work for Intel, I need to start the ingredient list with the central processor. Our website has a good description of this processor here. The MP Xeon 7400 processor is made up of three Core 2 Duo T5000/T7000 series processors. This provides six (yes, six) cores for processing goodness. Each of the Core 2 Duo T5000/T7000 series processors provides two 32KB level 1 caches (one for data and one for code) and a 3MB level 2 unified cache. In addition to these two levels of cache, the MP Xeon 7400 processor provides a 16MB level 3 unified cache. The other major ingredient of this platform is the Intel® 7300 Chipset. This chipset provides four independent front side bus links to the four CPU sockets. In addition, this chipset provides a snoop filter and four channels of FBD memory.
If some is good, then more is better:
The key thing to take away here is that an MP Xeon 7400 system fully populated with top-bin processors will provide a whopping 24 cores of processing power in a four-socket system. This is great for the enterprise benchmarks I use for performance testing, as those applications are multithreaded and designed for multi-core processors. The same may not be true for your application, so please keep that in mind.
Another thing to remember is that the MP Xeon 7400 processor's design follows a growing pattern in the Xeon processor family. Specifically, I am referring to the addition of the level 3 cache (L3), also known as the last level cache (LLC). This follows the design of the Potomac (Xeon MP 64-bit) and Tulsa (7100-series) processors. The value of the large LLC is that it reduces the number of cache misses that would require the machine to go to FBD memory for the latest copy of a cache line. This additional level of on-chip cache comes at a price, though: higher latency. While the latency penalty is relatively low compared to the latency to memory, it is important to mention it here. Again, the LLC greatly benefits the enterprise benchmarks I use for performance testing, as they have a large memory footprint. The same may not be true for your application.
BIOS / Firmware / Drivers
It is very important to update your system's BIOS, firmware, and OS drivers before you do any deep performance tuning. I cannot overstate the importance of this step. Your system's manufacturer should be able to provide the latest BIOS and firmware for your server. OS drivers are available from many sources these days. Typically they can be downloaded from OS vendors, hardware vendors, the Linux open source community, or the platform's manufacturer.
Intel processors have traditionally provided four prefetchers. These are accessible via the model-specific register IA32_MISC_ENABLE and sometimes via your OEM's BIOS. These features are meant to help the processor load data in a predictive manner, keeping the cache hierarchy filled with the most pertinent cache lines. This is great if the application uses data in a somewhat predictable way. If your application uses cache lines in a random fashion, then the prefetchers may negatively impact performance. My best advice is to test your application with the prefetchers enabled and disabled. Table B-3 (MSR 0x1A0) in this link covers the prefetchers I am referring to.
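To make the prefetcher check concrete, here is a minimal Python sketch that decodes an IA32_MISC_ENABLE value into the state of each prefetcher. The bit positions are my assumption from memory of the Core 2 family MSR listing (a set bit means the prefetcher is disabled), so verify them against Table B-3 for your exact processor before relying on them. The read_msr helper is also an assumption: it expects a Linux host with the msr kernel module loaded and root privileges.

```python
import os
import struct

IA32_MISC_ENABLE = 0x1A0

# ASSUMED bit positions; a set bit DISABLES the prefetcher.
# Check these against Table B-3 for your processor.
PREFETCHER_DISABLE_BITS = {
    "hardware prefetcher": 9,
    "adjacent cache line prefetcher": 19,
    "DCU prefetcher": 37,
    "IP prefetcher": 39,
}


def decode_prefetchers(msr_value):
    """Map each prefetcher name to True if it is enabled in msr_value."""
    return {
        name: ((msr_value >> bit) & 1) == 0
        for name, bit in PREFETCHER_DISABLE_BITS.items()
    }


def read_msr(cpu, register=IA32_MISC_ENABLE):
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (Linux, root, msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the register number.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)
```

On a suitable machine, decode_prefetchers(read_msr(0)) reports the settings for logical CPU 0. Record that state alongside each benchmark run so the enabled and disabled results can be compared fairly.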
As mentioned before, an MP Xeon 7400 series server provides four channels of FBD memory. There are a couple of considerations here. First, latency to memory increases with every DIMM added to the system. This is important to note because you can keep memory latency to a minimum by reaching your target capacity with fewer, higher-capacity DIMMs. Second, be sure to distribute the DIMMs evenly across all the channels. In other words, don't fill up all the slots on one channel and then lightly populate the rest.
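The two memory guidelines above can be sketched as a small Python planning helper. This is only an illustration: the function names and the DIMM sizes are hypothetical, and it does not check how many slots per channel your particular server offers.

```python
def plan_population(total_gb, dimm_gb, channels=4):
    """Return DIMMs per channel for the fewest equal-size DIMMs that reach
    total_gb while giving every channel the same count."""
    dimms = -(-total_gb // dimm_gb)            # ceiling division
    dimms = -(-dimms // channels) * channels   # round up to a multiple of channels
    return [dimms // channels] * channels


def channel_imbalance(dimms_per_channel):
    """Difference between the fullest and emptiest channel (0 is balanced)."""
    return max(dimms_per_channel) - min(dimms_per_channel)


# Hypothetical 32 GB target: larger DIMMs mean fewer DIMMs per channel.
for size_gb in (2, 4, 8):
    print(size_gb, "GB DIMMs:", plan_population(32, size_gb))
```

For the same capacity, the layout with the fewest DIMMs per channel is the one to prefer for latency, and a nonzero channel_imbalance flags the lopsided configuration warned against above.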
An External Factor that may affect performance
Like many Intel designs, an MP Xeon 7400 series server will choose dishonor over death. I am referring to how it deals with high temperatures. The FBD memory inside an MP Xeon 7400 series server makes use of a thermal monitor on each DIMM. If the memory becomes too hot, the chipset will begin to throttle memory bandwidth in an effort to reduce the temperature of the system. This will have a drastic negative impact on performance. So, keep your server room nice and cool.
To wrap things up, we have looked at the architecture, the importance of BIOS, firmware, and OS drivers, the prefetchers, memory population, and the effects of high temperatures. Your application's performance will vary, but I hope I have given you some things to narrow down your testing. So, by now you might be asking, "Where do I start?" Well, not to be too self-serving, but I would check out more of our blog posts here. A great place to start for performance methodologies would be Shannon Cepeda's blog. This series is a great resource for anyone interested in computer performance methodologies.