Nehalem and XenServer Raise the Bar for XenApp Performance

This past April, as Intel was releasing its new Xeon E5500 series processors, we showed you some remarkable test results demonstrating a solid 53% performance improvement between E5400- and E5500-based servers when running a DBHammer SQL Server 2008 workload. We now wanted to move on to a workload that represents the largest segment of the Citrix user community: XenApp. More specifically, XenApp 5.0 virtualized with the new XenServer 5.5. As we've seen in previous virtualization performance tests with XenApp on XenServer, when the XenApp guests are 32-bit (the majority of XenApp users still use 32-bit applications), the opportunity for server consolidation can be significant. We wanted to see just how good that opportunity is when an Intel Xeon E5500-based server is used as a XenServer host. In this case, we looked at how server consolidation might look when going from 2.93 GHz Xeon X7350 physical XenApp servers to 2.93 GHz Xeon E5570 XenServer hosts.

For the purpose of this test, we ran the physical XenApp server with a single 32-bit workload (Windows 2003 SP2 with MS Office). It was given 2 CPUs and 4 GB of RAM, typical for this XenApp server workload. Using EdgeSight for Load Test (ESLT) version 3.5, we established a baseline of 25 seconds for users to log in, run a standard MS Office task script, and then log out (including network connect time). We added users until the latency in running this sequence reached the 30% threshold, at which point the server was deemed to be at capacity. Using this configuration and test program, the maximum number of users was 47. This was a relatively small, single physical XenApp server, so 47 concurrent users was considered respectable.
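The capacity rule described above can be sketched in a few lines of Python. This is only an illustration, not how ESLT itself works: it assumes the 30% threshold is measured against the 25-second baseline, and the function and constant names are hypothetical.

```python
# Minimal sketch of the capacity rule, under one stated assumption:
# the 30% latency threshold is taken relative to the 25 s baseline.
BASELINE_SECONDS = 25        # login + MS Office script + logout, from the test
THRESHOLD_PERCENT = 30       # latency threshold quoted in the test


def capacity_limit_seconds(baseline=BASELINE_SECONDS, percent=THRESHOLD_PERCENT):
    """Return the sequence time at which the server counts as full."""
    return baseline * (100 + percent) / 100


def at_capacity(measured_seconds):
    """True once the measured sequence time reaches the assumed limit."""
    return measured_seconds >= capacity_limit_seconds()


print(capacity_limit_seconds())
print(at_capacity(28.0), at_capacity(33.0))
```

Under that assumption, users would be added until the login-to-logout sequence takes about 32.5 seconds.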

Since we were testing with a Xeon E5570 server with dual quad-core CPUs and 32 GB of RAM, we wanted to see how many users we could get onto a single host using multiple XenApp VMs, each with nearly the same resource configuration as we used in the physical server test. We built 2 vCPU, 3.5 GB RAM XenApp virtual servers on the E5570 and ran two tests using the same ESLT workload. The difference between the 4 GB of RAM used in the physical server test and the 3.5 GB used in the virtual server test is due to the need for memory overhead when running multiple VMs. In the XenServer setup screen, we selected the option of running XenApp, which automatically configured the VMs with the appropriate amount of shadow memory for XenApp workloads.

We also wanted to see the impact of hyperthreading on VM density per host as well as on the number of concurrent users per VM. Intel describes hyperthreading as “delivering thread-level parallelism on each processor resulting in more efficient use of processor resources, higher processing throughput and improved performance.” It would be interesting to see how many more concurrent XenApp users we could get with an upgrade to the E5570 and by virtualizing with XenServer 5.5, and then see how many more users we might get once hyperthreading was enabled. Would hyperthreading allow us to run twice as many VMs on a single host? To find out, we ran our first virtualized XenApp test with hyperthreading activated and then repeated the test with it turned off. With hyperthreading, the first thing we noticed was that even though there were only 8 CPU cores on the E5570 host server, XenServer was able to see 16 vCPU cores as resources available to be assigned to VMs. As a result, we were able to successfully run a maximum of eight VMs, each with the necessary 2 vCPU cores, generating an average of 69.25 users per VM for a total of 554 users.

When we ran the second test, this time with hyperthreading turned off, we noticed that the number of users per VM increased to 88. However, the maximum number of VMs was now only four, because we now had only 8 vCPU cores to work with. As a result, the total number of users for the host was now only 352.
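The arithmetic behind both runs can be sketched as follows. The sketch assumes, as the tests did, that VM count is limited by dividing the vCPU cores XenServer exposes by the 2 vCPUs given to each VM; the function name is hypothetical.

```python
# Sketch of the host-capacity arithmetic from the two E5570 runs.
# Assumption: the VM count is capped by available vCPU cores, with no
# overcommit, which matches the limits observed in the tests.
def host_capacity(vcpu_cores, vcpus_per_vm, users_per_vm):
    """Return (maximum VMs, total concurrent users) for one host."""
    max_vms = vcpu_cores // vcpus_per_vm
    return max_vms, max_vms * users_per_vm


# Hyperthreading on: 16 vCPU cores visible, 69.25 users per VM on average.
ht_on = host_capacity(16, 2, 69.25)
# Hyperthreading off: 8 vCPU cores visible, 88 users per VM.
ht_off = host_capacity(8, 2, 88)

print("HT on :", ht_on)
print("HT off:", ht_off)
```

Fewer users fit in each VM with hyperthreading on, but twice as many VMs fit on the host, so the host total is higher.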

[Figure: multi-VM test results]

[Figure: single-VM test results]

In the end, we discovered that while hyperthreading doubled the number of assignable vCPU resources, it didn’t directly translate to a 2:1 increase in the number of users per host, because each VM supported fewer users. That’s a reasonable trade-off, since hyperthreading effectively doubled the number of VMs that we could create with the same number of CPU cores. So, while we were able to get 6.5x the number of concurrent XenApp users onto a single Xeon E5570 host server without hyperthreading as compared to a single X7350 physical XenApp test server, the number of concurrent users increased to an incredible 10.8x with hyperthreading. That’s a remarkable server consolidation opportunity for any 32-bit XenApp administrator. And while XenApp will virtualize very nicely with XenServer on that same dual quad-core X7350 server, remember that the number of users per VM when using this test schema will be 47. Since hyperthreading isn’t available on the X7350, the maximum number of VMs on the X7350 host would be 4, making the maximum number of concurrent users 188. Not bad, but nowhere near the 554 concurrent users we get on the E5570 with hyperthreading. That’s an increase of 366 users, almost three times the number of concurrent XenApp users.

Pretty hard to ignore.
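For readers who want to check the virtualized X7350 comparison, here is a small sketch. It uses the eight-VM hyperthreading result reported earlier (8 VMs at an average of 69.25 users each) and the assumed four-VM X7350 host at 47 users per VM; the function and variable names are hypothetical.

```python
# Sketch comparing a virtualized X7350 host with the E5570 host
# running with hyperthreading enabled.
def consolidation_gain(new_total, old_total):
    """Return (additional users, ratio of new total to old total)."""
    return new_total - old_total, new_total / old_total


x7350_total = 4 * 47          # 4 VMs at 47 users per VM
e5570_ht_total = 8 * 69.25    # 8 VMs at an average of 69.25 users per VM

extra_users, ratio = consolidation_gain(e5570_ht_total, x7350_total)
print(f"X7350 host: {x7350_total} users")
print(f"E5570 host: {e5570_ht_total:.0f} users")
print(f"Gain: {extra_users:.0f} users, {ratio:.2f}x")
```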

As we’ve seen here, the promise of Intel’s Nehalem technology is being realized in some very practical ways. As a result, the performance bar for XenApp, when virtualized with XenServer, is now higher than ever.