Microsecond Responsiveness with Exchange and Intel SSD

Introduction

Microsoft Exchange* uses its own database engine, called JET, and ships with its own benchmark tool, Jetstress 2013, which has been available since the latter half of 2013. When Intel received this tool, we decided to run it against one of our server reference platforms to see whether Intel Data Center SSDs could deliver microsecond-class performance, and therefore exceptional responsiveness for mail submission and other user-facing facets of the Exchange architecture. Storage is only one piece of the overall user experience; network latency and the service time between critical services, such as an Active Directory server and Exchange, also matter, but each piece deserves careful analysis. In this blog we show the potential of direct attached Intel SSDs, using Jetstress 2013 as the driver, for key storage characteristics such as:

  • Exchange database reads and writes (which use lazy writing)
  • DB log write latency (used in Exchange's replication processes)

The writing of the databases is critical to the overall health of the solution and to key functions such as mail submission. SSDs will benefit your architecture most by supplying better latency for online functions. In particular, log write latency is key to the cross-machine work of maintaining online backup copies. A typical hard disk drive (HDD) responds to an I/O request in 3 to 20 milliseconds. The latency requirements of Jetstress are listed below (a quick sketch of the pass/fail check follows the list):

  • 20 milliseconds for database I/O
  • 10 milliseconds for the log data
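
To make these thresholds concrete, here is a minimal sketch of the kind of pass/fail check Jetstress applies to its measured averages. The threshold values come from the list above; the sample readings and the function name are ours, purely illustrative:

```python
# Minimal sketch of a Jetstress-style latency threshold check.
# Thresholds are from the requirements above; the sample readings
# below are illustrative, not measured results.

THRESHOLDS_MS = {
    "database_read": 20.0,   # database I/O must average under 20 ms
    "database_write": 20.0,
    "log_write": 10.0,       # log writes must average under 10 ms
}

def check_latencies(measured_ms: dict) -> bool:
    """Return True only if every measured average meets its threshold."""
    passed = True
    for metric, limit in THRESHOLDS_MS.items():
        value = measured_ms[metric]
        status = "PASS" if value <= limit else "FAIL"
        print(f"{metric:15s} {value:8.3f} ms (limit {limit:4.0f} ms)  {status}")
        passed = passed and value <= limit
    return passed

# SSD-class numbers in the range reported later in this blog.
check_latencies({"database_read": 0.9, "database_write": 1.0, "log_write": 0.151})
```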

In this report you can see how the Intel SSDs performed against these requirements; the results are impressive.

Testing Configuration

It's important to test a pertinent Exchange configuration. Our goal was an Exchange campus or geographically central architecture employing three servers, each supporting up to 5,000 mailboxes. With three servers we could support a fairly large campus or corporation of 15,000 to 30,000 (or more) active mailboxes, depending on factors such as mailbox size, the activity of the user base, and how many machines we added to the Exchange DAG cluster.

From the storage side, we tested a single LUN (storage partition), as our controller supported only 8 SSDs. Total usable space was 5.8 TB (terabytes) at RAID 0. RAID 0 is a reasonable choice here because, with Database Availability Groups (DAG), backup copies are available for near-immediate recovery. With DAG copies you don't need to forfeit drives for parity, since other systems hold your data alive and well through synchronous replication; SSDs support this model well because they create the log records in microseconds, as you will see in this blog.

The configuration we tested below also supports maintenance well: at 1,000 mailboxes per database, any corruption in one specific database affects only that partition of users. Supporting 10,000 or more mailboxes per machine is also worth testing, but we don't test it directly in this blog; your next logical step is to use LoadGen for Exchange 2013 against a profile representative of your user base. This is extremely important, because in most configurations you must leave CPU and memory headroom for server failover. The ceiling may be 10,000 mailboxes given your machine's power, or it may be less; that requires further testing. Our configuration for 5,000 mailboxes is outlined below:

Jetstress configuration

Number of Exchange mailboxes simulated: 5,000 (1 GB mailbox size)
Number of active mailboxes per server: 5,000
Number of databases per host: 5
Number of copies per database: 4 (1 primary / 3 secondary)
Number of active mailboxes per database: 1,000
Simulated profile, I/O operations per second per mailbox (includes 15% headroom): 0.06 IOPS
Database and log LUN size: 5.8 TB
Total database size for performance testing: 4.8 TB
Formatted storage capacity used by Jetstress: 79%
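
The sizing arithmetic behind this table is simple enough to spell out. Here is a small sketch using the table's inputs (the variable names are ours):

```python
# Back-of-the-envelope sizing for the configuration above.

mailboxes_per_server = 5_000
mailbox_size_gb = 1            # 1 GB mailbox target
databases_per_host = 5
iops_per_mailbox = 0.06        # simulated profile, already includes 15% headroom
lun_size_tb = 5.8

mailboxes_per_db = mailboxes_per_server // databases_per_host
total_host_iops = mailboxes_per_server * iops_per_mailbox
mailbox_data_tb = mailboxes_per_server * mailbox_size_gb / 1000

print(f"Mailboxes per database: {mailboxes_per_db}")      # 1000
print(f"Required host IOPS:     {total_host_iops:.0f}")   # 300
print(f"Mailbox data: {mailbox_data_tb:.1f} TB on a {lun_size_tb} TB LUN")
```

At 0.06 IOPS per mailbox, the whole 5,000-mailbox host generates only about 300 IOPS, which is part of why the CPU overhead in the results below is so low.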

Capacity is the only real constraint here, so you may need to create a secondary LUN on an attached storage system with an additional controller and move mailboxes across LUNs; there are expansion cabinets that support up to 24 small-form-factor drive slots per chassis. To relieve capacity concerns at Intel, the IT group by default sets its mailbox retention policy to 60 days for the average user, which keeps average mailbox sizes around 800 MB, under 1 GB per user. This makes for a cost-effective option: some people choose to store their data locally, while other teams may store data longer term on the server. There are many architectural options you can employ with Exchange, and they are not the focus of this blog. Here we focus on what SSDs can do for storage performance, and on DAG and high availability in your Exchange architecture as a means to both high performance for user activity and strong availability.

Hardware Configuration

System / Board: Intel C206 platform (S2600GZ Intel reference platform)
CPU: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.0 GHz, 10 core(s), 10 logical processor(s)
DRAM: 128 GB, 1600 MHz, 4-channel DDR3 ECC
Operating system: Microsoft Windows 2012 Enterprise Edition
Filesystem (block size): ReFS (64 KB)
Hyperthreading: On
RAID controller: LSI MR9265-8i dual-port SAS (dual-core ASIC), with FastPath enabled
Firmware version: 3.270.65-2578
Stripe unit size: 512 KB
Write buffer setting: Write Through
Read cache policy: Always Read Ahead
Controller memory: 512 MB per SAS port
Flash cache: 16 MB
I/O policy: Direct I/O
SSDs: 8 x Intel SSD DC S3500 in a single-LUN configuration
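
One detail of this layout worth understanding is how the 512 KB stripe unit spreads I/O across the 8 drives in RAID 0. The sketch below shows the standard RAID 0 address mapping (generic striping math, not LSI-specific code); note that an aligned 64 KB ReFS block always lands on a single SSD, since 512 KB is a multiple of 64 KB:

```python
# Standard RAID 0 address mapping for the 8-drive, 512 KB stripe LUN above.

STRIPE_UNIT = 512 * 1024   # bytes per stripe unit
NUM_DRIVES = 8

def raid0_location(lun_offset: int) -> tuple[int, int]:
    """Map a byte offset on the LUN to (drive index, byte offset on that drive)."""
    stripe_index = lun_offset // STRIPE_UNIT
    drive = stripe_index % NUM_DRIVES
    drive_offset = (stripe_index // NUM_DRIVES) * STRIPE_UNIT + lun_offset % STRIPE_UNIT
    return drive, drive_offset

drive, offset = raid0_location(3 * 1024 * 1024)  # byte 3 MiB into the LUN
print(f"drive {drive}, offset {offset}")          # -> drive 6, offset 0
```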

Testing Results

Here are snippets from one of many repeatable tests of the configuration shown above:

Parameters:

[Image: parameters.png]

Transaction Latency: Notice the values at or below 1 millisecond, especially for log writes at 0.151 milliseconds (151 microseconds). Consistency of your RAID set matters!

[Image: Transact Latency.png]

System Overhead:

[Image: host.png]

Observations:

- The database count is important. We found 5 to be a logical trade-off given our 5.8 TB LUN size and our goal of 1 GB mailboxes. Latency increased slightly, though not significantly, as we moved above 5 databases; we tested up to 20 databases and found all of those numbers acceptable, but not as efficient as what you see here.

- The new auto-thread feature of Jetstress gave us the best results at very low CPU overhead for storage; since we are in a low-IOPS situation, low CPU overhead is expected. One percent CPU overhead is amazingly low for JET storage processing and is a testament to the maturity of this product and storage subsystem.

- We could not scale Jetstress itself beyond a certain number of CPUs; for us it scaled to 10 CPUs (virtual processors). Of course, this covers only the workload related to the storage engine of Exchange.

- The write-read mix for the test is 65/35. JET's database files use a lazy commit process; the synchronous portion of the workload is the log writing, which achieved latencies in the range of 100-200 microseconds across a wide variety of configuration changes, as the sketch below illustrates.
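
That last observation is worth a toy model. In JET's design, only the log append sits on the user-visible commit path; database page writes are queued and flushed lazily. This illustrative sketch (our own simplification, not Exchange code) shows why log write latency, not database write latency, dominates responsiveness:

```python
# Illustrative model (not Exchange code) of lazy database writes vs
# synchronous log writes. Latency constants echo our measured ranges.

import collections

LOG_WRITE_US = 151     # synchronous: the client waits on this
DB_WRITE_US = 1_000    # asynchronous: flushed later by the lazy writer

dirty_pages = collections.deque()

def commit_transaction(pages_touched: int) -> int:
    """Return the user-visible commit latency in microseconds."""
    dirty_pages.extend(range(pages_touched))   # queued for lazy flush
    return LOG_WRITE_US                        # only the log append is waited on

latency = commit_transaction(4)
print(f"user-visible commit latency: {latency} us")
print(f"write work deferred to lazy flush: {len(dirty_pages) * DB_WRITE_US} us")
```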

High Availability Considerations

We also tested the Exchange LUN in a RAID 5 configuration and found that parity RAID increased Exchange log write latency into the range of 700-800 microseconds. That result is still far below the expected service-level threshold of 10 ms for log writes. Running your DAG volumes on RAID 5 or RAID 0 is your choice, but remember that RAID 5 only protects against a drive loss, not against the application or service loss that is your real concern. DAG can also help in an active disaster-recovery scenario, as described in Reference 2 below, where DAGs are used across global regions.
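
Putting the two RAID results side by side against the Jetstress limit makes the headroom obvious. A quick comparison (750 us is simply the midpoint of our measured 700-800 us RAID 5 range):

```python
# Measured log write latencies vs the 10 ms Jetstress log write limit.

LIMIT_US = 10_000
log_write_us = {"RAID 0": 151, "RAID 5": 750}  # 750 = midpoint of 700-800 us

for level, latency in log_write_us.items():
    print(f"{level}: {latency:4d} us  ({LIMIT_US / latency:5.1f}x inside the limit)")
```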

Summary

Microsecond performance provides an excellent user experience and supports the resiliency mechanisms of Exchange storage: database copies, BDM (background database maintenance), and replication. System- and application-level high availability must be a major design objective for critical business applications. Most hardware components have very low failure rates today, well under 1%, but no hardware is fail-proof, and you need to protect yourself at the complete system level. For system failures, recovery time should be measured between the DAG systems supporting each other. These are important reasons to pursue modern storage with I/O latency near or below 1 millisecond.

Direct attached storage provides significant latency benefits, especially when SSDs are used; a SAN solution for Exchange cannot deliver sub-millisecond database performance unless it too is flash-enhanced (which is common today). Today's trade-offs also include running email and productivity tools as a cloud-based (SaaS) solution, which is more affected by network latency than a campus co-located solution, though latency is not the only reason to choose an on-premises approach. Other reasons to consider on-premises solutions include existing architecture, shared resources, and security standards. Some organizations may wish to combine Exchange with a cloud architecture for data tiering or snapshots, and lower-cost storage can be effective for large mailboxes that are 100% server-stored. Your architecture should follow from your business goals, storage tiering, response time, and access and security needs, and then produce a total-cost solution that supports the most mailboxes per machine while leaving CPU and memory headroom for a DAG failover.

One final note on storage cost, the ever-present $/GB. The cost differential between mainstream hard drive storage and SSDs on a per-unit basis may not be as large as you might think. A 10K hard disk drive runs about 50 cents/GB at street price, and a mainstream data center SSD runs about $1.00/GB (March 5, 2014 pricing at "street vendor" http://newegg.com). The latency benefits are far better than 200% (2x), so the performance-to-cost trade-off is a good one.

If an all-SSD solution is too rich for your budget, a possible alternative is to cache an HDD-based solution with just one SSD per machine. You could use an Intel PCIe SSD as a simple cache device and run a caching accelerator to achieve better online performance, all for the cost of a single SSD per machine. PCIe SSDs achieve even lower latency than SAS or SATA SSDs, especially since they can run without a storage controller and provide block-level I/O directly to the processor. To see your latency, track the PerfMon counters Avg. Disk sec/Write and Avg. Disk sec/Read. One software option is Intel Cache Acceleration Software (CAS), a file caching solution, essentially a file-system-agnostic filter driver, that lets you pin the Exchange log and database files to the SSD, where a working set of the file blocks is managed by the cache accelerator. Please read the following paper for more insight and test results on SSD cache acceleration for Exchange, as opposed to a complete SSD storage system, if your requirements call for very large mailboxes. Caching is especially beneficial if your average mailbox size grows above 1 GB and you wish to keep storage cost per mailbox as low as possible. Front-ending nearline or lower-cost 7200 RPM HDDs with Intel CAS is also a good possibility, since CAS does the heavy lifting to keep performance high, well below 5 ms for logs and 20 ms for the Exchange databases, even on a lower-cost, high-capacity storage solution. Microsoft itself uses DAS (direct attached storage) with Database Availability Groups; you should consider the same for your solution needs (see Reference 3 below for Microsoft's own words).
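
The cost-versus-latency arithmetic from the first paragraph above, spelled out (street prices as quoted; latency figures from the HDD range cited earlier and our measured SSD results):

```python
# $/GB vs latency trade-off, using the figures quoted in this post.

hdd_cost_per_gb, ssd_cost_per_gb = 0.50, 1.00     # street prices, March 2014
hdd_latency_ms = (3.0, 20.0)                      # typical 10K HDD range
ssd_latency_ms = (0.151, 1.0)                     # our measured log/DB results

cost_ratio = ssd_cost_per_gb / hdd_cost_per_gb
gain_low = hdd_latency_ms[0] / ssd_latency_ms[1]   # conservative: 3 ms vs 1 ms
gain_high = hdd_latency_ms[1] / ssd_latency_ms[0]  # best case: 20 ms vs 151 us

print(f"SSD costs {cost_ratio:.0f}x more per GB,")
print(f"but cuts latency by roughly {gain_low:.0f}x to {gain_high:.0f}x")
```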

* Exchange is a service mark of Microsoft Corporation

References

1. What's New in Exchange 2013 (Exchange 2013 Help)
2. Database Availability Groups
3. The Top 10 Storage Myths Around Microsoft Exchange (Exchange Team Blog, TechNet)
4. Jetstress 2013 Download: Microsoft Exchange Server Jetstress 2013 Tool (Official Microsoft Download Center)
5. Exchange Solution Reviewed Program (ESRP) – Storage
6. Jetstress 2013 Field Guide: Announcing the Jetstress 2013 Field Guide (Exchange Team Blog, TechNet)

Frank Ober


About Frank Ober

Frank Ober is a Data Center Solutions Architect in the Non-Volatile Memory Group at Intel. He joined Intel three years ago to delve into use cases for the emerging memory hierarchy after a 25-year enterprise applications IT career spanning SAP, Oracle, cloud manageability, and other domains. He regularly tests and benchmarks Intel SSDs against application and database workloads, and is responsible for many technology proof-point partnerships with Intel software vendors.