Server Storage Caching Considerations

Caching is a big part of storage today. Here’s how Wikipedia defines caching:

“…a cache (pron.: /ˈkæʃ/ KASH) is a component that transparently stores data so that future requests for that data can be served faster.”

With the data explosion and consumers creating content and wanting to access it immediately, caching is becoming more and more important. 

My colleague Susan Bobholz, a Marketing Director in Intel’s Data Center Software Division, talks about the considerations for server storage caching.

Right now, one of the hottest storage topics is storage caching. It seems hardly a week goes by without some type of caching software showing up in the press.  I thought I’d spend a bit of time talking about this trend and provide some things to think about when choosing caching software.

In many datacenters, the hottest, most frequently accessed data is stored on 15K serial attached SCSI (SAS) hard drives. But those hard drives can become a bottleneck because they are mechanical devices with moving parts.  They simply can’t move fast enough to keep up with some application demands.  One solution is to replace all those 15K SAS hard drives with Solid State Drives (SSDs), but this can be an expensive undertaking.  The beauty of storage caching is that it protects your investment in hard drives, because your application performance is improved without replacing all those hard drives with SSDs.

Put as simply as possible, storage caching stores the hottest, most important data on an SSD instead of hard drives, allowing that data to be accessed significantly faster.  Often caching is implemented as a server application, but sometimes it’s actually part of the firmware on a RAID HBA.

So you’ve decided you want to implement storage caching.  There are several caching options out there.  Other than cost, what are some key questions to consider when deciding which to use?

Where does your cache physically reside?

As mentioned above, some caching solutions are integrated into a RAID HBA.  This means that the Cache SSD must be attached to the RAID HBA itself and only data on hard drives connected to that RAID HBA can be accelerated.

Other caching solutions allow the Cache SSD to be anywhere inside the server itself.   This provides additional flexibility as the data being cached can be anywhere on the server - behind a RAID HBA, behind a SAS HBA or even attached to the chipset SATA ports.

In addition to being able to have the Cache SSD inside the server, some caching solutions allow the Cache SSD to be outside the server, in a SAN or NAS.  This is important in virtualized servers as this allows virtual machine migration to occur automatically.  The cache remains active while the virtual machine moves from host to host.

Consider where you want the cache SSD to connect to your server when choosing a caching solution.

Which OSes are supported?

Think about what OSes exist in your datacenter.  Windows?  Linux?  Virtualized OSes such as VMware ESX or Xen?     Think about whether it’s important to have a common caching solution from one vendor across all these environments.  Not all caching solutions support all these OSes.

Being able to choose what goes into the cache

This may sound unimportant, but imagine having an SLA with a customer that requires you to deliver the lowest latency to the data associated with that application. What if you could guarantee that specific data was always in the cache, ready to be accessed?   Several caching solutions available today offer proprietary ways to pin data into the cache.  This is becoming so important that standards bodies such as ANSI T10 are looking into standardized ways to indicate that data should be kept in the cache at all times.
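Conceptually, pinning just removes a cache entry from the pool of eviction candidates. Here is a toy sketch with a hypothetical `pin()` call (today’s vendors expose this in their own proprietary ways): pinned entries survive even when the cache is full and something must be evicted.

```python
from collections import OrderedDict

# Sketch of cache pinning: pinned keys are never eviction candidates,
# so SLA-critical data stays cache-resident. The pin() API and the
# class are hypothetical, for illustration only.

class PinnableCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()      # ordered oldest -> newest
        self.pinned = set()

    def pin(self, key):
        self.pinned.add(key)

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)     # mark as most recently used
        while len(self.data) > self.capacity:
            # evict the oldest *unpinned* entry
            for victim in self.data:
                if victim not in self.pinned:
                    del self.data[victim]
                    break
            else:
                break  # everything is pinned: stop evicting

cache = PinnableCache(capacity=2)
cache.put("sla-data", b"hot")
cache.pin("sla-data")
cache.put("b", b"...")
cache.put("c", b"...")   # forces an eviction: "b" goes, "sla-data" stays
```

Even though "sla-data" is the oldest entry when the cache overflows, the pin protects it and the eviction falls on the next-oldest unpinned entry.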

Read caching or write caching?  

Look at the applications you want to accelerate.  Do they mostly read data from hard drives or do they mostly write data?  Or is it a mix?  Some caching solutions are better at accelerating reads; others are better at writes.  Choose a caching solution that meets the needs of the applications you want to accelerate.
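For write acceleration, the usual distinction is between write-through and write-back policies. The toy classes below are my own illustration, not any vendor’s design: write-through sends every write to both the cache and the hard drive (safe, but writes still run at hard-drive speed), while write-back lands writes in the cache and flushes them to the hard drive later (fast, but dirty data is at risk until it is flushed).

```python
# Illustrative contrast between two write-caching policies
# (hypothetical toy classes, for explanation only).

class WriteThroughCache:
    def __init__(self, hdd):
        self.hdd, self.cache = hdd, {}

    def write(self, key, value):
        self.cache[key] = value
        self.hdd[key] = value          # the HDD is always up to date

class WriteBackCache:
    def __init__(self, hdd):
        self.hdd, self.cache, self.dirty = hdd, {}, set()

    def write(self, key, value):
        self.cache[key] = value        # fast: only the cache is touched
        self.dirty.add(key)            # remember what still needs flushing

    def flush(self):
        for key in self.dirty:
            self.hdd[key] = self.cache[key]
        self.dirty.clear()

hdd = {}
wb = WriteBackCache(hdd)
wb.write("a", 1)   # the HDD has not seen this write yet
wb.flush()         # now it has
```

A write-heavy application gains the most from write-back, but you are trading durability for speed, which is why write-back caches in real products pair the policy with protections such as mirrored or battery-backed caches.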

Caching algorithms aren’t all the same

We all learned about Least Recently Used (LRU) caching in school.  Just as the name implies, when the cache is full and new data needs to be added, the data that has been sitting in the cache the longest without being used is evicted to make room for the new data.  This can be an effective algorithm and is very common.  But some caching solutions add intelligence to the caching algorithm and are able to keep the most popular and active data in the cache longer, protecting it from being evicted by more recent, but less popular, data. This reduces the probability that important data is evicted from the cache, improving overall application performance.
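The textbook LRU rule described above can be sketched in a few lines of Python. This is the plain algorithm, with none of the frequency-aware intelligence some vendors layer on top: the least recently used entry is always the one evicted.

```python
from collections import OrderedDict

# Textbook LRU cache. The eviction rule is exactly "the entry that
# has gone longest without being used is evicted first".

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()      # ordered oldest -> newest use

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)     # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used entry
cache.put("c", 3)   # cache is full: "b" is evicted, not "a"
```

Notice that a single recent access saved "a" from eviction even though it was inserted first; smarter algorithms go further and weigh how *often* data is accessed, not just how recently, which is the kind of added intelligence described above.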

So, these are just some of the areas to consider when choosing a storage caching solution.  What is important to you when you choose a caching solution?  Let me know!

Full disclosure:  Intel has its own caching solution:  Intel® Cache Acceleration Software, which works with Intel® Datacenter SSDs.  We think it’s pretty cool.  A 30-day trial is available.

Susan Bobholz is a Marketing Director in Intel’s Datacenter Software Division.  She’s been with Intel for 20 years, doing everything from software development to initiative management to product marketing, focused on storage technologies and products.  Prior to joining Intel, Susan developed software at Siemens Medical Labs and firmware for Motorola cell phones.  She graduated from the University of Wisconsin with a BS in Electrical and Computer Engineering.  She holds 3 patents.