Introduction to SSD Caching

SSD Caching 101

This is part 1 of a multi-part series on SSD caching.  In this post I will introduce important SSD caching concepts and terminology.

Server-side SSD caching involves keeping a copy of frequently accessed data locally on a server in order to accelerate subsequent access to that data.  Most workloads naturally access certain subsets of their data more frequently than the rest, so this works out quite well in practice.

[Figure: Cache architecture]

Hit or Miss

If the cache contains a copy of the data we request, this is called a “cache hit”; if it doesn’t, this is called a “cache miss”. Quite simply, we want lots of cache hits, otherwise what’s the point of introducing caching?

[Figure: Hit or miss]
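
To make the terminology concrete, here is a minimal Python sketch of the hit/miss check, with a plain dictionary standing in for the SSD cache and a hypothetical read_from_backend() helper standing in for the slow backend store:

```python
ssd_cache = {}          # block_number -> data
hits = misses = 0

def read_from_backend(block_number):
    """Placeholder for the real (slow) backend read."""
    return f"data-for-block-{block_number}"

def read_block(block_number):
    """Return the data for a block, counting hits and misses along the way."""
    global hits, misses
    if block_number in ssd_cache:      # cache hit: serve the fast local copy
        hits += 1
        return ssd_cache[block_number]
    misses += 1                        # cache miss: fall back to the slow backend
    data = read_from_backend(block_number)
    ssd_cache[block_number] = data     # keep a copy for next time
    return data
```

After a run, hits / (hits + misses) gives the cache hit ratio, which is the number we are trying to maximize.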

Locality of Reference

The fundamental reason that caching works is that workloads often have predictable access patterns, and we can use these patterns to help us guess what data we should place in the cache. If we are good at guessing, we can dramatically improve application performance.  Here are a few common patterns:

Temporal Locality

Sometimes when you access a piece of data, you are likely to access this same data again in the near future. This property is called “Temporal Locality”.  Storing the most recently accessed data may result in a cache hit later.

[Figure: Temporal locality]

Spatial Locality

Other times, if you access a particular set of data, you are likely to access neighboring data in the near future.  This property is called “Spatial Locality”.  Reading the next data block may result in a cache hit later.

[Figure: Spatial locality]
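
One common way to exploit spatial locality is simple readahead: when a block is read, also prefetch its neighbor into the cache. A minimal sketch, again using a dictionary as a stand-in for the SSD cache and a hypothetical backend helper:

```python
cache = {}   # block_number -> data

def read_from_backend(block_number):
    """Placeholder for the real (slow) backend read."""
    return f"data-for-block-{block_number}"

def read_block_with_readahead(block_number):
    """Serve a block and prefetch the next one, betting on spatial locality."""
    if block_number not in cache:
        cache[block_number] = read_from_backend(block_number)
    next_block = block_number + 1
    if next_block not in cache:                  # prefetch the neighboring block
        cache[next_block] = read_from_backend(next_block)
    return cache[block_number]
```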

Cache Associativity

A cache is fast, but small. Space is limited, so we must choose how to map each data block to a cache block.

Direct Mapped


[Figure: Direct-mapped cache]

The most straightforward mapping approach is called “direct mapping”, in which a data block can be mapped to only a single cache block. It is a simple design; however, let’s consider the direct mapping scenario where 2 data blocks are fighting over the same cache block.

With direct mapping, it doesn’t matter whether the rest of the cache is empty or not: these data blocks are not permitted to be cached in any other cache block, so one must evict the other.  This is particularly absurd if both data blocks are frequently accessed.
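
In practice, direct-mapped placement is typically just a modulo of the block number, which is why certain block pairs always collide. A minimal sketch, with a cache size of 8 blocks chosen purely for illustration:

```python
NUM_CACHE_BLOCKS = 8   # arbitrary cache size for illustration

def direct_mapped_index(block_number):
    """Each data block maps to exactly one cache block."""
    return block_number % NUM_CACHE_BLOCKS

# Data blocks 3 and 11 both map to cache block 3, so they can never be cached
# at the same time, even if the other 7 cache blocks sit empty.
print(direct_mapped_index(3), direct_mapped_index(11))   # -> 3 3
```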

Set Associative


[Figure: N-way set-associative cache]

If we allow a data block to be cached in either of 2 cache blocks, this is referred to as a “set-associative cache”, or more specifically a “2-way associative cache”.  Now, when we need to evict older data to make room for a new cache entry, we can choose which of the 2 existing cache entries to evict.

As we increase the cache associativity from 2-way to 4-way, there is often an improvement in the effectiveness of the cache, so why not implement an 8-way, 16-way, or 128-way associative cache?  Adding associativity increases overhead and often results in diminishing returns [1].
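
A minimal sketch of 2-way set-associative placement, with the set count chosen purely for illustration: the modulo now selects a set rather than a single cache block, and a new entry can go in either of the set's 2 ways.

```python
NUM_SETS = 4   # arbitrary: 4 sets x 2 ways = 8 cache blocks
NUM_WAYS = 2

# Each set is a small list of (block_number, data) entries, at most NUM_WAYS long.
cache_sets = [[] for _ in range(NUM_SETS)]

def insert_block(block_number, data):
    """Place a block in its set, evicting an existing entry if the set is full."""
    target_set = cache_sets[block_number % NUM_SETS]
    if len(target_set) >= NUM_WAYS:
        target_set.pop(0)                   # evict one of the 2 entries (the oldest here)
    target_set.append((block_number, data))

# Blocks 3 and 11 still land in the same set (3 % 4 == 11 % 4),
# but now they can coexist in its two ways.
insert_block(3, "a")
insert_block(11, "b")
```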

Fully Associative


[Figure: Fully associative cache]

Why not enable any data block to go in any cache block?  This concept is known as a “fully associative cache”.  A fully associative cache design has the potential to dramatically reduce the miss rate and thus improve performance, when compared with a more common 4-way associative cache [2], but it does require extra overhead.
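
In software, a fully associative cache is essentially a dictionary with a size cap: any block can occupy any entry, and the only question left is which victim to evict, which is the subject of the eviction policies below. A minimal sketch:

```python
MAX_ENTRIES = 8
fa_cache = {}   # block_number -> data; any block may occupy any entry

def insert_fully_associative(block_number, data):
    """Insert a block anywhere in the cache, evicting an arbitrary victim when full."""
    if block_number not in fa_cache and len(fa_cache) >= MAX_ENTRIES:
        victim = next(iter(fa_cache))   # picking a good victim is the eviction policy's job
        del fa_cache[victim]
    fa_cache[block_number] = data
```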

Types of Cache Misses

Since the main goal of caching is to get the greatest possible number of hits, it is important to examine what would cause us to miss the cache.  Known as “The 3 C’s”, the primary reasons for cache misses are Compulsory, Capacity, and Conflict misses.

Compulsory

A compulsory miss is the simplest case in which either the cache is empty or we’ve never read the data before.  This is often referred to as the “cold start miss”, since an empty cache is colloquially known as a cold cache.  Once the cache is warmed up, compulsory misses all but disappear.

Capacity

A capacity miss just means that the data you are looking for was evicted to make room for something else. This is the most common miss and will be the focus of the next few sections.

Conflict

A conflict miss means that the data you were looking for was evicted due to an associativity conflict. One can have conflict misses even in a nearly empty cache.

Caching Policies

There are a few caching policies to choose from depending on your use case and caching needs.

Write-through cache

Write-through caching keeps a copy of frequently accessed data in the SSD cache, but every write is synchronized with the backend data storage before it completes.  This type of caching is very beneficial for read-intensive workloads, but doesn’t accelerate anything in a write-intensive workload.
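
A minimal sketch of the write-through path, with hypothetical ssd_cache and backend dictionaries standing in for the SSD cache and the backend data store: the write is not complete until the backend copy is updated, which is why write-intensive workloads see no speedup.

```python
ssd_cache = {}   # fast local SSD copy
backend = {}     # stand-in for the slow backend data store

def write_through(block_number, data):
    """Write-through: update the backend synchronously and keep the cache in sync."""
    backend[block_number] = data     # the slow write must complete first
    ssd_cache[block_number] = data   # the cached copy never goes stale

def read(block_number):
    """Reads are accelerated whenever the block is already cached."""
    if block_number not in ssd_cache:
        ssd_cache[block_number] = backend[block_number]   # miss: fetch from backend
    return ssd_cache[block_number]
```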

Write-back cache

In order to accelerate writes, one can implement write-back caching.  This type of caching will accelerate both reads and writes, because a write completes as soon as it lands in the SSD cache.  Writes are eventually synchronized with the backend data store, but there may be no guarantee on when.
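
A write-back sketch under the same assumptions: the write completes as soon as it lands in the SSD cache, and dirty blocks are flushed to the backend at some later point.

```python
ssd_cache = {}   # fast local SSD copy
backend = {}     # stand-in for the slow backend data store
dirty = set()    # blocks written to the cache but not yet to the backend

def write_back(block_number, data):
    """Write-back: acknowledge the write once it is in the SSD cache."""
    ssd_cache[block_number] = data   # fast write, completed immediately
    dirty.add(block_number)          # the backend copy is now stale

def flush():
    """Synchronize dirty blocks with the backend, whenever this happens to run."""
    while dirty:
        block_number = dirty.pop()
        backend[block_number] = ssd_cache[block_number]
```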

Write-around cache

This policy only caches data that has been previously read; writes go directly to the backend data store and do not populate the cache.  This keeps the cache focused on data we know has been read before, rather than filling it with newly written data that may never be read.
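
A write-around sketch under the same assumptions: writes go straight to the backend, and any cached copy of the written block is invalidated so reads don't return stale data.

```python
ssd_cache = {}   # fast local SSD copy
backend = {}     # stand-in for the slow backend data store

def write_around(block_number, data):
    """Write-around: bypass the cache on writes."""
    backend[block_number] = data
    ssd_cache.pop(block_number, None)   # drop any stale cached copy

def read(block_number):
    """Only reads populate the cache."""
    if block_number not in ssd_cache:
        ssd_cache[block_number] = backend[block_number]
    return ssd_cache[block_number]
```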

Cache Eviction Policies

The thing about caches is, eventually they fill up.  When that happens, we’ll need to start replacing old cache blocks with new ones. Which cache blocks to remove is determined by the eviction policy.  There are a variety of eviction policies, and they trade off additional performance against system overhead.  Here are some popular ones:

Random

The simplest cache eviction policy is random eviction.  The results are usually random, as you might have guessed.  Let’s not waste any more time on this one.

FIFO – First-in, First-out

In First-In, First-Out (FIFO) eviction, the entry that entered the cache first is the first one evicted. This doesn’t consider how recently or how often data is accessed, so there is room for improvement.
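
A minimal FIFO sketch, tracking insertion order in a queue (capacity chosen purely for illustration):

```python
from collections import deque

CAPACITY = 8
fifo_cache = {}              # block_number -> data
insertion_order = deque()    # oldest block number sits at the left end

def fifo_put(block_number, data):
    """Evict the entry that entered the cache first when space runs out."""
    if block_number not in fifo_cache:
        if len(fifo_cache) >= CAPACITY:
            oldest = insertion_order.popleft()
            del fifo_cache[oldest]
        insertion_order.append(block_number)
    fifo_cache[block_number] = data
```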

LRU – Least Recently Used

Least Recently Used (LRU) eviction removes the data that has been least recently used and leaves all of the recently accessed data in the cache.  This is one of the simplest caching algorithms designed to benefit from temporal locality.  It has a little more overhead than FIFO, since we must keep track of which entries are the most and least recently used, which is often done with a linked list.  It often pays off with a higher cache hit ratio in a variety of workloads [3].

There are some workloads that don’t do well with this design, such as a workload that alternates between data sets too large to fit in the cache together.  In these antagonistic workloads, an LRU cache may excessively evict and repopulate entries.
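
A common way to sketch LRU in Python is with an OrderedDict, which plays the role of the linked list mentioned above (capacity chosen purely for illustration):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: the most recently used entries sit at the right end."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()   # block_number -> data, ordered by recency

    def get(self, block_number):
        if block_number not in self.entries:
            return None                            # cache miss
        self.entries.move_to_end(block_number)     # mark as most recently used
        return self.entries[block_number]

    def put(self, block_number, data):
        if block_number in self.entries:
            self.entries.move_to_end(block_number)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)       # evict the least recently used
        self.entries[block_number] = data
```

For caching the results of function calls rather than storage blocks, Python's built-in functools.lru_cache decorator applies the same idea.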

LFU – Least Frequently Used

The Least Frequently Used (LFU) eviction policy evicts the least frequently accessed blocks, often by keeping a per-block access count (a sort of leaderboard).  This often improves over LRU, especially if the workload varies or has bursts of random activity that would cause excessive eviction under the LRU policy.
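
A minimal LFU sketch that keeps a per-block access counter as its leaderboard (capacity chosen purely for illustration; ties are broken arbitrarily):

```python
from collections import Counter

class LFUCache:
    """Minimal LFU sketch: evict the block with the lowest access count."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = {}            # block_number -> data
        self.counts = Counter()      # block_number -> access count

    def get(self, block_number):
        if block_number not in self.entries:
            return None              # cache miss
        self.counts[block_number] += 1
        return self.entries[block_number]

    def put(self, block_number, data):
        if block_number not in self.entries and len(self.entries) >= self.capacity:
            victim, _ = min(self.counts.items(), key=lambda item: item[1])
            del self.entries[victim]
            del self.counts[victim]
        self.entries[block_number] = data
        self.counts[block_number] += 1
```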

Final Thoughts

SSD caching can be seen as a great way to try SSDs in a datacenter workload for the first time.  I would be remiss if I didn’t recommend Intel® Solid-State Drives and Intel® Cache Acceleration Software, of course.

References

[1] Hill, M. D., & Smith, A. J.  "Evaluating Associativity in CPU Caches."  IEEE Transactions on Computers, vol. 38, no. 12, 1989.

[2] Hallnor, E. G., & Reinhardt, S. K.  "A Fully Associative Software-Managed Cache Design."

[3] Gomaa, H., et al.  "Estimating Instantaneous Cache Hit Ratio Using Markov Chain Analysis."  IEEE/ACM Transactions on Networking, vol. 21, no. 5, 2013.

Benjamin Donie