Let’s say you have been tasked with architecting a Cassandra platform. One component of the platform is (naturally) storage. During the course of designing your platform, one of life’s axioms will likely rear its unwelcome head: The Project Triangle and the “Pick any two” values from the following vertices: Fast, Good, and Cheap. While that situation is normally associated with project management, I’ve found that it is also true when it comes to provisioning IT infrastructure. Simply associate fast with performance, good with capacity, and cheap with cost, and you have some nice parallels.
Take the cost vertex (can it be cheap?). At one end of the spectrum you have HDDs, and at the other end you have an all-flash solution. Splitting the difference gives you a mix of HDDs and SSDs.
It follows that these storage characteristics directly affect each other when one is optimized more heavily than the others (optimizing all three is not possible):
- Prioritize cost, and you’ll likely get the needed capacity, but will the solution perform?
- Prioritize performance, and you run the risk of too little capacity at a higher cost.
- Prioritize capacity, but can costs be kept low while performance stays adequate?
The purpose of this particular blog isn’t really to debate the merits of SSDs vs HDDs – when it comes to Cassandra, Al Tobey has written an excellent tuning guide that covers this. The question I want to pose, assuming SSDs will be part of your Cassandra implementation, is this: Does the choice of SSD matter?
I like to believe that it does. SSDs vary widely in cost, performance, and capacity, and also in what I will put under the umbrella of robustness: durability (operating in a wide variety of conditions), consistent performance (does the drive perform the same out of the box as when near EOL? Is performance the same when the drive is empty as when full?), endurance (will the drive media last under the workload it will be subjected to?), and data integrity (has the drive been shown to back up claims about preventing soft errors?).
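To make the endurance question a little more concrete, here is a minimal back-of-the-envelope sketch in Python. It estimates how many drive writes per day (DWPD) a workload would ask of a single drive and compares that to a rated figure. The function names and every number below are hypothetical placeholders, not measurements or vendor specifications; in particular, the write amplification multiplier is an assumed stand-in for compaction and other rewrites, and you would need to measure it for your own cluster.

```python
def required_dwpd(host_writes_gb_per_day: float,
                  write_amplification: float,
                  drive_capacity_gb: float) -> float:
    """Estimate the drive writes per day a workload demands of one drive.

    write_amplification is an assumed multiplier covering rewrites such as
    compaction; it is a placeholder, not a measured value.
    """
    if drive_capacity_gb <= 0:
        raise ValueError("drive_capacity_gb must be positive")
    return host_writes_gb_per_day * write_amplification / drive_capacity_gb


def endurance_ok(required: float, rated_dwpd: float) -> bool:
    """Return True when the rated endurance covers the estimated demand."""
    return required <= rated_dwpd


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    demand = required_dwpd(host_writes_gb_per_day=500,
                           write_amplification=4.0,
                           drive_capacity_gb=800)
    print(f"Estimated demand: {demand:.2f} DWPD")
    print("Covered by a drive rated at 3 DWPD:", endurance_ok(demand, 3.0))
    print("Covered by a drive rated at 1 DWPD:", endurance_ok(demand, 1.0))
```

Swap in your own per-node write volume and the rating from the data sheet of the drive you are evaluating; the point is only that endurance can be sanity-checked with simple arithmetic before you buy.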
There are several variables in the mix here, but what if your business model hinged upon making a sound architectural decision on this very topic? Well, Network Redux has introduced a new service called Seastar, which offers managed Cassandra hosting. You can read about Seastar’s own process for understanding Cassandra’s storage requirements and their conclusions here. The TL;DR version if you’re in a rush: they selected the Intel® SSD Data Center S3710 Series, stating “Our criterion was optimal price performance while weighing capacity, latency, endurance, and manufacturer reputation as important factors in our decision.”
Being an Intel employee, I’m happy with their analysis and selection. And being a human being, I will admit that, as with all things in life, YMMV. I’d be interested in hearing stories about readers’ own considerations and experiences, not just with Cassandra, but with your own infrastructure challenges and successes (and failures) over the years.