Here’s a prediction for 2016: The year ahead will bring the increasing “cloudification” of enterprise storage. And so will the years that follow—because cloud storage models offer the best hope for the enterprise to deal with unbounded data growth in a cost-effective manner.
In the context of storage, cloudification refers to the disaggregation of applications from the underlying storage infrastructure. Storage arrays that previously operated as silos dedicated to particular applications are treated as a single pool of virtualized storage that can be allocated to any application, anywhere, at any time, all in a cloud-like manner. Basically, cloudification takes today’s storage silos and turns them on their sides.
There are many benefits to this new approach that pools storage resources. In lots of ways, those benefits are similar to the benefits delivered by pools of virtualized servers and virtualized networking resources. For starters, cloudification of storage enables greater IT agility and easier management, because storage resources can now be allocated and managed via a central console. This eliminates the need to coordinate the work of teams of people to configure storage systems in order to deploy or scale an application. What used to take days or weeks can now be done in minutes.
And then there are the all-important financial benefits. A cloud approach to storage can greatly increase the utilization of the underlying storage infrastructure, deferring capital outlays and reducing operational costs.
This increased utilization becomes all the more important with ongoing data growth. The old model of continually adding storage arrays to keep pace with data growth and new data retention requirements is no longer sustainable. The costs are simply too high for all those new storage arrays and the data center floor space that they consume. We now have to do more to reclaim the value of the resources we already have in place.
Cloudification isn’t a new concept, of course. The giants of the cloud world—such as Google, Facebook, and Amazon Web Services—have taken this approach from their earliest days. It is one of their keys to delivering high-performance data services at a huge scale and a relatively low cost. What is new is the introduction of cloud storage in enterprise environments. As I noted in my blog on non-volatile memory technologies, today’s cloud service providers are, in effect, showing enterprises the path to more efficient data centers and increased IT agility.
Many vendors are stepping up to help enterprises make the move to on-premises cloud-style storage. Embodiments of the cloudification concept include Google’s GFS and its successor Colossus, the Hadoop Distributed File System (HDFS) that Facebook runs at massive scale, Microsoft’s Windows Azure Storage (WAS), Red Hat’s Ceph/RADOS (and GlusterFS), and Nutanix’s Distributed File System (NDFS), among many others.
The Technical View
At this point, I will walk through the architecture of a cloud storage environment, for the benefit of those who want the more technical view.
Regardless of the scale or vendor, most of the implementations share the same storage system architecture. That architecture has three main components: a name service, a two-tiered storage service, and a replicated log service. The architectural drill-down looks like this:
The “name service” is a directory of all the volume instances currently being managed. Volumes are logical data containers, each with a unique name—in other words, the service implements a namespace of named objects. A user of storage services attaches to a volume via a directory lookup that resolves the name to the actual data container.
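To make the lookup flow concrete, here is a minimal sketch of a name service in Python. The class, method, and identifier names are my own illustrative assumptions, not any particular vendor’s API:

```python
# Hypothetical sketch of a name service: a directory that maps
# volume names to the data containers backing them.

class NameService:
    """Directory of all volume instances currently being managed."""

    def __init__(self):
        self._directory = {}  # volume name -> container identifier

    def register(self, volume_name, container_id):
        """Add a new volume to the namespace; names must be unique."""
        if volume_name in self._directory:
            raise ValueError(f"volume {volume_name!r} already exists")
        self._directory[volume_name] = container_id

    def lookup(self, volume_name):
        """Resolve a volume name to its actual data container."""
        try:
            return self._directory[volume_name]
        except KeyError:
            raise KeyError(f"no volume named {volume_name!r}") from None


# A storage user attaches to its volume via a directory lookup:
ns = NameService()
ns.register("db-vol-01", "container-7f3a")
print(ns.lookup("db-vol-01"))  # container-7f3a
```

In a real deployment the directory itself would be replicated and highly available; the point here is only the indirection, which is what lets a volume move between arrays without its users noticing.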
This data container actually resides in a two-tier storage service. The frontend tier is memory-optimized. All requests submitted by end users are handled by this tier: metadata lookups, read requests served out of cache, and write operations appended to the log.
The backend tier of the storage service provides a device-based, stable store. The tier is composed of a set of device pools, each pool providing a different class of service. Simplistically, one can imagine this backend tier supporting two device pools. One pool provides high performance but has a relatively small amount of capacity. The second pool provides reduced performance but a huge amount of capacity.
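The read path through these two tiers can be sketched as follows. This is a simplified illustration under the two-pool assumption above; the class names, pool names, and in-memory dictionaries stand in for real caches and device pools:

```python
# Illustrative sketch of the two-tier storage service: a
# memory-optimized frontend cache in front of a backend with two
# device pools (small/fast and huge/slower).

class BackendTier:
    """Device-based stable store with per-pool classes of service."""

    def __init__(self):
        # One high-performance, small-capacity pool and one
        # reduced-performance, huge-capacity pool.
        self.pools = {"fast": {}, "capacity": {}}

    def write(self, key, value, pool="capacity"):
        self.pools[pool][key] = value

    def read(self, key):
        for pool in ("fast", "capacity"):  # check the fast pool first
            if key in self.pools[pool]:
                return self.pools[pool][key]
        raise KeyError(key)


class FrontendTier:
    """Memory-optimized tier: serves reads from cache when it can."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def read(self, key):
        if key in self.cache:           # cache hit: no device I/O
            return self.cache[key]
        value = self.backend.read(key)  # cache miss: go to a pool
        self.cache[key] = value         # populate the cache
        return value


backend = BackendTier()
backend.write("block-42", b"payload", pool="fast")
frontend = FrontendTier(backend)
print(frontend.read("block-42"))  # b'payload' (miss, then cached)
print(frontend.read("block-42"))  # b'payload' (served from cache)
```

A real implementation would also migrate data between pools based on access patterns; the sketch only shows the lookup order that gives each pool its class of service.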
Finally, it is important to tease out the frontend tier’s log facility as a distinct, third component, because this facility is key to supporting performant write requests while satisfying data availability and durability requirements.
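The reason the log earns its place as a separate component can be shown in a few lines: a write is acknowledged as soon as it is durable on a majority of log replicas, without waiting for the slower backend device tier. This is a minimal sketch; the replica count and record format are assumptions for illustration:

```python
# Minimal sketch of a replicated log: append a record to every
# replica and acknowledge the write once a majority holds it.

class ReplicatedLog:
    def __init__(self, replica_count=3):
        self.replicas = [[] for _ in range(replica_count)]

    def append(self, record):
        """Append to all replicas; ack once a majority is durable."""
        acks = 0
        for replica in self.replicas:
            replica.append(record)  # in reality: a network round trip
            acks += 1
        majority = len(self.replicas) // 2 + 1
        return acks >= majority     # safe to ack the client's write


log = ReplicatedLog(replica_count=3)
acked = log.append(("block-42", b"new data"))
print(acked)  # True: the write is durable on a majority of replicas
```

Because the append is sequential and replicated, the write is both fast and safe; the logged data can be drained to the backend device pools later, off the critical path.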
In the weeks ahead, I will take up additional aspects of the cloudification of storage. In the meantime, you can learn about things Intel is doing to enable this new approach to storage at intel.com/storage.