Scale out storage – Thinking Outside of the Box

At the base of cloud infrastructure is virtualization.

With a physical approach, the main challenges include under-utilized server resources, difficulty protecting server availability, and disaster recovery. Virtualization makes all of these problems easier to handle. However, because of the complexity of hypervisor resource management and the shared storage model, the largest challenge becomes storage management.

In a cloud environment, there are usually two approaches to designing the storage solution: scale-up and scale-out. The strategy you adopt affects the overall cost, performance, availability, and scalability of the entire cloud solution.

Besides the fact that the topology decision is a combination of functionality, price, TCO, and skills, the biggest differences between the scale-out and scale-up topologies are shown below:

|                                    | Scale-out (SAN/NAS)                           | Scale-up (DAS/SAN/NAS)      |
|------------------------------------|-----------------------------------------------|-----------------------------|
| Hardware scaling                   | Add commodity devices                         | Add faster, larger devices  |
| Hardware limits                    | Scale beyond device limits                    | Scale up to device limit    |
| Availability, resiliency           | Usually more                                  | Usually less                |
| Storage management complexity      | More resources to manage, software required   | Fewer resources to manage   |
| Span multiple geographic locations |                                               |                             |



Scaling up an existing system usually results in simpler storage management than the scale-out approach, because the complexity of the underlying environment is reduced, or at least known. However, as you scale up a system, performance may suffer from the increasing density of shared resources in this topology. With a scale-out topology, performance may increase with the number of nodes, because each node adds more CPU, memory, spindles, and network interfaces.
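To make the density argument concrete, here is a minimal, deliberately idealized sketch. The function names and every number in it are hypothetical and only illustrate the trend: a scale-up system shares one fixed controller across all capacity, while each scale-out node brings its own resources. Real scale-out systems also pay network and coordination overhead that this model ignores.

```python
def scale_up_iops_per_tb(controller_iops, total_tb):
    """Scale-up: one controller's IOPS budget is shared by all capacity behind it."""
    return controller_iops / total_tb


def scale_out_iops_per_tb(node_iops, node_tb, nodes):
    """Scale-out: every node adds both capacity and IOPS (idealized, no overhead)."""
    return (node_iops * nodes) / (node_tb * nodes)


# Hypothetical figures: a 100,000 IOPS controller vs. nodes of 50,000 IOPS / 100 TB each.
for total_tb in (100, 200, 400):
    nodes = total_tb // 100
    print(
        total_tb,
        scale_up_iops_per_tb(100_000, total_tb),
        scale_out_iops_per_tb(50_000, 100, nodes),
    )
```

In this toy model the IOPS available per TB falls as the scale-up system grows, while it stays flat for scale-out.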

Storage is a key component of cloud computing. There are now a number of options depending on the workload, as shown in the graphic below:


There is no “one solution fits all” in a cloud environment. The architecture should separate virtual machines from the physical layer, and provide a virtual storage topology that allows any virtual machine to connect to any storage in the network. Both are required for a strong cloud infrastructure.

In a typical private cloud environment, some systems require this solution for high-speed transfers of small pages, such as OLTP business databases with 8 KB pages. The solution is also well suited to large sequential access, such as backup and archive systems, and to systems with large amounts of application data, such as VM files and web content.
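The difference between these two access patterns can be seen with simple arithmetic: throughput is the I/O rate multiplied by the transfer size. The sketch below uses the 8 KB page size mentioned above; the IOPS figures and the 1 MB sequential block size are assumptions chosen only for illustration.

```python
def throughput_mb_s(iops, block_kb):
    """Throughput in MB/s for a given I/O rate and block size in KB."""
    return iops * block_kb / 1024


# Hypothetical OLTP database: many small 8 KB page transfers.
oltp = throughput_mb_s(iops=20_000, block_kb=8)

# Hypothetical backup job: few large sequential 1 MB transfers.
backup = throughput_mb_s(iops=500, block_kb=1024)

print(f"OLTP:   {oltp} MB/s at 20,000 IOPS")
print(f"Backup: {backup} MB/s at 500 IOPS")
```

The small-page workload is bound by IOPS while moving relatively little data; the sequential workload moves more data with far fewer operations, which is why the two favor different storage characteristics.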

The virtual storage architecture should be connected to each node, with all nodes sharing the same connectivity method (e.g. TCP/IP) to support the cloud infrastructure. An example of this is shown in the image below.


There are several ways to provide this connectivity, for example iSCSI, NFS, or even FCoE. These also allow connectivity with legacy SAN-based storage. In this environment, you can use tiers of storage: a tier of high-performance devices such as SSDs for higher IOPS, or a tier of disks with better capacity per dollar (GB/$), such as 10k rpm drives. While you can also define a “balanced” tier, it is important that the scale-out storage management software places data on the most appropriate tier, as needed.
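As an illustration of what "placing data on the most appropriate tier" could look like, here is a minimal sketch of a threshold-based placement policy. The tier names, the access-frequency metric, and the thresholds are all hypothetical; actual scale-out storage software uses its own, usually far richer, heuristics.

```python
# Tiers ordered from fastest to cheapest. Each entry is
# (tier name, minimum reads per hour to qualify). All values are hypothetical.
TIERS = [
    ("ssd", 1000),      # high-performance tier for hot data
    ("balanced", 100),  # middle tier
    ("capacity", 0),    # best GB/$ tier for cold data
]


def choose_tier(reads_per_hour):
    """Return the first (fastest) tier whose threshold the data meets."""
    for name, min_reads in TIERS:
        if reads_per_hour >= min_reads:
            return name
    return TIERS[-1][0]  # fallback for negative or invalid input


for reads in (5000, 100, 3):
    print(reads, "->", choose_tier(reads))
```

A real policy would also re-evaluate placement over time, moving data between tiers as its access pattern changes.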

A common question about cloud computing infrastructure is: should I use DAS? Booting from network storage can be an option. In a virtualized environment where the hypervisor image is stored on the network and streamed to the server during the boot process, this can reduce server costs and the failure risk (MTBF) associated with regular local hard disks. However, in a large environment, the server that hosts the hypervisor image must be designed to withstand a boot storm, with anywhere from several to thousands of machines booting at the same time.
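A rough way to size for a boot storm is to estimate the aggregate bandwidth needed for all servers to read the hypervisor image within an acceptable boot window. The sketch below is a back-of-the-envelope lower bound that ignores protocol overhead and caching; the server count, image size, and window are hypothetical inputs.

```python
import math


def boot_storm_gbps(servers, image_gb, window_s):
    """Aggregate bandwidth in Gb/s for every server to read the image in the window."""
    return servers * image_gb * 8 / window_s


def links_needed(required_gbps, link_gbps=10.0):
    """Minimum number of links of the given speed (10GbE by default)."""
    return math.ceil(required_gbps / link_gbps)


# Hypothetical: 2,000 servers each reading 0.5 GB within a 400-second window.
demand = boot_storm_gbps(servers=2000, image_gb=0.5, window_s=400)
print(demand, "Gb/s ->", links_needed(demand), "x 10GbE links at minimum")
```

With these assumed inputs the image server needs at least 20 Gb/s of sustained read bandwidth, before any overhead is taken into account.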

In a scale-out storage architecture where booting from the network is a requirement, consider NICs that support iSCSI boot. Such a NIC is usually 25% more expensive than a regular 10GbE NIC. However, if you want to mitigate the risk of a boot storm and at the same time improve the reliability of the server platform, adopting a local SSD for the hypervisor can provide a higher MTBF and improve the MTTR.
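The trade-off can be put into numbers with two small helpers: one applies the 25% premium mentioned above to a NIC price, and the other uses the standard steady-state availability formula, MTBF / (MTBF + MTTR). The prices and the MTBF and MTTR values below are hypothetical placeholders, not vendor figures.

```python
def iscsi_boot_nic_price(regular_nic_price, premium=0.25):
    """Price of an iSCSI-boot-capable NIC given the regular 10GbE NIC price."""
    return regular_nic_price * (1 + premium)


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the component is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical regular NIC price of 400 (any currency).
print("iSCSI boot NIC:", iscsi_boot_nic_price(400))

# Hypothetical boot devices: a higher MTBF and a lower MTTR both raise availability.
print("Local HDD:", availability(mtbf_hours=500_000, mttr_hours=8))
print("Local SSD:", availability(mtbf_hours=1_500_000, mttr_hours=4))
```

Plugging in real prices and reliability data for your own hardware shows whether the NIC premium or the local SSD is the better investment per server.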

Best Regards!

-Bruno Domingues