Data centers today are undergoing arguably the largest change in IT history. Driven by application mobility and seamless scaling, cloud architectures are disrupting how data centers are designed and operated. Quickly disappearing are the traditional silo application architectures and overprovisioned networks.
HPC has always required the largest affordable systems and was the first to adopt large-scale clusters. HPC clusters need not only large-scale compute but also massive communication and storage bandwidth to feed that compute.
HPC clusters use flat or near flat networks to deliver large amounts of bandwidth, along with fast protocols based on RDMA to minimize communication overhead and latency. HPC clusters use distributed file systems like IBM GPFS* or Lustre* above highly available shared RAID storage to deliver the bandwidth and durability required.
Cloud architectures have many of the same requirements: the largest affordable compute, flat or near-flat networks, and scalable storage. The requirements are so similar that Amazon*, Microsoft*, and Google* all support deployment of large-scale, virtualized HPC clusters over their respective IaaS offerings.
The storage platform used in these deployments consists of a virtualized distributed file system such as NFS or Lustre. This file system is attached to the same virtual cluster interconnect as the cluster's virtualized compute and head nodes. Delivering low latency and high bandwidth over the virtual interconnect is a real challenge.
Underlying the file system are virtualized block devices that provide the requisite strong consistency and high availability. However, unlike traditional HPC deployments, which use highly available RAID storage, HPC cloud deployments build their durable storage on non-POSIX-compliant BLOB stores spread across multiple nodes in the cluster. Providing durability over the network while still meeting the aggressive latency targets required by HPC applications can be a daunting task.
The Storage Performance Development Kit (SPDK) is built on the Data Plane Development Kit (DPDK), which is used to accelerate packet processing. SPDK runs in Linux* user space and delivers high performance without dedicated hardware. SPDK employs user-level, polled-mode drivers to avoid kernel context switches and interrupt overhead. Virtual function support for virtual machines also minimizes the overhead of hypervisor interaction.
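The polled-mode idea can be illustrated with a small simulation. The sketch below is not SPDK code and does not use the SPDK API. The class `SimulatedQueuePair`, its `latency_polls` parameter, and the helper `run_polled` are hypothetical names invented for this illustration, and the "device" is a toy that completes each I/O after a fixed number of polls. What it shows is the control flow: submission returns immediately, and the application thread discovers completions by checking a queue itself rather than sleeping until an interrupt arrives.

```python
from collections import deque


class SimulatedQueuePair:
    """Toy stand-in for a device queue pair (hypothetical, not an SPDK type)."""

    def __init__(self, latency_polls=2):
        # Assumption for the simulation: every I/O completes after this many polls.
        self.latency_polls = latency_polls
        self._inflight = deque()   # entries are [io_id, polls_remaining]
        self._callbacks = {}       # io_id -> completion callback
        self._next_id = 0

    def submit(self, on_done):
        """Queue an I/O and return at once; nothing blocks and no interrupt is armed."""
        io_id = self._next_id
        self._next_id += 1
        self._inflight.append([io_id, self.latency_polls])
        self._callbacks[io_id] = on_done
        return io_id

    def poll_completions(self):
        """Check for finished I/Os and run their callbacks on the calling thread.

        Returns the number of completions processed; 0 means nothing was ready.
        """
        # Simulate the device making progress between polls.
        for entry in self._inflight:
            entry[1] -= 1
        done = 0
        # I/Os share one latency and are queued in order, so the head finishes first.
        while self._inflight and self._inflight[0][1] <= 0:
            io_id, _ = self._inflight.popleft()
            self._callbacks.pop(io_id)(io_id)
            done += 1
        return done


def run_polled(qpair, num_ios):
    """Submit num_ios requests, then poll until all of them have completed."""
    completed = []
    for _ in range(num_ios):
        qpair.submit(completed.append)
    polls = 0
    while len(completed) < num_ios:
        qpair.poll_completions()
        polls += 1
    return completed, polls


if __name__ == "__main__":
    ids, polls = run_polled(SimulatedQueuePair(latency_polls=2), 4)
    print("completed I/Os:", ids, "after", polls, "polls")
```

The trade-off in this design is that the polling thread stays busy even when no I/O is ready, spending CPU cycles to remove interrupt delivery and context switches from the I/O path.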
SPDK has demonstrated large improvements in Intel® Ethernet DCB Service for iSCSI target and TCP/IP stack processing, as well as significant latency and efficiency improvements with its NVMe driver, all while reducing bill of materials (BOM) costs in storage solutions.
Using storage nodes running SPDK, cloud systems can deliver higher-performance, lower-latency storage to HPC applications. With cloud deployments scaling ever larger and storage media getting faster, the demand for high-throughput, low-latency storage processing will continue to grow. SPDK is a major step forward in reducing storage latency and increasing storage bandwidth.