SGI’s solution for scaling I/O on NVMe RAID.

About a year ago, I was excited to see SGI announce its UV300 system at SC14. The system was still a prototype, not yet productized, but what they demonstrated at their booth amazed me: it included 64 Intel Solid State Drive Data Center P3700 Series NVMe drives, which had launched in June 2014. A cutting-edge single-image system with equally cutting-edge drives.

Beyond the unique features of the SGI UV300 platform, which combines 32 Intel Xeon E7 sockets with the NUMAlink interconnect, it is an ideal platform for exploring how far the performance of multiple NVMe SSDs can scale within a single system. That is exactly what the company demonstrated at SC14: raw I/O performance scaling as a function of the number of NVMe SSDs in the system. With a NUMA-optimized NVMe driver, SGI achieved a record 30 million IOPS on 4K random reads (64 SSDs) and proved linear performance scalability.
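
For context, raw 4K random read measurements of this kind are typically run with a tool like fio directly against the block devices, using O_DIRECT to bypass the page cache. A minimal sketch of one such job follows; the device name, queue depth, and job count are illustrative assumptions, not SGI's actual test parameters:

    # 4K random reads straight at one NVMe block device, bypassing the page cache
    fio --name=nvme-randread --filename=/dev/nvme0n1 \
        --ioengine=libaio --direct=1 --rw=randread --bs=4k \
        --iodepth=32 --numjobs=8 --group_reporting \
        --runtime=60 --time_based

Scaling the test then means repeating it across more and more drives and checking whether total IOPS grows linearly with the drive count.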

[Image: 01.png. Source: https://communities.intel.com/community/itpeernetwork/blog/2014/11/15/sgi-has-built-a-revolutionary-system-for-nvme-storage-scaling-at-sc14]

The next logical step was to understand the limitations at the file system level and to quantify its overhead against the maximum bandwidth and the IOPS achievable on 4K random workloads. Obviously, there are some challenges here. Running a single file system across a number of NVMe SSDs requires a way to combine them into one volume. This can be part of the file system's own functionality, LVM, or a separate software RAID built for the purpose. MD RAID is an option here: it is the generic Linux software RAID implementation, and Intel has implemented extensions to it in the form of the "imsm" container option, first for SATA and recently for NVMe.
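
To make the MD RAID route concrete, here is a minimal sketch of striping NVMe drives into a single RAID0 volume with mdadm, first as plain MD RAID and then via an "imsm" container. Device names, drive count, and chunk size are illustrative assumptions, not a tuned configuration:

    # Plain MD RAID0 across four NVMe drives
    mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=128 \
          /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

    # The "imsm" variant groups the drives into a container first,
    # then creates the RAID volume inside that container
    mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=4 \
          /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
    mdadm --create /dev/md/vol0 /dev/md/imsm0 --level=0 --raid-devices=4

Either way, the result is one block device that a single file system can sit on top of.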

From here I want to refer to SGI’s blog (http://blog.sgi.com/delivering-performance-of-modern-storage-hardware-to-applications) on its recent MD RAID study and XFS file system modifications. SGI identified a bottleneck in the way a standard RAID0 implementation submits I/O, which can limit scaling on massive NVMe configurations. This resulted in a proprietary SGI extension to XFS, shipped as part of eXFS, with extended support for MD RAID on NVMe SSDs. It allows users to keep scaling the I/O parallelism that the NVMe specification introduced.
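
Measuring the file-system-level overhead then comes down to putting XFS on the MD volume and rerunning the same 4K random read job against files instead of raw devices. A minimal sketch, again with illustrative parameters rather than SGI's actual setup:

    # Create XFS on the MD volume and mount it
    mkfs.xfs /dev/md0
    mount /dev/md0 /mnt/nvme

    # Same 4K random read job as before, now through the file system
    fio --name=xfs-randread --directory=/mnt/nvme --size=10g \
        --ioengine=libaio --direct=1 --rw=randread --bs=4k \
        --iodepth=32 --numjobs=8 --group_reporting \
        --runtime=60 --time_based

The gap between this result and the raw-device numbers is the overhead the study set out to quantify.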

[Image: 02.png. Source: http://blog.sgi.com/delivering-performance-of-modern-storage-hardware-to-applications]

Hard to believe? Come to SGI’s booth at SC15 and talk to them about it.

Andrey Kudryavtsev

About Andrey Kudryavtsev

Andrey Kudryavtsev is an SSD Solution Architect in the NVM Solution Group at Intel. His main focus is the HPC area, where he helps end customers and ecosystem partners take advantage of modern storage technologies and accelerate SSD adoption for NVMe. He has more than 12 years of server experience, the last 10 of them at Intel. He is a guru of engineering creativity and an influence in his field. He graduated from Nizhny Novgorod State University in Russia with a degree in Computer Science in 2004. Outside of work, he is the owner and co-author of many experimental technologies in music, musical instruments, and multi-touch surfaces.