About a year ago, I was excited to see SGI announce its UV300 system at SC14. At that point the system was still a prototype, not yet productized, but what they demonstrated at their booth amazed me. The system included 64 Intel Solid State Drive Data Center P3700 Series NVMe drives, which were launched in June 2014: a cutting-edge single-image system with drives to match.
Besides the unique features of the UV300 platform, which combines 32 sockets of Intel Xeon E7 with the NUMAlink interconnect, it is an amazing platform for seeing how far the performance of multiple NVMe SSDs can scale within one system. That’s exactly what the company did at SC14: it demonstrated how raw I/O performance scales with the number of NVMe SSDs in the system. With a NUMA-optimized NVMe driver, SGI achieved a record 30 million IOPS on 4K random reads (64 SSDs) and proved linear performance scalability.
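To put that headline number in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes "4k" means 4096-byte reads and that the load was spread evenly across all 64 drives, neither of which is spelled out above. The function names are mine, purely for illustration.

```python
# Back-of-the-envelope view of the SC14 demo figures quoted above.
# Assumptions (not stated in the post): "4k" = 4096 bytes, and the
# load is spread evenly across all drives.

TOTAL_IOPS = 30_000_000   # record 4K random read figure
NUM_SSDS = 64             # NVMe drives in the system
BLOCK_BYTES = 4096        # assumed I/O size


def per_drive_iops(total_iops, drives):
    """Average IOPS each drive must sustain for the aggregate figure."""
    return total_iops / drives


def aggregate_bandwidth_gb_s(total_iops, block_bytes):
    """Aggregate throughput in decimal GB/s implied by the IOPS figure."""
    return total_iops * block_bytes / 1e9


if __name__ == "__main__":
    print(f"Per-drive IOPS: {per_drive_iops(TOTAL_IOPS, NUM_SSDS):,.0f}")
    print(f"Aggregate bandwidth: "
          f"{aggregate_bandwidth_gb_s(TOTAL_IOPS, BLOCK_BYTES):.2f} GB/s")
```

Under those assumptions, each drive averages roughly 469 thousand IOPS and the system as a whole moves about 123 GB/s of small random reads.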
The next logical step was to understand the limitations at the file system level: to determine its overhead for maximum bandwidth and to find where the IOPS bottleneck lies on 4K random workloads. Obviously, there are some challenges here. Having a single file system across a number of NVMe SSDs requires a way to combine them into a single volume. This can be part of the file system's own functionality, LVM, or a separate software RAID built for the purpose. MD RAID can be an option here. It’s the generic Linux software RAID implementation. In fact, Intel has implemented extensions to it in the “imsm” container option for SATA and recently introduced it for NVMe.
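For readers less familiar with striping, the idea of combining drives into one volume can be sketched in a few lines. This is a simplified textbook RAID0 mapping for equal-sized members, not MD's actual code; the function name, the 512 KiB chunk size, and the 64-drive count are assumptions chosen for illustration.

```python
# Simplified RAID0 address mapping for equal-sized member drives.
# This is a textbook sketch, not the Linux MD implementation.
# CHUNK_BYTES and NUM_DRIVES are illustrative assumptions.

CHUNK_BYTES = 512 * 1024  # assumed chunk (stripe unit) size
NUM_DRIVES = 64           # assumed number of member drives


def map_raid0(volume_offset, num_drives=NUM_DRIVES, chunk_bytes=CHUNK_BYTES):
    """Map a byte offset in the striped volume to (drive index, drive offset)."""
    chunk_index = volume_offset // chunk_bytes   # which chunk of the volume
    within_chunk = volume_offset % chunk_bytes   # position inside that chunk
    drive = chunk_index % num_drives             # chunks rotate across drives
    stripe = chunk_index // num_drives           # full stripes before this one
    return drive, stripe * chunk_bytes + within_chunk


if __name__ == "__main__":
    for offset in (0, CHUNK_BYTES, NUM_DRIVES * CHUNK_BYTES + 4096):
        drive, drive_offset = map_raid0(offset)
        print(f"volume offset {offset} -> drive {drive}, offset {drive_offset}")
```

Because consecutive chunks land on different drives, random 4K requests spread across all members, which is why the volume layer itself has to keep up with the combined request rate of every drive underneath it.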
From here I want to refer to SGI’s blog (http://blog.sgi.com/delivering-performance-of-modern-storage-hardware-to-applications) on a recent MD RAID study and XFS file system modifications. They identified a bottleneck in the standard RAID0 implementation in the way I/O is submitted, which can become an issue when scaling massive NVMe configurations. This resulted in SGI’s proprietary extension to XFS, which is part of eXFS, with extended support for MD RAID on NVMe SSDs. This allows users to keep scaling the I/O parallelism introduced in the NVMe specification.
Hard to believe? Come to SGI’s booth at SC15 and talk to them about it.