How would you measure your comfort, user experience, smoothness, and happiness while producing music?
Intel Optane SSDs open a whole horizon of new applications and use cases. But how do you translate device-level performance into an application performance improvement? And how does that improvement translate into a better user experience, the ultimate goal of any technology progress? That's a question I ask myself while evaluating new technologies. In most conditions it can be measured by benchmarks: simply comparing scores or runtimes can show the advantage of one technology over another. In certain cases, though, it's intangible. How would you measure the smoothness of your experience, or score your feelings? That's more difficult, as everyone has a different perspective. In this blog I'll attempt a more formal assessment of those feelings based on a recent story. If you haven't had a chance to see Intel's interview with top electronic music and film composer BT, find a moment now. It's worth it!
BT is one of the most innovative musicians: he utilizes the newest technologies in his music production and creates his own. His film scoring work is impressive (The Fast and the Furious, Solace, Stealth) and uses the latest advances in massively sampled orchestration available in real time. While sampling has existed for years, the way he pushes it to the limits with a hybrid orchestra approach and granular synthesis is quite remarkable.
As a user of the Intel SSD 750 Series, he was excited by NVMe SSDs and the performance advantages the PCIe interface brings. Combining multiple SSDs into a RAID volume lets him improve overall bandwidth and, of course, expand capacity. That's a great deal, and RAID capability is built into every major operating system today. However, RAID can't improve access latency. No matter how many drives you combine, the array's access latency is set by the worst drive in it, which means it is always equal to or higher than the latency of a standalone SSD. There is a class of applications whose performance can't keep scaling on SSD bandwidth improvements alone, and this story is a demonstration of that: device latency is a key requirement for improving audio sample playback performance.
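The bandwidth-vs-latency asymmetry of striping can be shown with a toy model. All numbers here are illustrative, not measurements of any particular drive:

```python
# Toy model (not a benchmark): striping across a RAID-0 volume adds up
# member bandwidth, but a striped read completes only when the slowest
# member answers, so volume latency tracks the worst drive in the array.
def raid0_bandwidth_mbps(member_bandwidths):
    # Sequential throughput scales with the number of members.
    return sum(member_bandwidths)

def raid0_read_latency_us(member_latencies):
    # A read spanning the stripe waits for the last member to respond.
    return max(member_latencies)

members_bw = [2500, 2500, 2500, 2500]   # MB/s per SSD (illustrative)
members_lat = [85, 90, 110, 95]         # microseconds per SSD (illustrative)

print(raid0_bandwidth_mbps(members_bw))    # 10000 MB/s: scales with drives
print(raid0_read_latency_us(members_lat))  # 110 us: the worst member wins
```

Adding a fifth drive would raise the first number again, but never lower the second.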
A complete orchestra is sampled into terabytes of sample data, with playback of up to 3,000 tracks at a time. Available DRAM can hold only small pieces of those sounds (the attacks), while the body of each sound is streamed directly from storage. For real-time playback, it is critical that all data processing completes within one audio buffer period, say 5 ms, which is a common setting these days. Otherwise the user will hear audio drops and other artifacts, up to fatal interruptions. This is a case where scaling storage bandwidth can't solve the problem.
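Where does a deadline like 5 ms come from? A quick sketch of the arithmetic, assuming an illustrative 256-frame buffer at 48 kHz (buffer sizes vary by host and interface):

```python
# The audio engine must produce one buffer of frames before the hardware
# consumes the previous one. At 48 kHz, a 256-frame buffer gives roughly
# a 5.3 ms deadline; every byte of every active track has to arrive from
# DRAM or storage within that window.
def buffer_deadline_ms(buffer_frames, sample_rate_hz):
    return 1000.0 * buffer_frames / sample_rate_hz

def bytes_per_buffer(buffer_frames, bytes_per_sample=4, channels=2):
    # 32-bit (4-byte) stereo frames, matching the example that follows.
    return buffer_frames * bytes_per_sample * channels

print(round(buffer_deadline_ms(256, 48000), 2))  # 5.33 (ms)
print(bytes_per_buffer(256))                     # 2048 bytes per track per buffer
```

2 KiB per track sounds trivial until thousands of tracks each demand their 2 KiB inside the same 5 ms window.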
Let's look at the facts. A single sample is a contiguous piece of data. Let's assume each sample runs at 48 kHz * 32 bits, stereo, which translates into 0.37 MB/s of bandwidth. You might expect that with a PCIe SSD that can read data sequentially at 2.5 GB/s, you could play roughly 7,000 samples at a time (2.5 * 1024 / 0.37), comfortably above the 3,000-track requirement. Why would I ever need faster storage if this number exceeds the real use case? Well, that conclusion is wrong. Sample libraries are built on thousands of samples played at a time, and layering, microphone positions, and round-robin sample rotation multiply that by an order of magnitude. Also, streaming many sequential fragments at once naturally randomizes the I/O. The workload is broken down to a lowest common denominator, which is the application request size, or in the common case the file system sector size. With that, the storage workload is no longer sequential and must be measured as IOPS at a small block size. From the device-level perspective this is a fully random I/O condition, distributed across the full span of the sample library with no hot area.
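The back-of-the-envelope math above, written out (all figures come from the paragraph itself):

```python
# Per-sample streaming bandwidth: 48 kHz frames * 32-bit * 2 channels.
sample_rate = 48_000
bytes_per_frame = 4 * 2                  # 32-bit stereo
stream_mbps = sample_rate * bytes_per_frame / (1024 * 1024)
print(round(stream_mbps, 2))             # 0.37 MB/s per sample stream

# Naive headroom: how many such streams could a 2.5 GB/s sequential
# reader feed, on paper, if the I/O actually stayed sequential?
seq_read_mbps = 2.5 * 1024               # 2560 MB/s
concurrent_samples = seq_read_mbps / stream_mbps
print(int(concurrent_samples))           # 6990 concurrent streams
```

The catch, as the paragraph explains, is that thousands of interleaved streams stop looking sequential to the drive at all, so this headroom is never realized.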
Here we come to the point where NAND-based SSD performance varies significantly with workload parameters. It's easier for a drive to run a single-threaded sequential workload than a random one, or even than many parallel sequential streams. Of course, the difference is not as dramatic as with hard drives, where a head must physically move, with a significant latency impact and unbelievable performance degradation. But the impact is meaningful nonetheless. The root cause is in the NAND architecture, which consists of sectors (the minimal read size), pages (a number of sectors, determining the minimal write size), and erase blocks (a number of pages, the minimal erase size). Combined with NAND-specific SSD acceleration that aggregates sequential I/O into bigger transfer sizes, we see performance improvements for sequential I/O that are not available for random small-block I/O.
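The sector/page/erase-block hierarchy can be illustrated with a hypothetical geometry (real NAND parts vary widely; these constants are assumptions chosen only to make the mapping concrete):

```python
# Hypothetical NAND geometry: 4 KiB sectors, 4 sectors per page, 256
# pages per erase block. Mapping a byte offset into this hierarchy shows
# why small reads are served at page granularity, and why an in-place
# update would first require erasing a whole multi-megabyte block.
SECTOR = 4 * 1024
PAGE = 4 * SECTOR            # 16 KiB minimal write unit
ERASE_BLOCK = 256 * PAGE     # 4 MiB minimal erase unit

def locate(offset):
    return {
        "erase_block": offset // ERASE_BLOCK,
        "page": (offset % ERASE_BLOCK) // PAGE,
        "sector": (offset % PAGE) // SECTOR,
    }

# A 4 KiB request at offset 5 MiB lands in erase block 1, page 64.
print(locate(5 * 1024 * 1024))
```

Two random 4 KiB reads usually land in unrelated blocks and pages, so none of the drive's sequential-aggregation tricks apply to them.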
3D XPoint™ memory solves that problem. The cell is cache-line addressable by architecture, requires no erase cycle before a write, and has significantly lower access time than NAND. Implemented as a block device, Intel Optane™ SSDs are optimized for low latency and high IOPS, especially at low queue depths. This directly correlates with exceptional quality of service, which captures maximum latency and the latency distribution. As a consequence, an Optane SSD delivers similar performance no matter the workload: random vs. sequential, read vs. write.
Let's run some tests to visualize that. I'll be running this experiment on Microsoft Windows 10. You can expect similar or better results on Linux or OS X, but since we're evaluating an environment similar to the one installed in BT's studio, I'll try to match it here.
The NAND-based SSD is brought to its sustained performance state before every run. The Optane SSD doesn't have this side effect and delivers full performance right away.
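For readers who want to reproduce the flavor of this experiment at home, here is a minimal sketch, not the tooling used for the charts: it times 4 KiB reads from a file sequentially and then at random offsets. Caveats apply: the page cache is not bypassed here, and a rigorous run would use unbuffered/direct I/O and a file much larger than RAM.

```python
# Minimal sketch of a random-vs-sequential read probe. On a NAND SSD the
# random pattern is typically slower; an Optane SSD shows far less
# difference between the two.
import os, random, time

BLOCK = 4096
FILE_BLOCKS = 25_600                     # 100 MiB test file (illustrative)

def make_test_file(path):
    with open(path, "wb") as f:
        f.write(os.urandom(BLOCK) * FILE_BLOCKS)

def time_reads(path, offsets):
    # Issue one 4 KiB read per offset and return total elapsed seconds.
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

path = "latency_probe.bin"
make_test_file(path)
seq = [i * BLOCK for i in range(FILE_BLOCKS)]
rnd = random.sample(seq, len(seq))       # same blocks, shuffled order
print(f"sequential: {time_reads(path, seq):.3f}s")
print(f"random:     {time_reads(path, rnd):.3f}s")
os.remove(path)
```

Because both runs touch exactly the same 4 KiB blocks, any difference in elapsed time comes purely from the access pattern, which is the point of the charts below.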
As you can see in the charts, I'm only considering the I/O randomization scenario and the overall delta in absolute SSD performance under different conditions. I'm leaving other workloads aside; they are evaluated thoroughly by third parties such as Storage Review, AnandTech, PC Perspective, and others. All of the simulated workloads are stressful for an SSD, in the sense of pushing many I/Os to reach the device's maximum performance. The Intel Optane SSD leads not only in absolute numbers, but also in performance variability between workloads. In a real application scenario, such as the story above, that means stable and predictable performance for sample playback that doesn't change with the number of samples, their sizes, the way they are played, or other concurrent activity such as multitrack recording. You may call it a "performance budget" you can split between workloads without sacrificing overall performance.
For a musician, that means Optane delivers a smooth experience without audio drops, even at peak demand. It also means no need for offline rendering, channel freezing, and sub-mixdowns, which equals more time for being creative and unique.