Tuning the performance of Intel Optane SSDs on Linux Operating Systems

Intel Optane SSDs are ultra-fast, and we wanted to share a few things about Linux that will help you get the most out of one of the world’s fastest SSDs. Optane is an SSD that can achieve sub-10-microsecond response times and operate as Software Defined Memory, so a whole new world of application use cases is now evolving around this device.

There are just a few things to know and do before you run that first fio script to test the device. It’s fast and easy, so you can quickly get on to your application work. You should have your own fio script that matches the needs of your application; this post is a simple helper to get you started.

Optane SSDs perform best in a newer architecture (with Intel Xeon Scalable Processors), and a higher-performance processor with a base frequency near 3.0 GHz is recommended, but not required. Optane will of course work on slower CPUs; you’ll simply not see as much throughput with a single worker. The P4800X is simply an NVMe SSD, so any x4-capable PCIe 3.0 slot will work fine for connectivity. The drive is also available in the U.2 form factor alongside the add-in card, so choose an NVMe-capable server with front drive bays if you wish. What else do we recommend in Linux specifically?
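Before tuning, it’s worth confirming that the drive has enumerated as an NVMe device. A quick check (the nvme list command assumes the nvme-cli package is installed):


# ls /dev/nvme*
# nvme list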

Steps to improve the performance of Intel Optane SSDs on Linux

Step 1: Put your CPUs in performance mode


# for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || continue; echo -n performance > $CPUFREQ; done

Ensure the CPU scaling governor is in performance mode by checking the following:


# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Every line of output should read performance.
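If your distribution ships the cpupower utility (packaged with the kernel tools on most distributions), an equivalent one-liner is:


# cpupower frequency-set -g performance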

Don’t forget to make this setting persistent across reboots via your Linux startup configuration (rc).
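For example, on CentOS 7 you could append the governor loop to /etc/rc.d/rc.local and make that file executable (a systemd unit works just as well):


# cat >> /etc/rc.d/rc.local << 'EOF'
for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || continue; echo -n performance > $CPUFREQ; done
EOF
# chmod +x /etc/rc.d/rc.local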

Step 2: Disable IRQ balance

In kernels before version 4.8, IRQ balancing was not managed as efficiently as it is now by the in-box Linux nvme driver. So if you are on a kernel older than 4.8, please turn off the irqbalance service and run a short script (see Step 3 below) to balance your IRQs and allow for the best I/O processing possible. Here are the steps for Ubuntu and CentOS.

On Ubuntu, set ENABLED to "0" in /etc/default/irqbalance. On CentOS, you can disable the service with the following command:


# systemctl disable --now irqbalance

In CentOS 7, you can use these steps to stop the service and then make the change permanent across reboots.


# systemctl stop irqbalance
# systemctl status irqbalance

The status output should show Active: inactive (dead) on the third line.
Now make this permanent:


# chkconfig irqbalance off
# chkconfig irqbalance

The second command will report “disabled”.

Step 3: Set up SMP affinity

Here is a bash script that copies each nvme IRQ’s affinity_hint into its smp_affinity mask:


#!/bin/bash
# For each IRQ that belongs to an nvme device, copy the driver's
# affinity_hint into smp_affinity so the interrupt is serviced
# on the CPU(s) the nvme driver intended.
for folder in /proc/irq/*; do
    for file in "$folder"/*; do
        if [[ $file == *"nvme"* ]]; then
            echo "$file"
            contents=$(cat "$folder/affinity_hint")
            echo "$contents" > "$folder/smp_affinity"
            cat "$folder/smp_affinity"
        fi
    done
done
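After running the script, you can sanity-check the result: the nvme lines in /proc/interrupts show per-CPU interrupt counts, and each IRQ’s smp_affinity mask should now match its affinity_hint (substitute a real IRQ number from the first command):


# grep nvme /proc/interrupts
# cat /proc/irq/<irq_number>/smp_affinity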

Step 4: Choose an appropriate ioengine, and I/O polling mode if available

Now, how to check that your configuration works. The most critical performance will show itself at QD1 (queue depth 1) with just one worker thread. You can run this test with any ioengine; that said, Intel uses polling mode via the pvsync2 ioengine with the hipri option. This ioengine requires fio 2.18 or newer, and polling mode requires Linux kernel 4.8 or newer. If you are on a kernel older than 4.8, you need to fall back to a different ioengine, say libaio with direct I/O or sync, as your application requires; this will not provide the same performance as a polling-mode driver, but amazing performance will still be achievable.

To enable polling mode:


# echo 1 > /sys/block/nvme0n1/queue/io_poll
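To confirm the setting took effect, read the attribute back; it should return 1 (repeat the echo for each nvme namespace you plan to test):


# cat /sys/block/nvme0n1/queue/io_poll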

Example of an fio job parameters file:


[global]
name=OptaneInitialPerfTest
ioengine=pvsync2
hipri
direct=1
buffered=0
size=100%
randrepeat=0
time_based
ramp_time=0
norandommap
refill_buffers
log_avg_msec=1000
log_max_value=1
group_reporting
percentile_list=1.0:25.0:50.0:75.0:90.0:99.0:99.9:99.99:99.999:99.9999:99.99999:99.999999:100.0
filename=/dev/nvme0n1
[rd_rnd_qd_1_4k_1w]
stonewall
bs=4k
iodepth=1
numjobs=1
rw=randread
runtime=300
write_bw_log=bw_rd_rnd_qd_1_4k_1w
write_iops_log=iops_rd_rnd_qd_1_4k_1w
write_lat_log=lat_rd_rnd_qd_1_4k_1w
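Save the parameters above to a file (the name here is arbitrary) and point fio at it:


# fio optane_qd1_test.fio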

Results from a system with Intel Xeon Gold 6154 CPUs and a Linux 4.13 kernel

Summary output from fio:


fio-3.0
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=480MiB/s,w=0KiB/s][r=123k,w=0 IOPS][eta 00m:00s]
rd_rnd_qd_1_4k_1w: (groupid=0, jobs=1): err= 0: pid=12340: Thu Oct 26 16:33:09 2017
read: IOPS=117k, BW=457MiB/s (480MB/s)(134GiB/300001msec)
clat (usec): min=7, max=239, avg= 8.25, stdev= 1.37
lat (usec): min=7, max=239, avg= 8.27, stdev= 1.37
clat percentiles (usec):
| 1.000000th=[ 8], 25.000000th=[ 8], 50.000000th=[ 8],
| 75.000000th=[ 9], 90.000000th=[ 9], 99.000000th=[ 11],
| 99.900000th=[ 34], 99.990000th=[ 39], 99.999000th=[ 57],
| 99.999900th=[ 71], 99.999990th=[ 101], 99.999999th=[ 241],
| 100.000000th=[ 241]
bw ( KiB/s): min=383671, max=501112, per=99.98%, avg=468391.67, stdev=20874.43, samples=299
iops : min=95916, max=125278, avg=117097.94, stdev=5218.58, samples=299
lat (usec) : 10=98.58%, 20=1.22%, 50=0.20%, 100=0.01%, 250=0.01%
cpu : usr=4.20%, sys=95.78%, ctx=9383, majf=0, minf=29
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=35135273,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  READ: bw=457MiB/s (480MB/s), 457MiB/s-457MiB/s (480MB/s-480MB/s), io=134GiB (144GB), run=300001-300001msec

Disk stats (read/write):
  nvme4n1: ios=35120877/0, merge=0/0, ticks=246137/0, in_queue=245101, util=81.75%

We hope these simple steps give you a great first experience with Optane P4800X SSDs on Linux, right out of the box. Now comes the fun part: it’s time for you to achieve amazing innovations and a new level of memory flexibility for your business goals, as getting more per server just got a lot easier.

Feel free to reach out to Intel at any time on our support site and we’ll be happy to give you more help on achieving amazing performance with Optane.