Upgrade to an NVMe-capable Linux kernel

Here in the Intel Non-Volatile Memory Solutions Group (NSG) we build and test Linux systems a lot, and we have been working to mature the nvme driver stack on all kinds of operating systems. In fact, in the 3.19 kernel the driver will reach a 1.0 version number. The Linux kernel is the innovation platform today, and its NVMe support has come a long way in the past year in terms of stability. We have always had a high-level kernel build document, and we are turning that document into a blog post for easier access. We have refreshed it for the latest Linux kernel release, 3.18, and added some small recommendations on things to do once you are running a newer kernel. In enterprise environments, Intel always expects you to get your kernel from your operating system vendor. This blog is aimed at people upgrading in development, testing, or benchmark environments where the most recent bits are desired. Let us know if you have any feedback.

  1. NVM Express background

NVM Express (NVMe) is an optimized PCI Express SSD interface. The NVM Express specification defines an optimized register interface, command set, and feature set for PCI Express (PCIe)-based Solid State Drives (SSDs). Please refer to www.nvmexpress.org for background on NVMe.

The NVM Express Linux driver development utilizes the typical open-source process used by kernel.org. The development mailing list is linux-nvme@lists.infradead.org.

This document is intended for developers and software companies. It should be noted that kernel 3.3 already included a stable nvme driver, and various distributions have back-ported the driver even to 2.6 kernels. The nvme driver is also in-box with many server distributions of Linux; please check with your vendor. Intel encourages server customers to consider an in-box nvme driver as their first option.
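A quick way to check whether your current kernel already ships an in-box nvme driver is to query the module directly (a minimal sketch; output will vary by distribution and kernel):

modinfo nvme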


  2. Development tools required (possible pre-requisites)

In order to clone, compile, and build a new kernel/driver, the following packages are needed:

  1. ncurses
  2. build tools
  3. git (optional; you could instead use wget to download the Linux source package)

You must be root to install these packages.

  a. Ubuntu-based

apt-get install git-core build-essential libncurses5-dev

  b. RHEL-based

yum install git-core ncurses ncurses-devel

yum groupinstall "Development Tools"

  c. SLES-based

zypper install ncurses-devel git-core

zypper install --type pattern Basis-Devel

             

  3. Build new Linux kernel with NVMe driver

Pick a starting distribution. From the driver's perspective it doesn't matter which distribution you use, since you are going to put a new kernel on top of it, so use whatever you are most comfortable with and/or whatever has the required tools.

  3.1. Get the kernel and driver from the 3.x repository: https://www.kernel.org/pub/linux/kernel/v3.x/

An example snapshot from January 2015 is below:

wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.18.3.tar.xz

tar -xvf linux-3.18.3.tar.xz
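The tarball extracts into a directory named for the kernel version; change into it before configuring and building (path assumes the 3.18.3 example above):

cd linux-3.18.3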

  3.2. Build and install

Run menuconfig (which uses ncurses):

make menuconfig

Confirm the NVMe driver under Block devices is set to <M>:

Device Drivers -> Block devices -> NVM Express block device

This creates a .config file in the same directory.
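If you want to double-check the selection outside of menuconfig, you can grep the generated .config for the NVMe block driver symbol (in 3.x kernels this is CONFIG_BLK_DEV_NVME; expect it to be set to m for a module build):

grep BLK_DEV_NVME .config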

Then, as root, run these make commands (set the -j flag to roughly half your core count to improve make time):

make -j10

make modules_install -j10

make install -j10
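If you prefer not to hard-code the -j value, a small sketch that sizes it to roughly half the available cores (assumes the nproc utility from coreutils is installed):

make -j$(( $(nproc) / 2 ))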

Depending on the distribution you use, you may have to run update-initramfs and update-grub, but this is typically unnecessary.
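For reference, on Debian/Ubuntu-derived distributions those steps would look roughly like the following (kernel version taken from the 3.18.3 example above; adjust to match what you built):

update-initramfs -c -k 3.18.3

update-grub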

Once the install is successful, reboot the system to load the new kernel and drivers. Usually the new kernel becomes the default boot entry, which is the top line of menu.lst. After booting, verify with "uname -a" that the running kernel is what you expect, and use "dmesg | grep -i error" to find and resolve any kernel loading issues.
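A few additional quick checks after rebooting can confirm the driver and devices are present (assumes at least one NVMe SSD is attached; device names will vary):

lsmod | grep nvme

ls /dev/nvme*

dmesg | grep -i nvme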

  4. NVMe Driver basic tests

There are some basic open source nvme test programs you can use for checking nvme devices:

http://git.infradead.org/users/kbusch/nvme-user.git

  4.1. Git'ing the source code

git clone git://git.infradead.org/users/kbusch/nvme-user.git

  4.2. Making the test programs

Add or modify the Makefile with the proper library and header paths, then compile the programs:

make

  4.3. Example: check the NVMe controller "identify", "namespace", etc.

sudo ./nvme_id_ctrl /dev/nvme0n1

sudo ./nvme_id_ns /dev/nvme0n1

The Intel SSD Data Center Tool 2.x also supports NVMe.

Here are some commands you’ll find useful.

Zero out and condition a drive sequentially for performance testing:

dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

Wait until the above command reaches an EOF condition (the end of the device) and exits. The data rate reported by dd is not an indication of your application's performance.

Many people run a quick sanity test using hdparm, but this data should also not be used as an indication of reliable performance expectations.

hdparm -tT --direct /dev/nvme0n1

Aligning drive partitions:

Be sure the starting offset of the partition you use is divisible by 4096 bytes. You can look at your partition table in the parted tool. Here is an example partition table listing that is aligned:

Number  Start     End            Size           File system  Name     Flags
 1      1048576B  400088367103B  400087318528B               primary

Take the partition's Start value in bytes and divide it by 4096, like this:

1048576/4096 = 256

The result is a whole number, so the partition is evenly aligned and will perform well.

Using the "unit b" option in parted presents partition start and end values in bytes, which makes the math much simpler.
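As an example, the commands below print the table in bytes and create a single aligned partition spanning the drive (device name is an assumption; -a optimal lets parted pick an aligned start, and mklabel destroys any existing partition table):

parted /dev/nvme0n1 unit b print

parted -s -a optimal /dev/nvme0n1 mklabel gpt mkpart primary 0% 100%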

Software RAID:

Using a chunk size of 128k may be a good starting point for your application testing.
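For example, a minimal mdadm sketch of a two-drive RAID 0 with a 128k chunk size (device names and member count are assumptions for illustration only):

mdadm --create /dev/md0 --level=0 --chunk=128 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1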

Don't discard blocks in filesystem usage:

Be sure to turn off the discard option when making your Linux filesystem. This is important because you want to let the SSD controller manage blocks and the activity between the NVM (non-volatile memory) and the host with its more advanced and consistent approaches.

Core filesystems:

ext4: the default is not to discard blocks at filesystem make time; retain this default, and do not add the "discard" extended option, as some guides will tell you to do.

xfs: with mkfs.xfs, add the -K option so that you do not discard blocks.
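As a sketch, assuming the aligned partition from the example above, the filesystems could be created like this (mkfs.ext4 also accepts the -E nodiscard extended option if you want to make the no-discard behavior explicit):

mkfs.ext4 -E nodiscard /dev/nvme0n1p1

mkfs.xfs -K /dev/nvme0n1p1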