SATA SSD Misconceptions on Linux – Alignment, IO Scheduler, TRIM

In an attempt to clarify some of the misinformation and myths surrounding the use of SSDs, I want to list what I feel are the 3 most misunderstood topics when using SSDs with a Linux operating system. All command syntax was checked on both Ubuntu 13.10 and CentOS 6.4.

Block Alignment

This is an area where you can still find a LOT of incorrect information out on the internet. Many people still believe that partitions must be aligned to the SSD's erase block size. This is not necessary. Simply aligning to a 4096-byte (4 KiB) boundary, or a multiple of it, will give you the best performance. Most common partitioning tools do this automatically, but it is easy to check with /usr/bin/blockdev:

# blockdev --getalignoff /dev/<partition>

If the returned value is "0", the partition is aligned.
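
If you want to double-check by hand, the starting sector of a partition is also exposed in sysfs (the device and partition names below are examples; substitute your own). The value is reported in 512-byte sectors, so any start that is evenly divisible by 8 falls on a 4096-byte boundary:

# cat /sys/block/sda/sda1/start
2048  (2048 x 512 = 1048576 bytes, a multiple of 4096, so the partition is aligned)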

IO Scheduler

Many Linux distros use CFQ as the default IO scheduler, while some others use Deadline. Most people experienced with SSDs will tell you to switch from CFQ to the NOOP scheduler to improve SSD performance by removing the queueing effect of CFQ. However, many people seem to overlook the fact that CFQ has SSD support built in.

"CFQ has some optimizations for SSDs and if it detects a non-rotational
media which can support higher queue depth (multiple requests at in
flight at a time), then it cuts down on idling of individual queues and
all the queues move to sync-noidle tree and only tree idle remains. This
tree idling provides isolation with buffered write queues on async tree."
- https://www.kernel.org/doc/Documentation/block/cfq-iosched.txt Accessed 12/4/2013.

According to the author of CFQ, this support was added in Fall 2008.
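
If you want to verify that the kernel has actually detected your drive as non-rotational (the condition CFQ keys off), the flag is exposed in sysfs (sda is an example; use your own device name). A value of 0 means the kernel treats the device as an SSD:

# cat /sys/block/sda/queue/rotational
0  (0 = non-rotational/SSD, 1 = rotational/HDD)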

I would recommend testing your configuration to see which scheduler gives you better performance with your chosen SSD.

In our past internal testing, we have seen situations where NOOP gives better results than CFQ, but this is usually when multiple, high-performance drives are used and maximum performance is the goal.

For testing, you can easily change the scheduler on the fly with these commands (the change will not survive a reboot):

# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]  (the scheduler in brackets, cfq, is the one currently in use)

# echo noop > /sys/block/sda/queue/scheduler  (changes the scheduler; the name must be lowercase)

# cat /sys/block/sda/queue/scheduler
[noop] anticipatory deadline cfq  (confirms the new value)
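
If your testing shows that NOOP is the better choice and you want the change to survive a reboot, one common approach is the elevator= kernel boot parameter, which sets the default scheduler for every block device. On Ubuntu 13.10, add elevator=noop to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run:

# update-grub

On CentOS 6.4, append elevator=noop to the kernel line in /boot/grub/grub.conf. Adjust these paths as needed for your own setup.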

For a simple baseline test, see my other blog post, "Making Friends with Your New SSD - a Simple Baseline":
https://communities.intel.com/community/itpeernetwork/blog/2013/11/22/making-friends-with-your-new-ssd--a-simple-baseline

TRIM

There is a lot of information about the use of TRIM out on the internet. TRIM is a function the OS can use to tell an SSD which blocks are no longer in use. With an HDD, the OS doesn't need to tell the drive which blocks are free, because an HDD can overwrite old information with no performance penalty. An SSD cannot overwrite old data in place. When an SSD needs to write new information to a block that has already been written, it copies the contents of that block to a clean location, incorporating the changes as it copies, and updates its mapping to point to the new location. The old location is then marked as invalid by the drive.

If there are no clean blocks to write to, the SSD must first erase a block that was previously marked as invalid, move the valid data along with the changes, update the mapping, and mark the old block as invalid. This "erase before write" operation, also called garbage collection, can cause a small delay and therefore decrease the performance of the drive. TRIM limits this effect by allowing the OS to tell the drive which blocks are no longer used, so the drive can do garbage collection in the background and have blocks already erased when new data needs to be written.

Full support for TRIM was not added to the Linux kernel until version 3.7, so you need a fairly recent release of your chosen Linux distro to use it. For a lab, that is not an issue, but it can be a problem in the data center, where operating systems tend to be upgraded more slowly. Also, TRIM itself can have a performance impact. It should be tested, or issued at a controlled time, for example right before an application is restarted after a maintenance window. Data center SSDs manage their NAND non-volatile memory for you, in a way that keeps the SSD running efficiently and consistently for your application. TRIM is not something you should employ for most data center workloads, and you should use it only once test data tells you it makes sense.
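
If you do decide to TRIM only around maintenance windows rather than on every delete, one way to do it (assuming a file system mounted at /; adjust the mount point for your layout) is the fstrim utility from util-linux, which issues a one-time TRIM for all free blocks on a mounted file system:

# fstrim -v /  (the -v flag reports how many bytes were trimmed)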

So, again, I recommend doing some testing. In your situation, TRIM may or may not give a performance improvement.
Before testing, make sure your kernel version is recent enough (3.7 or above), make sure you are using a TRIM-compatible file system (ext4, Btrfs, JFS, or XFS), and make sure your SSD supports TRIM:

# hdparm -I /dev/sda |grep TRIM
        *    Data Set Management TRIM supported (limit 1 block)
        *    Deterministic read data after TRIM
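
The other two checks are just as quick (the mount point is an example; substitute one backed by your SSD):

# uname -r      (the kernel version must be 3.7 or above)
# df -T /home   (shows the file system type for the mount point)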
 
If TRIM is supported, as shown in the hdparm output above, you can enable TRIM by adding the "discard" mount option to the /etc/fstab entries for the partitions on your SSD:

/dev/sda1  /       ext4   defaults,noatime,discard   0  1
/dev/sda2  /home   ext4   defaults,noatime,discard   0  2
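
The new option takes effect at the next mount. To pick it up without a reboot, you can remount the file systems in place (the mount points match the example fstab entries above):

# mount -o remount,discard /
# mount -o remount,discard /home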

All tests were run in Intel SSD Labs.