Understanding SSD power consumption.
Unlike spinning hard drives, whose power consumption is well known, predictable and almost constant, SSD power consumption is not well explained. The reason is simple: there are several power modes, and consumption depends on usage and workload.
In this blog I’ll mainly focus on the Intel® SSD DC Family for NVMe; however, everything said here also applies to Intel DC SATA SSDs, on a smaller scale.
SSDs do not have constant power consumption by nature. It differs at every stage of operation, and instead of a single “top value” it is described by at least three different values (not counting sleep states for client SSDs). Each of them also depends on capacity: the higher the drive capacity, the higher the power consumption. That is expected, since more memory components are used or they are denser. So, more chips, more power.
Let’s have a look at the data sheet of the Intel® SSD DC P3700/P3600/P3500 Series. I’ll use the P3700 specifications as an example from the table below:
You see there are three different stages measured: active write, active read and idle. Basically, idle is the lowest power consumption for the drive, assuming it is in fully working condition, the OS is booted and the drive is running with no I/O. It is 4W at minimum. It is similar across all SKUs within the P3x00 product line due to their common architecture.
Active Read and Active Write are where you see the difference. As the table shows, writes take more power than reads. That is expected due to the architecture of every NAND-based SSD, which operates in terms of sectors, pages (a number of sectors) and erase blocks (a number of pages). The minimum read is a sector, the minimum write is a page (on empty space) and the minimum erase is an erase block (EB). This leads to another SSD term, Write Amplification Factor (WAF): the additional overhead the SSD controller must deal with on every write I/O. Of course, high programming and erasing voltages play a role too.
Keep in mind that the data sheet numbers are averages.
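To make the write amplification overhead mentioned above more concrete, here is a minimal sketch of a simplified garbage collection model. It assumes the controller reclaims one erase block at a time and must first relocate the pages in it that still hold live data. The function name and the 256-pages-per-block geometry are illustrative assumptions, not P3700 specifications, and real controllers are far more sophisticated.

```python
def gc_write_amplification(pages_per_block: int, valid_pages: int) -> float:
    """Estimate WAF when reclaiming one erase block (simplified model).

    pages_per_block: pages in an erase block (hypothetical geometry).
    valid_pages: pages in the victim block that still hold live data
                 and must be copied elsewhere before the erase.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    # Pages that become available for new host writes after the erase.
    freed_pages = pages_per_block - valid_pages
    # NAND programs = relocated live pages + the new host pages.
    nand_page_writes = valid_pages + freed_pages
    return nand_page_writes / freed_pages


if __name__ == "__main__":
    for valid in (0, 128, 192):
        waf = gc_write_amplification(256, valid)
        print(f"{valid} live pages in victim block: WAF = {waf}")
```

In this model an empty victim block costs nothing extra (WAF of 1), while a block that is three quarters full of live data means four NAND page writes for every host page write. More NAND programs per host write is one reason why write-heavy workloads draw more power.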
So, conclusion #1: the maximum power consumption of NVMe SSDs is only reached in corner cases, where you write a lot sequentially (see the notes at the table). In typical data center workloads this number drops significantly due to the read/write mix and the access pattern (random vs. sequential).
Managing power settings.
Intel® DC SSDs offer the capability to limit their power consumption. It’s called power governor mode and can be managed with the Intel® SSD Data Center Tool or, for NVMe SSDs, with the open source nvme-cli utility.
The supported modes are:
• 0: 25W for PCIe NVMe devices; 40W for PCIe NVMe x8 devices; unconstrained for SATA devices
• 1: 20W for PCIe NVMe devices; 35W for PCIe NVMe x8 devices; typical (7W) for SATA devices
• 2: 10W for PCIe NVMe devices; 25W for PCIe NVMe x8 devices; low (5W) for SATA devices
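The mode list above can be captured as a small lookup table, which is handy if you script against it. This is only a sketch: the device class labels ("nvme", "nvme_x8", "sata") and the function names are my own illustrative choices, not identifiers used by any Intel tool. The wattages are copied from the list.

```python
from typing import Optional

# Power caps in watts per governor mode, copied from the mode list.
# None stands for "unconstrained" (SATA devices in mode 0).
# The device class labels are hypothetical names for this sketch.
POWER_CAPS_W = {
    "nvme":    {0: 25, 1: 20, 2: 10},
    "nvme_x8": {0: 40, 1: 35, 2: 25},
    "sata":    {0: None, 1: 7, 2: 5},
}


def power_cap_watts(mode: int, device: str) -> Optional[int]:
    """Return the power cap for a governor mode, or None if unconstrained."""
    try:
        return POWER_CAPS_W[device][mode]
    except KeyError:
        raise ValueError(f"unknown device/mode: {device!r}, {mode!r}") from None


def mode_for_budget(budget_w: float, device: str) -> Optional[int]:
    """Return the least restrictive mode whose cap fits within budget_w.

    Returns None when no capped mode fits the budget.
    """
    for mode in (0, 1, 2):
        cap = POWER_CAPS_W[device][mode]
        if cap is not None and cap <= budget_w:
            return mode
    return None
```

For example, a per-drive budget of 22W on a PCIe NVMe device maps to mode 1, because the 25W cap of mode 0 does not fit.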
To view the current setting, use the “show” command to list all settings or only the power governor setting.
isdct.exe show -a -intelssd 1
isdct.exe show -d PowerGovernorMode -intelssd 1
To set the mode (0-2) use:
isdct.exe set -intelssd 1 PowerGovernorMode=0
These settings are applied immediately; no system reboot or I/O stop is required. This also means that control of this feature can be built into server management software, which can then quickly adjust SSD power consumption when the system reaches its power limits.
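As a rough illustration of how management software might use this, here is a minimal sketch of a decision step that tightens the governor mode when the system is over its power limit and relaxes it again once there is enough headroom. Everything here is an assumption for illustration: the function names, the 50W headroom default and the idea of reading total system power from somewhere else. The command is only built as an argument list that mirrors the isdct syntax shown above; nothing is executed.

```python
from typing import List


def next_mode(current_mode: int, system_watts: float,
              limit_watts: float, headroom_watts: float = 50.0) -> int:
    """Pick the next power governor mode (0-2) from a system power reading.

    Over the limit: step to a more restrictive mode.
    Comfortably under the limit: step back to a less restrictive mode.
    The headroom gap avoids flipping back and forth near the limit.
    """
    if current_mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")
    if system_watts > limit_watts:
        return min(current_mode + 1, 2)
    if system_watts < limit_watts - headroom_watts:
        return max(current_mode - 1, 0)
    return current_mode


def isdct_set_command(drive_index: int, mode: int) -> List[str]:
    """Build (but do not run) the command that sets the governor mode."""
    return ["isdct.exe", "set", "-intelssd", str(drive_index),
            f"PowerGovernorMode={mode}"]


if __name__ == "__main__":
    mode = next_mode(current_mode=0, system_watts=1020.0, limit_watts=1000.0)
    print(" ".join(isdct_set_command(1, mode)))
```

A real implementation would pass the list to something like subprocess.run and would need its own way of measuring system power; both are left out on purpose.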
How does performance degrade in different power modes?
Obviously there is a trade-off in performance. We’ve already learned that sequential writes consume the most power. Logically, that type of I/O will be limited in the lower power modes, while other types will keep performing at full speed or degrade only a little. Let’s build a test system and see how that works.
For test purposes I’m using a basic configuration: Core i7, Gigabyte X99-UD4, 8GB DDR4, P3700-2TB (FW171), CentOS 7 build 1511, kernel 3.10.0-327, FIO 2.8, reference test scripts.
SSD preconditioning is required for all tests. I covered it in detail here https://communities.intel.com/community/itpeernetwork/blog/2015/03/27/how-to-benchmark-ssds-with-fio-visualizer
So, the SSD is in a steady performance state for every test run.
Let's start with the sequential performance analysis. These tests were performed in the following sequence: no power limit, mode 1 (20W), mode 2 (10W), no power limit. The modes are switched on the fly, without stopping the running workload. It also demonstrates how quickly the drive enters and exits each phase.
There is no performance degradation for reads at all, which is expected as noted above. Sequential write performance changes significantly, from 2GB/s down to 1.25GB/s (mode 1) and way down to 0.6GB/s (mode 2). That’s the trade-off for lower power.
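To put the sequential write numbers above in perspective, the short sketch below divides each measured throughput by the power cap of the corresponding mode. Two assumptions to keep in mind: I treat the "no power limit" run as mode 0 with its 25W cap, and I use the caps rather than measured power draw, so the results are lower bounds on efficiency, not measurements.

```python
# Sequential write throughput in MB/s per governor mode, from the test above.
SEQ_WRITE_MBPS = {0: 2000, 1: 1250, 2: 600}

# Power caps in watts for a PCIe NVMe device, from the mode list.
# Assumption: the "no power limit" run corresponds to mode 0 (25W).
POWER_CAP_W = {0: 25, 1: 20, 2: 10}


def mbps_per_watt(mode: int) -> float:
    """Throughput divided by the mode's power cap (a lower bound on MB/s per W)."""
    return SEQ_WRITE_MBPS[mode] / POWER_CAP_W[mode]


if __name__ == "__main__":
    for mode in sorted(SEQ_WRITE_MBPS):
        print(f"mode {mode}: {mbps_per_watt(mode):.1f} MB/s per watt of cap")
```

Under these assumptions the drive delivers at least 80 MB/s per watt in mode 0, 62.5 in mode 1 and 60 in mode 2, so for sequential writes throughput falls somewhat faster than the power cap does.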
Random 4K workloads look interesting as well. There is no performance difference at all. The P3700-2TB drive operates in the same way across all power modes.
Conclusion #2: most workloads are not affected by a power mode change unless they include large-block sequential writes.
How can that help you?
I enjoy real customer stories; here is one for you about making your system energy efficient with Intel NVMe SSDs.
Last year Intel introduced a special NVMe SSD, the P3608 series. It’s available in the add-in card form factor only, with capacities up to 4TB. In fact, it is two separate drives on the same adapter card, which are exposed to the system as two block devices. It’s a monster drive with ultimate performance, but it also has the highest TDP in the Intel NVMe portfolio, with up to 40W of power consumption.
Uhhh… How is it possible to spend 40W on just an SSD and keep the system energy efficient? It’s possible by switching SSD power modes.
The Student Cluster Competition took place at SC15. The students partner with vendors to design and build a cutting-edge cluster from commercially available components that does not exceed a 3120-watt power limit, and work with application experts to tune and run the competition codes. Team TUMuch Phun from the Technical University in Munich won the Highest Linpack Award with their PetaStream design, which combines Intel® Xeon Phi compute power with a high-speed storage design based on the Intel® SSD DC P3608 Series. They forced the drive into a lower power state while maintaining the required performance and staying within the power budget. You can read the whole article here.
Andrey Kudryavtsev, Intel Corp.