We’ve known the innovators at Aerospike for a few years now, and today we are announcing more than 1 million transactions per second (TPS) on a single server with Aerospike’s NoSQL database. That might not seem like such a big deal until you realize we are not using DRAM for this, as in some previous posts about Aerospike reaching 1 million TPS. We are trading out DRAM for NVM (non-volatile memory) in its classic form, NAND flash. To database fanatics like us, NAND is hot because you can store so much more. NoSQL innovators have learned how to use NVM with breathtaking performance and new data architectures. NVM is plenty fast when your specification is 1 millisecond per row “get”. In fact, it strikes the ideal balance: fast, lower cost, and non-volatile. The best thing is the price. Did I tell you about the price yet?
NVM today, and even more so tomorrow, costs a small fraction of the price of DRAM. Better still, you are not constrained to, say, 256 GB, or to some memory-pricing sweet spot that always leaves you a bit short of your goal. Terabyte-class servers with NVM give you far more headroom to grow your business without rebuilding and upgrading your world within months. How does 6+ terabytes of NVM database memory on a single box sound?
Here at Intel, we say: be bold, and go deep into the terabyte class of database server!
So how did we do this? Our friends at Aerospike make it possible with a special file system (often called a database storage engine) that keeps the hash index to the data in DRAM (a very small amount of DRAM; we set it to 64 GB), while the actual (key, value) rows, 1k or larger, are kept in a large, growable “namespace” on 4 PCIe SSDs. Aerospike likes Intel SSDs for their block-level response consistency, because when you replace DRAM and run at this level of concurrent threading, consistency becomes paramount. In fact, during our tests we like to target 99% of reads completing in under 1 millisecond. Here are the core performance results.
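To make the "index in DRAM, rows on SSD" split concrete, here is a toy model in Python. It is a sketch of the general idea only, not Aerospike's storage engine: the class `ToyIndexedStore` is hypothetical, an `io.BytesIO` buffer stands in for the SSD namespace, and `hashlib.sha1` (also a 20-byte digest) stands in for the RIPEMD-160 key digest Aerospike uses.

```python
import hashlib
import io


class ToyIndexedStore:
    """Toy model: small index in RAM, full records on a 'device' (BytesIO here)."""

    def __init__(self):
        # digest -> (offset, length); the only per-record state kept in memory
        self.index = {}
        # Stand-in for the SSD-backed namespace; records are appended, never updated in place
        self.device = io.BytesIO()

    @staticmethod
    def digest(key):
        # 20-byte digest; SHA-1 stands in for the RIPEMD-160 digest Aerospike uses
        return hashlib.sha1(key.encode("utf-8")).digest()

    def put(self, key, value):
        # Append the record at the end of the device, then point the index entry at it
        offset = self.device.seek(0, io.SEEK_END)
        self.device.write(value)
        self.index[self.digest(key)] = (offset, len(value))

    def get(self, key):
        entry = self.index.get(self.digest(key))
        if entry is None:
            return None
        offset, length = entry
        # A lookup costs one hash-table probe in RAM plus one read from the device
        self.device.seek(offset)
        return self.device.read(length)


store = ToyIndexedStore()
store.put("user:42", b"x" * 1024)  # a 1k row, as in the tests described here
print(len(store.get("user:42")), len(store.index))
```

Rewriting a key appends a new copy and repoints the index, leaving the old bytes as garbage; a real engine has to reclaim that space in the background, which this sketch does not attempt.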
Database results, 95% read workload (Aerospike’s asmonitor and Linux iostat)
|Record Size||Number of client threads||Total TPS||Percent below 1 ms (reads)||Percent below 1 ms (writes)||Std dev of read latency (ms)||Std dev of write latency (ms)||Database size|
|1k with replication||512||1,003,471||96.11||99.98||0.87||0.30||200G|
|Record Size||Read MB/sec||Write MB/sec||Avg queue depth on SSD||Average drive latency||CPU % busy|
1. Data is averaged and summarized across 2 hours of warmed-up runs; many runs were executed to confirm consistency.
2. The 4k test was network-constrained, hence the lower CPU utilization during this test.
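The "percent below 1 ms" and "std dev" columns above come from asmonitor. As a rough illustration of what those two numbers mean, here is one way to compute them from a list of per-transaction latencies. The function name is hypothetical, the sample values are invented for illustration (not measured data), and the population standard deviation is an assumption about how the spread is summarized.

```python
import statistics


def latency_summary(latencies_ms, threshold_ms=1.0):
    """Return (percent of samples under threshold_ms, population std dev in ms)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    under = sum(1 for x in latencies_ms if x < threshold_ms)
    percent_under = 100.0 * under / len(latencies_ms)
    return percent_under, statistics.pstdev(latencies_ms)


# Hypothetical read latencies in milliseconds (not measured data)
reads_ms = [0.21, 0.35, 0.48, 0.62, 0.90, 1.40]
pct, sd = latency_summary(reads_ms)
print(f"{pct:.2f}% under 1 ms, std dev {sd:.2f} ms")
```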
We ran our tests with 1k, 2k and 4k row sizes, and with 1k again with asynchronous replication turned on. We kept rows small, which is common for operational databases that manage cookies, user profiles, and trade/bidding information in an operational row structure. Aerospike does support bins, which can give you columns, but so many use cases simply store strings that we configured a single bin (i.e., one column) per record. This configuration gives you the highest performance from Aerospike.
The databases we built ranged from 100 GB to 400 GB, and as we made the database bigger we saw no drop in performance. We used a small database to stay agile while building and reworking this effort over and over. Our scalability problems appeared as we scaled the row sizes, and they were at the network level, no longer a balancing act between the SSDs and CPU threading. We simply needed more network infrastructure to go to larger row sizes. Taking a server beyond 20 Gbit of networking at 4k row sizes was a wall for us. Supporting nodes that produce 40 Gbit or higher throughput can become an expensive undertaking. This network throughput and its cost will affect your expense thresholds and be a deciding factor in how dense an Aerospike cluster you wish to build.
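A back-of-envelope calculation shows why 4k rows hit the network wall. This sketch assumes 1k means 1024 bytes, holds the rate at 1 million TPS, and counts only one record's payload per transaction (no protocol headers, replication traffic or request overhead), so real wire usage would be higher. The 20 Gbit budget is the two 10 Gb workload ports in the hardware list.

```python
def payload_gbit_per_s(tps, record_bytes):
    """Gbit/s of record payload if every transaction moves one full record.

    Ignores protocol headers, replication traffic and client request overhead,
    so real wire usage will be higher.
    """
    return tps * record_bytes * 8 / 1e9


# Two 10 Gb workload ports per server, per the hardware list
nic_budget_gbit = 2 * 10

for size in (1024, 2048, 4096):
    need = payload_gbit_per_s(1_000_000, size)
    verdict = "fits" if need <= nic_budget_gbit else "exceeds"
    print(f"{size:>5} B rows at 1M TPS: {need:5.1f} Gbit/s ({verdict} {nic_budget_gbit} Gbit/s)")
```

Under these assumptions 1k and 2k rows fit within 20 Gbit/s, while 4k rows would need roughly 33 Gbit/s of payload alone.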
Configuration and Key Results
We used servers with Intel's top 18-core Xeon E5 v3 family processors, which support 72 CPU hardware threads per machine. Aerospike is very highly threaded and can use lots of cores and threads per server; with htop we recorded over 100 active threads per monitoring sample, loading the CPU queues nicely. As for balancing against the SSDs and their queue depths, we found that our target of 95% to 100% of database record retrievals completing under 1 ms was best achieved at queue depths under 32 on these Intel NVMe (Non-Volatile Memory Express) SSDs. The numbers in the asmonitor data table show that we were mostly getting 97% of all transactions completing under 1 millisecond, a very high achievement.
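Queue depth, request rate and latency are tied together by Little's law (L = λW). The sketch below uses it to estimate the average device latency that keeps a drive under a queue depth of 32. The per-drive split is an assumption for illustration, not a measurement: 95% of the headline TPS are reads, each record read costs one device read, reads spread evenly over 4 drives, and writes (which are buffered into large write blocks) are ignored.

```python
def queue_depth(iops, latency_ms):
    """Little's law, L = lambda * W: average I/Os in flight for a given rate and latency."""
    return iops * latency_ms / 1000.0


def latency_budget_ms(iops, max_queue_depth):
    """Average device latency that keeps the queue at or below max_queue_depth."""
    return max_queue_depth / iops * 1000.0


# Illustrative split of the headline result (assumptions, not measurements):
# 95% reads, one device read per record read, spread evenly over 4 drives.
reads_per_drive = 1_003_471 * 0.95 / 4
print(f"{reads_per_drive:,.0f} reads/s per drive")
print(f"latency budget at QD 32: {latency_budget_ms(reads_per_drive, 32):.3f} ms")
```

Read the other way, the same relation says a drive whose average latency grows will see its queue deepen at a constant request rate, which is why queue depth is a useful early signal when tuning for the 1 ms target.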
Configuration details are below for those attempting to replicate this work. All components and software are available on the market today. Try the Aerospike Community Edition, free for download here.
AEROSPIKE DATABASE CONFIGURATION
|Number of nodes||Two|
|Replication Factor||One (*Two used with 1k rows and replication)|
|RAM Size||64 GB|
|Devices||Two P3700 PCIe Devices per node (4 total)|
|Write block Size||128k|
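The table above maps onto a namespace stanza in Aerospike's server configuration file. Here is a small hypothetical helper that renders such a stanza from these values. The option names (`replication-factor`, `memory-size`, `single-bin`, `storage-engine device`, `device`, `write-block-size`) reflect Aerospike 3.x-era configuration as I understand it and should be checked against the configuration reference for your server version; the device paths are assumed from the drive list, and the namespace name comes from the `-n test` flag.

```python
def namespace_stanza(name, replication_factor, memory_size, devices,
                     write_block_size="128K", single_bin=True):
    """Render an aerospike.conf-style namespace block (3.x-era option names; verify for your version)."""
    lines = [
        f"namespace {name} {{",
        f"    replication-factor {replication_factor}",
        f"    memory-size {memory_size}",  # DRAM budget; covers the primary index when data lives on SSD
    ]
    if single_bin:
        lines.append("    single-bin true")  # one bin (column) per record, as in these tests
    lines.append("    storage-engine device {")
    lines += [f"        device {dev}" for dev in devices]  # raw NVMe block devices
    lines += [
        f"        write-block-size {write_block_size}",
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Per-node values from the table above; device paths are assumed from the drive list
print(namespace_stanza("test", 1, "64G", ["/dev/nvme0n1", "/dev/nvme1n1"]))
```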
AEROSPIKE BENCHMARK TOOL CONFIGURATION
Example command used to load the database:
./run_benchmarks -h 172.16.5.32 -p 3000 -n test -k 100000000 -l 23 -b 1 -o S:2048 -w I -z 64
Example command used to run the benchmark from client:
./run_benchmarks -h 172.16.5.32 -p 3000 -n test -k 100000000 -l 23 -b 1 -o S:2048 -w RU,95 -z 64 -g 125000
Flags of the Aerospike benchmark client:
-u Full usage
-b set the number of Aerospike bins (Default is 1)
-h set the Aerospike host node
-p set the port on which to connect to Aerospike
-n set the Aerospike namespace
-s set the Aerospike set name
-k set the number of keys the client is dealing with
-S set the starting value of the working set of keys
-w set the desired workload (I - linear 'insert' | RU,<N> - read-update with N% reads and the remainder writes, e.g. RU,80 gives 80% reads & 20% writes)
-T set read and write transaction timeout in milliseconds
-z set the number of threads the client will use to generate load
-o set the type of object(s) to use in Aerospike transactions (I - Integer | S:<size> - String of the given size | B:<size> - Java blob of the given size)
-D Run benchmarks in Debug mode
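If you script repeated runs across row sizes, it can help to build the argument lists programmatically. This is a minimal sketch with a hypothetical helper, `benchmark_args`, that only assembles arguments using the flags listed above; it does not run anything. The `-l` and `-g` flags appear in the example commands but are not described in the flag list, so they are passed through verbatim via `extra`.

```python
import shlex


def benchmark_args(host, namespace, keys, record_bytes, workload, threads,
                   port=3000, bins=1, extra=()):
    """Assemble a ./run_benchmarks argument list from the flags described above."""
    args = [
        "./run_benchmarks",
        "-h", host,                   # Aerospike host node
        "-p", str(port),              # Aerospike port
        "-n", namespace,              # namespace
        "-k", str(keys),              # number of keys
        "-b", str(bins),              # number of bins
        "-o", f"S:{record_bytes}",    # string objects of the given size
        "-w", workload,               # "I" to load, "RU,<read %>" to run
        "-z", str(threads),           # client load-generating threads
    ]
    # Flags used in the example commands but not described in the list (-l, -g) pass through unchanged
    args.extend(extra)
    return args


load = benchmark_args("172.16.5.32", "test", 100_000_000, 2048, "I", 64, extra=["-l", "23"])
run = benchmark_args("172.16.5.32", "test", 100_000_000, 2048, "RU,95", 64,
                     extra=["-l", "23", "-g", "125000"])
print(shlex.join(load))
print(shlex.join(run))
```

The resulting lists can be handed to `subprocess.run(run, check=True)` from the benchmark tool's directory; flag order differs slightly from the example commands, which should not matter for option parsing.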
Dell R730xd Server System
One primary server (two systems for replication testing)
Dual CPU socket, rack mountable server system
Dell A03 Board, Product Name: 0599V5
CPU Model used
2 each - Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, max frequency: 4 GHz
18 cores, 36 logical processors per CPU
36 cores, 72 logical processors total
DDR4 DRAM Memory
Dell* 1.0.4, 8/28/2014
Intel® Ethernet Converged 10G X520-DA2 (dual-port PCIe add-in card)
1 – embedded 1G network adapter for management
2 – 10 Gb ports for workload
Internal Drives and Volumes
/ (root) OS system – Intel SSD for Data Center Family S3500 – 480GB Capacity
/dev/nvme0n1 Intel SSD for Data Center Family P3700 – 1.6TB Capacity, x4 PCIe AIC
/dev/nvme1n1 Intel SSD for Data Center Family P3700 - 1.6TB Capacity, x4 PCIe AIC
/dev/nvme2n1 Intel SSD for Data Center Family P3700 - 1.6TB Capacity, x4 PCIe AIC
/dev/nvme3n1 Intel SSD for Data Center Family P3700 - 1.6TB Capacity, x4 PCIe AIC
6.4TB of raw capacity for Aerospike database namespaces
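How much DRAM does the index actually need relative to this raw capacity? The sketch below assumes Aerospike's documented primary-index cost of 64 bytes per record per copy (check the capacity-planning docs for your version), uses the 100,000,000 keys from the `-k` flag, and uses decimal gigabytes for simplicity. The helper names are hypothetical, and the second calculation ignores every other use of memory.

```python
BYTES_PER_INDEX_ENTRY = 64  # primary-index entry size per record (assumption; verify for your version)


def index_dram_bytes(records, replication_factor=1):
    """Cluster-wide DRAM for the primary index: records x copies x 64 B."""
    return records * replication_factor * BYTES_PER_INDEX_ENTRY


def records_indexable(dram_bytes):
    """How many records a given DRAM budget can index, ignoring all other memory use."""
    return dram_bytes // BYTES_PER_INDEX_ENTRY


GB = 10**9  # decimal units for simplicity

# 100,000,000 keys, as in the -k flag of the benchmark commands
print(index_dram_bytes(100_000_000) / GB, "GB of index at replication factor 1")
print(index_dram_bytes(100_000_000, 2) / GB, "GB of index at replication factor 2")
# Records a 64 GB memory budget could index, and the raw payload that implies at 1024 B each
n = records_indexable(64 * GB)
print(f"{n:,} records ~ {n * 1024 / 10**12:.1f} TB of 1k rows")
```

Under these assumptions the tested keyspace needs only a few gigabytes of index, and a 64 GB budget indexes about a billion records per node; larger rows stretch the same index DRAM across more SSD capacity.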
Operating System, kernel & NVMe driver
Red Hat Enterprise Linux Server Version 6.5
Linux kernel version changed to 3.16.3
nvme block driver version 0.9 (vermagic: 3.16.3)
Note: Intel PCIe drives use the NVM Express (NVMe) storage standard for non-volatile memory, which requires an NVMe driver in your Linux kernel. For benchmark work such as this, the currently recommended kernel is 3.19-based.
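A quick way to check a host against the recommended 3.19-based kernel is to compare the numeric major and minor parts of the release string; note that the 3.16.3 kernel used in this test predates that recommendation. This is a hypothetical helper that only compares version numbers; it does not verify that an NVMe driver is actually loaded.

```python
import platform
import re


def kernel_at_least(release, minimum=(3, 19)):
    """True if a release string like '3.16.3' or '3.19.0-25-generic' is >= minimum (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if platform.system() == "Linux":
    release = platform.release()
    print(release, "meets 3.19" if kernel_at_least(release) else "is older than 3.19")
```

Comparing integer tuples rather than strings avoids the classic trap where "3.9" sorts after "3.19".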
PCIe NVMe Intel drives latest firmware update and tool
Intel embeds its most stable maintenance-release support software for Intel SSDs into a tool we call the Intel Solid State Drive Data Center Tool. Our latest release just landed, and it is important that you use the MR2 firmware release included in the latest version, 2.2.0, to achieve these kinds of results for small blocks. Intel’s firmware for the Intel SSD Data Center PCIe family is tested worldwide by hundreds of labs, many of them with direct involvement from software companies such as Aerospike. No other SSD manufacturer is as connected, both in the platform and in software vendor collaboration, as Intel is, which helps deliver the solution-level scalability you see in this blog. Intel’s SSD products are truly platform-connected and end-user-software inspired.
The world of deep servers that dish out row-based terabytes has arrived, and feeding a Hadoop cluster from these kinds of ultra-fast NoSQL clusters, or vice versa, is gaining traction. These are single-server TPS numbers unheard of in the relational SQL world. NoSQL has gained traction as purpose-built, fast, and excellent for use cases such as trading, session and profile management. Now you see this web-scale-friendly architecture moving into the realm of immense data depth per node. If you think 256 GB of DRAM per node is your only option for critical memory scale, think again: those days are behind us now.
You can see the back story on this by visiting our partner Aerospike and looking for the webinar by Frank Ober.
Special thanks to Swetha Rajendiran of Intel and Young Paik of Aerospike for their commitment and efforts in building and producing these test results with me.