First, I should admit that this post is late. Back in April of this year, I blogged about the World's First 32 Node All Flash Virtual SAN with NVMe. The reception to the demo we gave at EMC World was so enthusiastic that we decided to turn up the wick and add 32 more nodes, bringing us to the maximum cluster size allowed in VMware's Virtual SAN product. Hence the title of this post and its subtle nod to the Beatles.
This 64-node incarnation of Virtual SAN more or less doubled down on the hardware used in the 32-node version. It was completed in time to be shown at the Intel Developer Forum in August and at VMworld US in September (I did mention this post is late).
Next up on the tour is VMworld Europe, hosted in Barcelona during the week of October 11th. The cluster itself will be online in the Intel booth during Solutions Expo hours throughout the conference. Additionally, there is a breakout session, STO4688, on Tuesday, October 13th at 5:00 PM, where John Hubbard and I will provide a detailed overview of the cluster build specifics, its performance characteristics, and key learnings from building a Virtual SAN cluster at this scale.
Cluster BOM Overview
The cluster is contained in two separate 48U 19" server racks, each holding 32 Intel® Server System R1208WTTGS 1U servers. Each server is equipped with:
- Dual Intel® Xeon® E5-2699v3 (18 Core @ 2.3Ghz)
- Intel® Server Board S2600WTT
- 128 GB DDR4 RAM
- Boot Drive
- 1x Intel® SSD DC S3710 Series (200 GB, 2.5”)
- Virtual SAN SSDs: 2 disk groups, each composed of
- Intel® Ethernet Server Adapter X520-T2
Cluster Specifications at a Glance
- 6,400 Virtual Machines With Windows® Server
- 64 Hyper-Converged VMware® ESXi Hosts
- 2,304 Xeon® Cores
- 8 TB DDR4 Memory
- 500 TB Raw Flash
- 100+ TB of Virtual SAN Cache
- 400+ TB of Raw Datastore Storage
- 2x Cisco* Nexus 93128TX Switches deployed in top-of-rack fashion
- 192x 10GBASE-T Ports
- 12x 40Gb QSFP Ports
- 20+ kW Under Load, 40 kW Available
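As a quick sanity check, the cluster-level figures above follow directly from the per-node BOM. The sketch below (the constants are taken from the BOM; nothing here is from any official tooling) reproduces the core and memory totals:

```python
# Sanity-check the cluster totals against the per-node BOM.
HOSTS = 64                  # 2 racks x 32 servers
CPUS_PER_HOST = 2           # dual-socket Intel Xeon E5-2699 v3
CORES_PER_CPU = 18
RAM_PER_HOST_GB = 128

total_cores = HOSTS * CPUS_PER_HOST * CORES_PER_CPU
total_ram_tb = HOSTS * RAM_PER_HOST_GB / 1024

print(total_cores)    # 2304 Xeon cores, matching the spec sheet
print(total_ram_tb)   # 8.0 TB of DDR4
```

The remaining figures (500 TB raw flash, 100+ TB cache, 400+ TB datastore) divide out to roughly 7.8 TB of flash and 1.6 TB of cache per node across the two disk groups.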
Testing a cluster of this size is an art unto itself. The charts below show results for 128 active VMs (2 per host) running various workloads under Iometer. The scope of the testing included measuring IOPS for random 4 KiB workloads at varying queue depths and read/write ratios, as well as measuring bandwidth for sequential 128 KiB read and write workloads, also at varying queue depths.
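To make the shape of that test matrix concrete, here is a small sketch that enumerates the kind of sweep described above. The specific queue depths and read/write mixes are my own illustrative assumptions; the post does not list the exact values used in the demo:

```python
# Illustrative test matrix: random 4 KiB workloads swept across queue
# depths and read/write mixes, plus sequential 128 KiB reads and writes.
# Queue depths and read percentages below are assumed, not from the demo.
from itertools import product

queue_depths = [1, 4, 16, 32]   # assumed sweep values
read_pcts = [100, 70, 0]        # assumed read/write mixes for random I/O

random_cases = [
    {"block_kib": 4, "pattern": "random", "qd": qd, "read_pct": r}
    for qd, r in product(queue_depths, read_pcts)
]
sequential_cases = [
    {"block_kib": 128, "pattern": "sequential", "qd": qd, "read_pct": r}
    for qd, r in product(queue_depths, [100, 0])  # pure read, pure write
]

matrix = random_cases + sequential_cases
print(len(matrix))   # 12 random + 8 sequential = 20 test cases
```

In practice each case maps to an Iometer access specification run across all 128 worker VMs, with IOPS reported for the 4 KiB cases and throughput for the 128 KiB cases.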
We continue to experiment with and refine reference Virtual SAN architectures at both large and small scales. If you have thoughts on specific usages, workloads, or benchmarks you'd like to see run under Virtual SAN, please leave me a note in the comments, as we are always curious to see how people are using these technologies in the wild.