Will You Still Love Me When I’m 64?

First, I need to state that this blog post is late. Back in April of this year, I blogged about the world’s first 32-node all-flash Virtual SAN with NVMe. The reception to the demo we gave at EMC World was so enthusiastic that we decided to turn up the wick and add 32 more nodes, getting us to the maximum cluster size allowed in VMware’s Virtual SAN product. Hence the title of this post and its subtle nod to the Beatles.

This 64-node incarnation of Virtual SAN was more or less a doubling of the hardware used in the 32-node version. It was completed in time to be shown at the Intel Developer Forum in August and VMworld US in September (I did mention this blog post is late).

Next up on the tour is VMworld Europe, hosted in Barcelona during the week of October 11th. The cluster will be online in the Intel booth during Solutions Expo hours. Additionally, there is a breakout session, STO4688, on Tuesday, October 13th at 5:00 PM, where John Hubbard and I will provide a detailed overview of the cluster build specifics, performance characteristics, and key learnings from building a Virtual SAN cluster at this scale.

Cluster BOM Overview


The cluster is housed in two separate 48U 19” server racks. Each rack holds 32 Intel® Server System R1208WTTGS 1U servers. Each server is equipped with:

Cluster Specifications at a Glance

  • 6,400 Virtual Machines with Windows® Server
  • 64 Hyper-Converged VMware® ESXi Hosts
  • 2,304 Xeon® Cores
  • 8 TB DDR4 Memory
  • 500 TB Raw Flash
  • 100+ TB of Virtual SAN Cache
  • 400+ TB of Raw Datastore Storage
  • 2x Cisco* Nexus 93128TX Switches deployed in top-of-rack fashion
  • 192x 10GBASE-T Ports
  • 12x 40Gb QSFP Ports
  • 20+ kW Under Load, 40 kW Available
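
As a rough sanity check, the cluster-wide totals above can be divided back down to per-host figures. The sketch below is only arithmetic on the published numbers. The names are illustrative, and it assumes that resources are spread evenly across the 64 hosts, that the memory figure uses binary units (1 TB = 1024 GB), and that the 10GBASE-T ports are split evenly between the two switches. None of these is stated outright in the list.

```python
# Cluster-wide totals copied from the "at a glance" list.
HOSTS = 64
SWITCHES = 2
TOTAL_10GBASE_T_PORTS = 192

CLUSTER_TOTALS = {
    "virtual_machines": 6_400,
    "xeon_cores": 2_304,
    "raw_flash_tb": 500,
}
TOTAL_DDR4_TB = 8


def per_host(totals, hosts=HOSTS):
    """Divide each cluster total evenly across the hosts (an assumption)."""
    return {name: total / hosts for name, total in totals.items()}


def memory_gb_per_host(total_tb=TOTAL_DDR4_TB, hosts=HOSTS):
    """Per-host memory in GB, assuming binary units (1 TB = 1024 GB)."""
    return total_tb * 1024 / hosts


def ports_per_switch(total_ports=TOTAL_10GBASE_T_PORTS, switches=SWITCHES):
    """10GBASE-T ports per switch, assuming an even split."""
    return total_ports // switches


if __name__ == "__main__":
    for name, value in per_host(CLUSTER_TOTALS).items():
        print(f"{name}: {value:g} per host")
    print(f"ddr4_memory_gb: {memory_gb_per_host():g} per host")
    print(f"10gbase_t_ports: {ports_per_switch()} per switch")
```

Under those assumptions this works out to 100 VMs, 36 cores, and 128 GB of memory per host, with roughly 7.8 TB of raw flash each.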

Cluster Performance

Testing a cluster of this size is an art unto itself. The charts below show results for 128 active VMs (2 per host) running various workloads under Iometer. The testing measured IOPS for random 4 KiB workloads at varying queue depths and read/write ratios, and bandwidth for sequential 128 KiB read and write workloads, also at varying queue depths.
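
To make the shape of that test sweep concrete, here is a minimal sketch of how such a workload matrix could be enumerated. The queue depths and read percentages are hypothetical placeholders, since the actual values used are not listed here, and the helper names are made up for illustration. This is not the Iometer configuration we ran.

```python
from itertools import product

HOSTS = 64
VMS_PER_HOST = 2  # from the text: 128 active VMs, 2 per host

# Hypothetical sweep values, chosen only for illustration.
QUEUE_DEPTHS = [1, 4, 16, 64]
READ_PERCENTAGES = [100, 70, 0]


def build_test_matrix():
    """Return one dict per access pattern in the sweep."""
    tests = []
    # Random 4 KiB: every queue depth crossed with every read/write ratio.
    for qd, read_pct in product(QUEUE_DEPTHS, READ_PERCENTAGES):
        tests.append({"pattern": "random", "block_kib": 4,
                      "queue_depth": qd, "read_pct": read_pct})
    # Sequential 128 KiB: pure read and pure write at every queue depth.
    for qd, read_pct in product(QUEUE_DEPTHS, [100, 0]):
        tests.append({"pattern": "sequential", "block_kib": 128,
                      "queue_depth": qd, "read_pct": read_pct})
    return tests


def iops_to_mib_per_s(iops, block_kib):
    """Convert an IOPS figure to bandwidth in MiB/s for a given block size."""
    return iops * block_kib / 1024


if __name__ == "__main__":
    print(f"active VMs: {HOSTS * VMS_PER_HOST}")
    for test in build_test_matrix():
        print(test)
```

The conversion helper is there because the two halves of the sweep report different units: small random I/O is usually quoted in IOPS, large sequential I/O in bandwidth, and the two are related by the block size.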



Final Thoughts

We continue to experiment with and refine reference Virtual SAN architectures at both large and small scales. If there are specific usages, workloads, or benchmarks you’d like to see run under Virtual SAN, please leave me a note in the comments; we are always curious to see how people are using these technologies in the wild.


About Ken LeTourneau

Ken LeTourneau has been with Intel for 20 years and is a Solutions Architect focused on Big Data and Artificial Intelligence. He works with leading software vendors on architectures and capabilities for Big Data solutions with a focus on analytics. He provides a unique perspective to leading IT decision makers on why AI is important for 21st century organizations, advising them on architectural best practices for deploying and optimizing their infrastructure to meet their needs. Previously, Ken served as an Engineering Manager and Build Tools Engineer in Intel's Graphics Software Development and Validation group. He got his start as an Application Developer and Application Support Specialist in Intel's Information Technology group.