Scaling Software-Defined Storage in Retail

Recently I had the opportunity to collaborate with the Kroger Co.* on a case study of their use of VMware* and its Virtual SAN* product¹.  Having spent many a day and night enjoying 4x4 subs and Krunchers* Jalapeño (no more wimpy) chips during my days at Virginia Tech*, courtesy of the local Kroger supermarket, I was both nostalgic and intrigued.  Couple that with the fact that I am responsible for qualifying Intel® Solid State Drives (SSDs) for use in Virtual SAN, and participating was really a no-brainer.

One of the many eye-openers from this experience was just how large an operation the Kroger Co. runs.  They are the largest grocery retailer in the United States, with over 400,000 employees spanning over 3,000 locations.  The company has been around since 1883, and had 2014 sales in excess of $108 billion.  I spent roughly ten years of my career here at Intel in IT, and this was a great opportunity to gain insight, commiserate, and compare notes with another large company that surely has challenges I can relate to.

Unsurprisingly, the Kroger Co. is heavily invested in virtualization, with tens of thousands of virtual machines deployed and internal cloud customers numbering in the thousands.  Their virtualized environment powers critical lines of business, including manufacturing & distribution, pharmacies, and customer loyalty programs.

Managing the storage for this virtualized environment with a traditional architecture, in which centralized storage backs the compute clusters, presented issues at this scale.  To achieve desired performance targets, Kroger had to resort to all-flash fiber channel SAN implementations rather than hybrid (tiered) SAN implementations.  To be clear, these worked, but they ran directly counter to the goal of reducing capital costs.  This led Kroger to begin looking at Software-Defined Storage solutions as an alternative.  The tenets of their desired storage implementation were the ability to scale quickly, consistent QoS and performance on par with existing SAN-based solutions, and reduced cost.  No small order, to be sure.

All-Flash Fiber Channel SAN performance, at about 1/5th the cost

Kroger evaluated multiple technologies, and eventually settled on Virtual SAN from VMware running in an all-flash configuration.  This is where the other eye-opening findings came to light.  Kroger found that their building block solution for Virtual SAN, which includes the Intel® SSD Data Center Family for NVMe, offered IOPS performance within 8% of all-flash fiber channel SAN at about 1/5th the expense, illustrated by the chart below.

IOPS, Cost, and Data Center Footprint Comparison²


This same solution also offered latency characteristics within 3% of all-flash fiber channel SAN, while using approximately 1/10th the footprint in their data centers.

Latency, Cost, and Data Center Footprint Comparison³


Key Takeaways

For the Kroger Co., the benefits of their Virtual SAN-based solution are clear:

  • Hyper-converged:  Virtual SAN yields a roughly 10x reduction in footprint
  • Performance: IOPS within 8% and latency within 3% of all-flash fiber channel SAN
  • Cost: approximately 20% of the alternative all-flash fiber channel SAN solution
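
To see how these ratios combine, here is a back-of-the-envelope sketch.  It normalizes the all-flash SAN to 1.0 and assumes the worst case for "within 8%", namely that Virtual SAN delivered 8% fewer IOPS.  The absolute numbers are Kroger's and are not published here, so the variable names and the normalization are illustrative only.

```python
# Relative figures only; the all-flash SAN baseline is normalized to 1.0.
# Assumption: "within 8%" is read as the worst case (8% fewer IOPS).
SAN_BASELINE = 1.0

vsan_iops = SAN_BASELINE * (1 - 0.08)   # IOPS within 8% of the SAN
vsan_cost = SAN_BASELINE / 5            # about 1/5th the cost
vsan_footprint = SAN_BASELINE / 10      # about 1/10th the footprint

# How much more performance each unit of cost or floor space buys.
iops_per_cost = vsan_iops / vsan_cost
iops_per_footprint = vsan_iops / vsan_footprint

print(f"IOPS per unit cost vs. SAN:      {iops_per_cost:.1f}x")
print(f"IOPS per unit footprint vs. SAN: {iops_per_footprint:.1f}x")
```

Under those assumptions, each dollar buys roughly 4.6 times the IOPS and each unit of data center space roughly 9.2 times the IOPS.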

I wish we had solutions like this on the table during my days in IT - these are exciting times to witness.


1 Intel has received permission from The Kroger Co. to share some of the findings of this case study as it was presented at VMworld 2015.

2 Throughput measured utilizing fio.  Kroger’s all-flash and hybrid SAN configuration details, including footprint, are proprietary.  Cost details proprietary for all storage configurations.  Configuration details for Virtual SAN, virtual machine, and fio testing are listed below.

VMware Virtual SAN single node:

  • Dell R730 with 16x 2.5” drive bays and Dell PowerEdge RAID Controller H730P
  • OS:  ESXi 6.0
  • 512 GB DDR DRAM
  • Dual Intel® Xeon® E5-2699v3 (18 cores @ 2.3 GHz)
  • 3x Virtual SAN disk groups each comprised of:
    • 1x Intel® SSD DC P3700 Series (800 GB, HHHL AIC)
    • 5x Intel® SSD DC S3610 Series (1.6 TB, 2.5” SFF)
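
The raw flash in this node follows directly from the disk group layout.  A quick sketch of the arithmetic, using only the drive counts and sizes listed above; the results are raw figures before any Virtual SAN storage policy overhead, and the constant names are made up for illustration:

```python
# Per-node Virtual SAN layout taken from the configuration list above.
DISK_GROUPS = 3
CACHE_DRIVES_PER_GROUP = 1      # Intel SSD DC P3700, 800 GB, add-in card
CACHE_DRIVE_GB = 800
CAPACITY_DRIVES_PER_GROUP = 5   # Intel SSD DC S3610, 1.6 TB, 2.5" SFF
CAPACITY_DRIVE_GB = 1600
DRIVE_BAYS = 16

cache_gb = DISK_GROUPS * CACHE_DRIVES_PER_GROUP * CACHE_DRIVE_GB
capacity_gb = DISK_GROUPS * CAPACITY_DRIVES_PER_GROUP * CAPACITY_DRIVE_GB

# Only the 2.5" drives occupy bays; the add-in cards sit in PCIe slots.
bays_used = DISK_GROUPS * CAPACITY_DRIVES_PER_GROUP

print(f"Cache tier:    {cache_gb / 1000:.1f} TB raw")
print(f"Capacity tier: {capacity_gb / 1000:.1f} TB raw")
print(f"Drive bays:    {bays_used} of {DRIVE_BAYS}")
```

That works out to 2.4 TB of NVMe cache in front of 24 TB of raw SATA capacity per node, with 15 of the 16 drive bays populated.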

Virtual Machine under test:

  • Ubuntu LTS
  • 8x vCPU
  • 16 GB vRAM
  • 1x 256 GB VMDK using VMware Paravirtual SCSI controller

IOPS testing details:

  • Flexible I/O Tester (fio) - sample command used:
    • fio --filename=/home/iotest/fio/sdx --direct=0 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=70 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4k7030test --size=60g
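
For anyone who wants to script a similar run, the sample command can be sketched as a small Python wrapper.  This is not the harness used in the case study; the helper names `build_fio_command` and `total_iops` are hypothetical, and `--output-format=json` is an optional fio flag that was not part of the original command.

```python
import json
import shlex

# Options copied from the sample command above, in the same order.
FIO_OPTIONS = {
    "filename": "/home/iotest/fio/sdx",
    "direct": 0,
    "rw": "randrw",
    "refill_buffers": None,   # None marks a flag that takes no value
    "norandommap": None,
    "randrepeat": 0,
    "ioengine": "libaio",
    "bs": "4k",
    "rwmixread": 70,          # 70% reads, 30% writes
    "iodepth": 8,
    "numjobs": 8,
    "runtime": 60,
    "group_reporting": None,
    "name": "4k7030test",
    "size": "60g",
}


def build_fio_command(options, json_output=False):
    """Return the fio argument list for the given options."""
    args = ["fio"]
    for key, value in options.items():
        args.append(f"--{key}" if value is None else f"--{key}={value}")
    if json_output:
        args.append("--output-format=json")  # not in the original command
    return args


def total_iops(fio_json_text):
    """Sum read and write IOPS over the jobs in fio's JSON report."""
    report = json.loads(fio_json_text)
    return sum(job["read"]["iops"] + job["write"]["iops"]
               for job in report["jobs"])


command = build_fio_command(FIO_OPTIONS)
# Requested queue depth across all jobs: numjobs x iodepth.
requested_depth = FIO_OPTIONS["numjobs"] * FIO_OPTIONS["iodepth"]

print(shlex.join(command))
print(f"Requested outstanding I/Os: {requested_depth}")
```

To run it for real, pass `build_fio_command(FIO_OPTIONS, json_output=True)` to `subprocess.run(..., capture_output=True, text=True)` and feed the captured stdout to `total_iops`.  Note that 64 is the requested depth; the fio documentation warns that with `libaio` and `--direct=0` the achieved depth on Linux may be lower, because buffered I/O is not asynchronous there.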

3 Latency measured utilizing fio.  See footnote 2 for additional details.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

Copyright © 2016 Intel Corporation.  All rights reserved.

*Other names and brands may be claimed as the property of others.


About Ken LeTourneau

Ken LeTourneau has been with Intel for 20 years and is a Solutions Architect focused on Big Data and Artificial Intelligence. He works with leading software vendors on architectures and capabilities for Big Data solutions with a focus on analytics. He provides a unique perspective to leading IT decision makers on why AI is important for 21st century organizations, advising them on architectural best practices for deploying and optimizing their infrastructure to meet their needs. Previously, Ken served as an Engineering Manager and Build Tools Engineer in Intel's Graphics Software Development and Validation group. He got his start as an Application Developer and Application Support Specialist in Intel's Information Technology group.