Cloud Optimized: 2nd Gen Xeon Scalable Processor Provides Robust Support for Cloud Workloads

Today I’m proud to be part of the launch of the 2nd generation Intel® Xeon® Scalable processor. This is the culmination of years of collaboration with cloud service providers (CSPs) to create solutions specifically focused on cloud services, whether public or private. We worked in tandem with the world’s largest cloud companies on the most pressing technical challenges facing the industry. And to meet the needs of cloud at scale, we’re rolling out our 2nd Generation Intel Xeon Scalable processors, which anchor an entire portfolio of products and features. Intel is currently shipping 2nd Generation Intel Xeon Scalable processors to nearly 30 cloud providers, many of which are already seeing remarkable performance and TCO benefits.

During our Data-Centric launch event today, I’m hosting a discussion with several of our key partners on the importance of big data, AI and analytics in a cloud computing environment. The innovations we’re developing today will solve some of the world’s biggest challenges, so it’s important we get it right. I’ve met with many CSPs, and they share several common areas of focus: increased performance, trusted scalability, and differentiated solutions. Today, Intel launched a suite of products that addresses all three of these concerns.

Performance

With Intel® Turbo Boost Technology delivering bursts of up to 4.4 GHz and up to 2x the system memory capacity, 2nd Generation Intel Xeon Scalable processors are cutting-edge processors for today’s largest cloud providers.

Beyond increased density and memory capability, the latest Intel processors have been upgraded to include Intel® Deep Learning Boost (Intel® DL Boost), an extension of Intel® AVX-512 that accelerates artificial intelligence (AI) inference performance for deep learning workloads optimized to use Vector Neural Network Instructions (VNNI). This can improve performance for AI workloads such as image classification, object detection, speech recognition and translation. Tests have shown image recognition running up to 14x faster on a similarly configured system versus first-generation Intel Xeon Scalable processors launched in July 2017.1
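To illustrate the kind of operation VNNI accelerates, here is a minimal sketch in C using the AVX-512 VNNI intrinsic _mm512_dpbusd_epi32, which fuses the multiply and accumulate steps of an INT8 dot product into a single instruction. The function name, array sizes, and data layout are hypothetical and chosen only for illustration; in practice, optimized deep learning frameworks emit this pattern automatically when a model is quantized to INT8.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical INT8 dot product: 64 unsigned activations times 64 signed
 * weights. VPDPBUSD (via _mm512_dpbusd_epi32) multiplies byte pairs and
 * accumulates groups of four products into 16 int32 lanes in one step. */
static int32_t dot_int8_vnni(const uint8_t *activations, const int8_t *weights)
{
    __m512i acc = _mm512_setzero_si512();
    __m512i a   = _mm512_loadu_si512(activations);  /* 64 x uint8 */
    __m512i w   = _mm512_loadu_si512(weights);      /* 64 x int8  */

    /* acc += a * w, summed per 4-byte group into 32-bit accumulators. */
    acc = _mm512_dpbusd_epi32(acc, a, w);

    /* Reduce the 16 int32 lanes to a single scalar result. */
    return _mm512_reduce_add_epi32(acc);
}
```

Built with a VNNI-aware compiler (for example, gcc -mavx512f -mavx512vnni), the single dpbusd step replaces the separate multiply, widen, and add instructions that pre-VNNI AVX-512 required, which is where much of the INT8 inference speedup comes from.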

Amazon has focused early Intel DL Boost optimization on its voice-activated technology, Alexa. New Alexa capabilities are driven by AI inference performance, which is why Amazon worked so closely with Intel engineers to utilize the technology to drive performance improvements. The Alexa AI Platform team has been benchmarking deep neural network inference using DL Boost. The goal is to speed up the inference engine for automatic speech recognition and natural language understanding. So far, the results have been promising, with initial tests showing a more than 2x speedup in inference when leveraging DL Boost technology.

Baidu integrated a customized 2nd Generation Intel Xeon Scalable processor into its key infrastructure and enabled Intel DL Boost in Baidu’s deep learning framework, PaddlePaddle v1.3. Moving forward, Baidu plans to replace add-in GPUs and take advantage of Intel DL Boost instead. Learn more about Intel DL Boost in this Chip Chat episode.

With Intel® Optane™ DC persistent memory, IT managers can take advantage of increased VM density and larger databases that can be stored entirely in memory. Last October, Google Cloud was the first cloud service provider to deliver services based on Intel Optane DC persistent memory. And today Google talked about how Intel Optane DC persistent memory will greatly augment the world-class infrastructure that Google Cloud offers to its customers. Early testing has proven beneficial for customers consolidating SAP HANA instances and improving overall customer experiences. Tencent is also harnessing the benefits of Intel Optane DC persistent memory for its Cloud Redis Storage* and is already seeing higher performance and lower cost compared to a DRAM-only system.

Alibaba Group is also using Intel Optane DC persistent memory for its in-house distributed key-value storage system, TAIR, and is impressed with the TCO benefits it has realized so far. During its November 11 global shopping festival, Intel’s 2nd generation Xeon Scalable processor and Intel Optane DC persistent memory helped Alibaba reach new heights for the world’s largest online sales event. This new platform can efficiently process massive amounts of data in real time, enabling digital commerce applications to deliver a smooth, responsive user experience.
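As a minimal sketch of how software can address this class of memory directly, the example below uses libpmem from the open source Persistent Memory Development Kit (PMDK) to map a file on a DAX-mounted filesystem and persist a write in place. The mount path, file size, and record contents are assumptions for illustration, not a description of any specific provider’s deployment.

```c
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* Map (creating if needed) a small file on a hypothetical
     * DAX-mounted persistent memory filesystem. */
    char *addr = pmem_map_file("/mnt/pmem0/example", 4096,
                               PMEM_FILE_CREATE, 0666,
                               &mapped_len, &is_pmem);
    if (addr == NULL) {
        perror("pmem_map_file");
        return 1;
    }

    /* Data is written with ordinary loads and stores... */
    strcpy(addr, "record that survives a power cycle");

    /* ...and then flushed to the persistence domain. */
    if (is_pmem)
        pmem_persist(addr, mapped_len);
    else
        pmem_msync(addr, mapped_len);

    pmem_unmap(addr, mapped_len);
    return 0;
}
```

Because the data is byte-addressable and durable in place, a database or key-value store can keep a large working set resident across restarts instead of reloading it from SSDs, which is the property behind the larger in-memory databases and consolidation benefits described above.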

Trusted Scalability

With higher VM density and the ability to scale from 2-socket to 8-socket servers, 2nd Generation Intel Xeon Scalable processors can meet the challenges of today’s largest cloud workloads.

Twitter is one of the largest Hadoop* users in the world, with more than 320M monthly active users. Twitter’s Hadoop clusters have more than an exabyte of physical storage. Hard disk drives (HDDs) can be a significant I/O performance bottleneck for Hadoop clusters, especially in hyper-scale clusters like those at Twitter. Twitter worked in collaboration with Intel engineering to remove the storage bottleneck by adding a fast Intel NVMe SSD and Intel® Cache Acceleration Software (Intel® CAS). As data started flowing at a faster rate, processor utilization went up significantly, and the solution was re-architected with 24-core processors instead of the legacy 4-core processors. The combination of improved storage and higher-core-count processors enabled Twitter to reduce its data center footprint, leading to TCO savings and reduced maintenance costs. Twitter expects to be able to increase the density of the company’s Hadoop servers by 6X, resulting in approximately 30 percent lower TCO and up to 50 percent faster runtimes.

As data scales, so do threats. Cyberattacks have gotten so sophisticated that software-only security is no longer adequate. Platform integrity is essential for today’s cloud service providers. 2nd Generation Intel Xeon Scalable processors harness hardware-based security features, rooted in silicon and built directly into the foundation of the platform.2 Later this month, my colleague Anil Rao’s blog will cover how protections for data in use, at rest, and in transit have been enhanced with the launch of Intel® Security Libraries for Data Center (Intel® SecL-DC) and several other foundational security features.

Efficiently moving data in and out of the server is critical for scale. Predictable, high-bandwidth, low-latency Ethernet connectivity is essential in today’s data centers. To support the speeds and advanced capabilities that cloud services demand, Intel is announcing the next-generation Intel® Ethernet 800 Series. Available in Q3, the new Ethernet series is capable of speeds up to 100Gbps and provides breakthrough capabilities including Application Device Queues (ADQ), which improve application performance and consistency in meeting service-level agreements (SLAs). ADQ delivers a greater than 50 percent increase in application response-time predictability, over 45 percent lower latency, and over 30 percent higher throughput running open source Redis*, a database widely used by cloud service providers.3
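ADQ itself is configured in the NIC and its driver, but the application-side pattern it builds on is visible through standard Linux socket options. The sketch below is a generic illustration, not Intel’s ADQ API: it shows how a queue-aware server such as Redis can discover which hardware queue (NAPI ID) a new connection maps to and opt in to busy polling so the servicing thread stays aligned with that queue. The 50-microsecond budget is an arbitrary example value.

```c
#include <stdio.h>
#include <sys/socket.h>

/* After accept(), ask the kernel which NIC queue (NAPI ID) this
 * connection's packets arrive on, and enable per-socket busy polling
 * so the handling thread polls that queue instead of waiting on
 * interrupts. Requires a recent Linux kernel. */
int tune_connection(int connfd)
{
    unsigned int napi_id = 0;
    socklen_t len = sizeof(napi_id);

    if (getsockopt(connfd, SOL_SOCKET, SO_INCOMING_NAPI_ID,
                   &napi_id, &len) == 0)
        printf("connection %d arrives on NAPI ID %u\n", connfd, napi_id);

    /* Busy-poll for up to 50 microseconds before sleeping, trading a
     * little CPU for lower and more predictable tail latency. */
    int busy_poll_us = 50;
    if (setsockopt(connfd, SOL_SOCKET, SO_BUSY_POLL,
                   &busy_poll_us, sizeof(busy_poll_us)) != 0)
        perror("SO_BUSY_POLL");

    return (int)napi_id;
}
```

Grouping connections by NAPI ID and busy polling their queues is the general mechanism behind the predictability gains cited above: each application thread services a dedicated hardware queue rather than contending for shared interrupts.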

The Intel® Difference

For customers looking for differentiated solutions, Microsoft announced today that it will deliver bare-metal infrastructure for advanced workloads based on the latest Intel technology, with more information to come from Microsoft at SAP Sapphire. In addition, Azure is refreshing its Fv2, Dv3, and Ev3 virtual machine (VM) families for compute-intensive workloads to include 2nd generation Intel Xeon Scalable processors. For customers seeking a multi-cloud approach to their infrastructure strategy, Microsoft expanded its hybrid and edge portfolio to include Azure Stack HCI for private cloud applications, featuring 2nd Generation Intel Xeon Scalable processors and Intel Optane DC persistent memory.

Intel will continue to work with cloud leaders to optimize each stage of the data lifecycle to move faster, store more, and process everything. Visit intel.com/csp to see all of the advancements we’re making to solve the most challenging issues for cloud service providers.


For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

1 1x inference throughput improvement in July 2017 (baseline): Tested by Intel as of July 11th 2017: Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

14x inference throughput improvement vs baseline: Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores, HT on, turbo on, total memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB, Deep Learning Framework: Intel® Optimization for Caffe version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a), ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, DummyData, 4 instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

2 No product or component can be absolutely secure.

3 >50% predictability improvement, >45% latency reduction and >30% throughput improvement with open source Redis* using 2nd Gen Intel® Xeon® Scalable processors and Intel® Ethernet 800 Series with ADQ vs. without ADQ.  Performance results are based on Intel internal testing as of February 2019, and may not reflect all publicly available security updates.  See configuration disclosure for details.  No product or component can be absolutely secure.  Tests performed using Redis* Open Source on 2nd Generation Intel® Xeon® Scalable processors and Intel® Ethernet 800 series 100GbE on Linux 4.19.18 kernel. For complete configuration information see the Performance Testing Application Device Queues (ADQ) with Redis* Solution Brief (http://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/application-device-queues-with-redis-brief.html).