The New Intel Xeon Scalable Processor Powers the Future of AI

One of the fastest growing and most discussed segments in technology today is artificial intelligence (AI). The exponential increase in computing power over the last few decades has enabled an explosion in machine learning and deep learning development across industries, and has given rise to compute architectures that begin to look and function like a human brain. To accomplish this, we need three things: large data sets to process and learn from; compute power and neural networks to process those large data sets; and personalization through algorithms that learn and adjust quickly and accordingly.

AI will transform the world and will touch almost every industry. In healthcare, for example, Penn Medicine developed a predictive solution that uses machine learning to address heart failure. Using a data science detection algorithm, Penn Medicine says it has identified approximately 20 percent more patients in its hospitals who could benefit from heart failure prevention services*.

The next “big thing” in AI is here

Today, Intel made another leap forward in delivering the compute capability and architecture required by AI: the new family of Intel® Xeon® Scalable processors. Designed with advanced AI and HPC capabilities, including significant increases in memory and I/O bandwidth, the new processors will let data scientists unlock insights faster from data-intensive workloads. Their advancements across compute, storage, memory, and I/O will accelerate product innovation and significantly advance AI.

One key innovation that will significantly improve AI performance is Intel® Advanced Vector Extensions 512 (Intel® AVX-512). Intel® AVX-512 instructions provide increased parallelism and vectorization, which are important for these workloads. They deliver up to double the flops1 per clock cycle compared to the previous-generation Intel® AVX22 for the most demanding computational workloads in applications such as modeling and simulation, data analytics, machine learning, and visualization – the most important enablers of a successful AI system.
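
To make the wider vectors concrete, here is a minimal, hypothetical C sketch (an assumed example, not Intel's code or the benchmark workload) of a SAXPY-style loop written with AVX-512 intrinsics. Each 512-bit register holds 16 single-precision floats, so one fused multiply-add processes twice as many elements per instruction as the 256-bit AVX2 equivalent.

/* Minimal sketch (assumed example, not Intel's code): y = a*x + y using
   AVX-512 intrinsics. Compile with e.g. gcc -O2 -mavx512f on an
   AVX-512-capable CPU. */
#include <immintrin.h>
#include <stddef.h>

void saxpy_avx512(float a, const float *x, float *y, size_t n)
{
    __m512 va = _mm512_set1_ps(a);           /* broadcast a to all 16 lanes */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);  /* load 16 floats from x */
        __m512 vy = _mm512_loadu_ps(y + i);  /* load 16 floats from y */
        vy = _mm512_fmadd_ps(va, vx, vy);    /* 16 fused multiply-adds per instruction */
        _mm512_storeu_ps(y + i, vy);         /* store the 16 results */
    }
    for (; i < n; ++i)                       /* scalar tail for any remainder */
        y[i] = a * x[i] + y[i];
}

In practice, compilers and optimized libraries generate this kind of code automatically, so most AI frameworks benefit from the wider vectors without hand-written intrinsics.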

Intel® AVX-512 support in the Intel® Xeon® Scalable processor family also enables a converged programming environment for Intel® Xeon® and Intel® Xeon® Phi™ compute nodes. The convergence of compute, memory, network, and storage performance, plus software ecosystem optimizations, enables a fully virtualized data center able to handle the complexity of advanced analytics and AI workloads.

In terms of the compute capability required for machine and deep learning training and inference, the Intel Xeon Scalable processor delivers up to 2.2X higher deep learning training and inference performance than the prior generation. And with additional software optimizations (e.g., to the Caffe framework), it can achieve up to 113X deep learning training performance gains compared with non-optimized three-year-old servers, providing a solid foundational architecture for AI workloads3.

Our early-ship customers are already using Intel Xeon Scalable processors today for AI applications. For example, Alibaba has broadly adopted artificial intelligence across many business segments. Alibaba’s e-commerce brain serves hundreds of millions of customers globally on the Taobao and Tmall platforms with personalized recommendations. Sellers on Alibaba platforms are empowered by Alibaba Store Concierge, an intelligent chatbot, to provide round-the-clock customer service. The new Intel AVX-512 instructions provide much wider vectorization capability, delivering significant latency reductions and throughput enhancements. In some of its business environments, Alibaba has said it has seen up to 80 percent performance improvement generation over generation*.

Intel’s AI portfolio

The Intel Xeon Scalable processor joins Intel’s broad (and growing) product portfolio for AI workloads, ranging from general-purpose compute to highly optimized solutions. Intel Xeon processors are the mainstay of volume AI deployments. The Intel® Xeon Phi™ family applies to problems that require dense compute and very high levels of parallelism. Intel® FPGAs offer low-latency, low-power inference with many different applications in the IoT and automotive space. They also offer outstanding reconfigurability for evolving AI and HPC workloads3 and reduced total cost of ownership (TCO)4. We’re excited about the upcoming Intel® Nervana™ ASIC, which accelerates training time for neural networks to advance deep learning.

A good example of a workload-optimized solution is Microsoft, which uses a combination of Intel Xeon processors and Intel FPGAs to support AI workloads in its Azure cloud platform. Intel Xeon Phi processors, Intel FPGAs, and Intel Nervana, along with their optimized sets of application libraries, all provide a new level of productivity and performance for deploying insightful applications in an easier and more powerful way than ever before*.

A full, end-to-end solution

Software innovation is key both to taking advantage of features in the underlying hardware and to accelerating the development process for analytics and HPC applications. A full solution spanning software and hardware is essential to powering the AI capabilities of today and the future.

Intel is much more than a hardware company. Our software portfolio ranges from open source performance libraries (Intel® Math Kernel Library for Deep Neural Networks and BigDL) to our own deep learning framework (Nervana™ Neon). We also offer the Intel® Deep Learning SDK, a free set of tools for data scientists and software developers, and we’re committed to enabling the industry with deep learning frameworks optimized for Intel® architecture, including Caffe, Theano, Torch, and TensorFlow*.
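
As an illustration of how these optimized libraries are typically used, the following hypothetical C sketch (matrix sizes and values are illustrative, not from this article) calls the standard CBLAS general matrix-multiply routine, cblas_sgemm, which underpins the dense layers of most deep learning workloads. Linking the same call against Intel MKL dispatches it to kernels tuned for the host CPU, including AVX-512 on Xeon Scalable processors.

/* Minimal sketch (assumed example, not from this article): the dense matrix
   multiply behind most deep learning layers, expressed as one CBLAS call.
   Link against Intel MKL (or any CBLAS implementation) to get kernels
   vectorized for the host CPU. */
#include <stdio.h>
#include <stdlib.h>
#include <mkl_cblas.h>   /* with a generic BLAS, include <cblas.h> instead */

int main(void)
{
    const int M = 256, N = 256, K = 256;            /* illustrative sizes */
    float *A = malloc((size_t)M * K * sizeof *A);   /* activations */
    float *B = malloc((size_t)K * N * sizeof *B);   /* weights */
    float *C = calloc((size_t)M * N, sizeof *C);    /* output */
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 0.5f;

    /* C = 1.0 * A x B + 0.0 * C, all matrices stored row-major */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);

    printf("C[0][0] = %.1f\n", C[0]);               /* expect 256 * 1.0 * 0.5 = 128.0 */
    free(A); free(B); free(C);
    return 0;
}

Frameworks such as Caffe and TensorFlow make essentially the same calls internally, which is why rebuilding them against optimized libraries can yield large speedups without changing model code.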

Amazon worked with Intel to optimize both hardware and software to support AI in its cloud. They apply machine-learning techniques across all areas of their business, such as Alexa, customer service, robotic fulfillment, drone delivery with Prime Air, Amazon Go, and AWS. In their new “C5” instances, Amazon optimized their deep learning engines around the Intel Xeon Scalable processors and the latest version of the Intel Math Kernel Library, increasing inference performance by over 100x*.

AI will catalyze new capabilities, products and experiences that will forever change how we work, play and live. The new Intel Xeon Scalable processors are a significant leap forward in the performance and efficiency of cutting-edge systems, enabling AI possibilities we could only have imagined a few years ago.

For more information on the Intel Xeon Scalable Processor Family, visit www.intel.com/xeonscalable and for more on Intel artificial intelligence technology, visit www.intel.com/ai.

To learn more about the biggest data center advancements in a decade, register here: launchevent.intel.com

 

*Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. For complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced website and confirm whether referenced data are accurate.

1 AVX-512 ‘2x flops per clock cycle’ throughput compares performance vs Intel AVX2 (256-bit)

2 As measured by Intel® Xeon® Processor Scalable Family with Intel® AVX-512 compared to an Intel® Xeon® E5 v4 with Intel® AVX2.

3 Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).
Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance

Deep Learning Frameworks: Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), https://github.com/intel/caffe/tree/master/models/default_vgg_19 (VGG-19), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

Platform: 2S Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz (12 cores), HT enabled, turbo enabled, scaling governor set to “performance” via intel_pstate driver, 256GB DDR3-1600 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.21.1.el7.x86_64. SSD: Intel® SSD 520 Series 240GB, 2.5in SATA 6Gb/s, 25nm, MLC.
Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact,1,0', OMP_NUM_THREADS=24, CPU Freq set with cpupower frequency-set -d 2.7G -u 3.5G -g performance

Deep Learning Frameworks: Caffe: (http://github.com/intel/caffe/), revision b0ef3236528a2c7d2988f249d347d5fdae831236. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (GoogLeNet, AlexNet, and ResNet-50), https://github.com/intel/caffe/tree/master/models/default_vgg_19 (VGG-19), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). GCC 4.8.5, Intel MKL small libraries version 2017.0.2.20170110.

3 As measured by Intel® Xeon® Processor Scalable Family with Intel® FPGA optimized workload and Intel® Xeon® Processor Scalable Family without FPGA optimized workload.

4 Up to 70% lower 4-year TCO estimate example based on equivalent rack performance using HammerDB* workload comparing 20 installed 2-socket servers with Intel Xeon processor E5-2690 (formerly “Sandy Bridge-EP”) running Oracle Linux* 6.4, HammerDB 2.10 with Oracle 11.2.0.3 compared at a total cost of $1,843,341 to 5 new Intel® Xeon® Platinum 8180 (Skylake-SP) running Oracle Linux* 7.2, HammerDB 2.18 with Oracle 12.1.0.2.0, at a total cost of $551,874 including basic acquisition. Server pricing assumptions based on current OEM retail published pricing for 2-socket server with Intel Xeon processor E5-2690 and 2 CPUs in a 4-socket server using E7-8890 v4 – subject to change based on actual pricing of systems offered.