The world is rapidly transitioning to an exciting new era of HPC, as emerging AI workloads are deployed and converge with traditional HPC, delivering unprecedented breakthroughs in a huge range of fields, from climate modeling to medicine to cosmology. As AI transforms the workloads that run on HPC systems, the technical requirements for those systems are fundamentally changing: they need higher-performance compute, memory, storage, and networking, as well as optimized software tools and libraries that allow developers to deliver the needed performance.
Intel, along with the HPC community, is driving the shift to this new era of HPC/AI convergence. As we convene this week in Dallas for SC18, we are excited to update you on our vision and how Intel® architecture is enabling the breadth of innovation throughout the HPC community.
Performance Leadership for HPC with the Intel® Xeon® Scalable Processor-based Platform
We are thrilled to announce this month a new class of Intel® Xeon® Scalable processor, codenamed Cascade Lake advanced performance, designed for the most data-demanding workloads. These processors will deliver performance for HPC and technical computing, AI, and Infrastructure-as-a-Service (IaaS) with 48 cores per CPU in a performance-optimized multichip package and unprecedented memory bandwidth1 with 12 DDR4 memory channels per processor—more than in any other CPU.
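To put the channel count in context, theoretical peak memory bandwidth per socket is commonly estimated as channels × transfer rate × 8 bytes per transfer for a 64-bit DDR4 channel. The sketch below applies that rule of thumb. The function name is illustrative, and the DDR4-2666 transfer rate is an assumption chosen only for the example, since the memory speed supported by these processors is not stated here.

```python
def peak_bandwidth_gbs(channels: int,
                       mega_transfers_per_s: float,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth for one socket, in decimal GB/s."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# 12 channels comes from the announcement; 2666 MT/s is a hypothetical speed.
twelve_channel = peak_bandwidth_gbs(12, 2666)
print(f"12 channels at an assumed 2666 MT/s: {twelve_channel:.1f} GB/s peak")
```

This is an upper bound only. Sustained bandwidth measured by a benchmark such as Stream Triad is always lower than the theoretical peak.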
At SC18, we’re giving a peek into the capabilities of this architecture for HPC and AI workloads, including:
- Intel® Xeon® Scalable processors codenamed Cascade Lake advanced performance will outperform AMD EPYC* 7601 on key HPC benchmarks: Up to 3.4X on LINPACK2 and up to 1.3X on Stream Triad.3
- These future Intel® Xeon® Scalable processors will achieve up to 17x faster AI image recognition inference4 in comparison to the Intel® Xeon® Scalable processor (codenamed Skylake-SP) at launch.
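For readers unfamiliar with the second benchmark: STREAM Triad times the kernel a[i] = b[i] + q*c[i] over arrays much larger than cache and reports bytes moved per second, counting two reads and one write per element. The sketch below only illustrates that accounting in pure Python. It is not the STREAM benchmark itself, the function names are illustrative, and interpreter overhead means its timing says nothing about hardware memory bandwidth.

```python
import time
from array import array


def triad(a, b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n, bytes_per_element=8):
    """Bytes counted for one Triad pass: read b, read c, write a."""
    return 3 * n * bytes_per_element


def bandwidth_gbs(n, seconds):
    """Reported bandwidth in decimal GB/s for one Triad pass over n elements."""
    return triad_bytes(n) / seconds / 1e9


n = 100_000  # toy size; real runs use arrays far larger than the caches
b = array("d", [1.0]) * n
c = array("d", [2.0]) * n
a = array("d", [0.0]) * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start

print(f"Triad counted {triad_bytes(n)} bytes in {elapsed:.4f} s "
      f"({bandwidth_gbs(n, elapsed):.3f} GB/s, interpreter-bound)")
```

Because Triad does almost no arithmetic per byte moved, its score tracks memory bandwidth rather than core count or clock speed, which is why it is reported alongside the compute-bound LINPACK result.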
Enabling Cutting-Edge HPC Deployments
We look forward to the launch of these future Intel® Xeon® Scalable processors codenamed Cascade Lake advanced performance and their deployment in cutting-edge HPC systems, such as the North-German Supercomputing Alliance HLRN-IV system from Zuse Institute Berlin and University of Göttingen hosted at ZIB and GWDG, when the processor becomes available in 2019. Utilizing these future processors and Intel® Omni-Path Architecture (Intel® OPA), HLRN-IV will support cutting-edge research in climatology, materials science, physics, chemistry, and countless other fields.
Current generation Intel® Xeon® Scalable processors are seeing strong uptake amongst world-leading supercomputing systems, including the new SuperMUC-NG system at the Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities. SuperMUC-NG, which also utilizes Intel® OPA, is soon to be one of the most performant computing systems in Europe.5
Intel® architecture continues to lead in the world’s most powerful supercomputing systems.6 These and other HPC systems around the world enable AI/HPC applications at unprecedented scale and effectiveness for a broad range of scientific communities and generate innovations and discoveries that benefit people throughout the world.
Accelerating the Adoption of Converged AI/HPC Solution Stacks
As with so many other workloads, artificial intelligence is having a huge impact on data analytics throughout the HPC ecosystem, with neural networks often being used to accelerate discovery and innovation in existing analytics workflows. The community is realizing the opportunity to run both AI and traditional modeling and simulation on existing, standards-based Intel® Xeon® processor-based infrastructure, delivering flexibility, efficiency, and scale for this new era of convergence.
At SC18, we will demonstrate work from Intel to simplify AI/HPC convergence, such as contributions to Slurm Workload Manager* and Univa Grid Engine* that facilitate AI in HPC environments already using these resource managers. We are also investigating alternative approaches that build on this work, such as using Apache Mesos* as the resource manager for all workloads. The availability of these solutions will allow AI-HPDA, modeling, and simulation workloads to run on HPC infrastructure, providing faster time to solution and improved TCO.
Broadening Access to HPC with Intel® Select Solutions
We also continue to work with ecosystem partners to deliver Intel® Select Solutions (Intel-verified systems with predictable, reliable HPC workload performance) to help customers choose and deploy HPC systems more easily and rapidly. These solutions, including Intel® Select Solutions for BigDL on Apache Spark*, Intel® Select Solutions for Simulation and Modeling, and Intel® Select Solutions for Professional Visualization, are currently available from partners throughout the HPC ecosystem.
Along with ecosystem partners and end customers, we’re also looking ahead to future Intel Select Solutions that could take advantage of the next generation Intel® Xeon® Scalable processor-based platform, including Intel® Optane™ DC Persistent Memory, Intel® Solid State Drives (Intel® SSDs), Intel® OPA, Intel® Ethernet, and the Intel® Parallel Studio XE set of tools and libraries.
Cutting-Edge HPC on Intel® Architecture
We at Intel look forward to continuing to support traditional HPC modeling and simulation workloads, while investing in the convergence of AI and HPDA workloads on HPC infrastructure. As the tools of insight and discovery advance, those possessing the most sophisticated systems will have a critical advantage when it comes to delivering new discoveries and innovations. Increasingly, these tools will be combinations of AI, advanced analytics, and traditional HPC simulation and modeling workloads, a variety that requires the versatility and scalability of Intel® architecture.
I strongly believe Intel architecture is the right choice to scale your innovation and deliver the next wave of scientific insights and cutting-edge products. Learn more about the possibilities at Intel Booth #3223 this week at SC18 and anytime at Intel.com/HPC.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
Performance results are based on testing or projections as of 6/2017 to 10/3/2018 (Stream Triad), 7/31/2018 to 10/3/2018 (LINPACK) and 7/11/2017 to 10/7/2018 (DL Inference) and may not reflect all publicly available security updates. See configuration disclosure in backup for details. No product can be absolutely secure. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel® microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel® microprocessors. Certain optimizations not specific to Intel® microarchitecture are reserved for Intel® microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice (Notice Revision #20110804).
2 LINPACK: AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 22.214.171.124, AMD BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score = 1095GFs, tested by Intel as of July 31, 2018, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
3 Stream Triad: 1-node, 2-socket AMD EPYC 7601, http://www.amd.com/system/files/2017-06/AMD-EPYC-SoC-Delivers-Exceptional-Results.pdf tested by AMD as of June 2017 compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
4 DL Inference: Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50), and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”. Tested by Intel as of July 11th 2017, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/7/2018.
5 Based on https://www.top500.org/lists/2018/11/
6 Based on https://www.top500.org/lists/2018/11/