Accelerating the Convergence of HPC and AI at Exascale

The convergence of HPC and AI is leading us into an exciting time for accelerating discovery and innovation. As I mentioned in my first blog of the series, Intel recognizes that a critical element of accelerating convergence is to start with a strong foundation. We want to create an infrastructure around a powerful and versatile CPU with built-in acceleration for both HPC and AI applications. Additionally, Intel recognizes the need to solve complex challenges that some current solutions are not yet addressing. The evolving, nascent nature of AI algorithms requires varying power and performance curves. This drives our XPU strategy and the need for discrete acceleration such as GPUs, FPGAs, and ASICs. A second blog discussed oneAPI and its benefits for coding across heterogeneous architectures. In this piece, I focus on the Xe architecture.

Xe architecture and Exascale-ready processors

Xe is a single GPU architecture with four microarchitectures optimized to support multiple segments and usage models. These microarchitectures must meet several requirements for a range of workloads and performance levels, including:

  • Great performance per watt for mobile usages,
  • High performance graphics for gaming, media, enterprise applications and AI, and
  • The convergence of HPC and AI at every scale, including Exascale.

As we previously announced, Ponte Vecchio is our first Xe-HPC-based GPU and our most ambitious project, combining many advanced technologies across all six of Intel's technology pillars. Ponte Vecchio, along with a future-generation Intel® Xeon® Scalable processor (code-named Sapphire Rapids), will power the Aurora Exascale supercomputer at Argonne National Laboratory.

From an architectural standpoint, Ponte Vecchio addresses both the needs of a broad range of HPC workloads and the complementary requirements of AI training and inference.

A fundamental capability of GPUs involves delivering high performance on vector computations. One of the key differentiators of the Intel approach is the support for variable vector width – SIMD and SIMT – with the flexibility to combine them, enabling improved performance for many applications.
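As a rough mental model of the two styles, consider the sketch below. This is plain C++ standing in for GPU code, not Xe ISA or oneAPI source: in the SIMT style the programmer writes the computation for a single lane (work-item) and the hardware runs many lanes in lockstep, while in the SIMD style the programmer writes explicitly over a fixed-width vector of lanes. Hardware with variable vector width can map either style onto its execution units.

```cpp
#include <array>
#include <cstddef>

// Illustrative sketch only -- plain C++ standing in for GPU kernels.

// SIMT style: express the computation for ONE lane; the hardware
// replicates it across many lanes automatically.
float saxpy_lane(float a, float x, float y) {
    return a * x + y;
}

// SIMD style: express the computation over a fixed-width VECTOR of
// lanes; the loop body corresponds, conceptually, to one vector op.
template <std::size_t W>
void saxpy_vec(float a, const std::array<float, W>& x,
               const std::array<float, W>& y, std::array<float, W>& out) {
    for (std::size_t i = 0; i < W; ++i)
        out[i] = a * x[i] + y[i];
}
```

The practical point of supporting both is that some workloads (divergent, irregular control flow) are easier to express per-lane, while others (dense, regular loops) vectorize best with an explicit width.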

While the debate over the level of precision needed for AI continues, there is no debate that we need fast training times and high compute throughput. To support these wide-ranging requirements, Intel Ponte Vecchio GPUs will support a new data-parallel matrix engine for a broad range of AI data types – INT8, BF16, FP16, and TF32. FP64 is supported for HPC, too.
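To make the precision trade-off behind formats like BF16 concrete, here is a small plain-C++ illustration (not Intel's implementation): bfloat16 keeps the upper 16 bits of an IEEE-754 FP32 value, preserving the full 8-bit exponent range but shrinking the mantissa from 23 bits to 7. The simple truncating conversion below (real hardware typically rounds to nearest) shows how a value loses low-order precision on the round trip.

```cpp
#include <cstdint>
#include <cstring>

// Illustration of the BF16 format: drop the low 16 bits of an FP32
// value (same exponent range, 7-bit mantissa), then widen back.
// Real conversions usually round to nearest; this one truncates.
float bf16_round_trip(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));  // view the float's raw bits
    bits &= 0xFFFF0000u;                   // keep only the top 16 bits
    float y;
    std::memcpy(&y, &bits, sizeof(y));
    return y;
}
```

For example, `bf16_round_trip(1.2345678f)` yields 1.234375: values representable in 7 mantissa bits (like 1.0 or 0.5) survive exactly, while everything else snaps to the nearest coarser step. That coarseness is what buys the higher throughput that training workloads exploit.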

Ponte Vecchio will support thousands of execution units and provide a scalable memory fabric with unified, coherent memory across the GPUs and the CPU, delivering high efficiency and memory bandwidth across the node.

Reliability is a critical challenge for Exascale computing. In addition to supporting Intel Xeon processor-class RAS features, ECC, and more, Ponte Vecchio GPUs also enable capabilities for in-field repair.

Aurora – Bringing it all together

From an architectural standpoint, each node of the Aurora system at Argonne National Laboratory will include two Sapphire Rapids processors. These will be connected “all-to-all” to six Intel Ponte Vecchio GPUs, with unified memory between the CPUs and GPUs. Each node will also include eight fabric endpoints, enabling unparalleled I/O scalability across nodes. Aurora will also benefit from oneAPI, which brings unified programming across CPU and GPU, and from Distributed Asynchronous Object Storage (DAOS), an all-new parallel file system built on Intel® Optane™ persistent memory. DAOS is optimized and ready for the convergence of HPC and AI.

Our team at Intel is excited about our latest advancements, and our contributions detailed in this blog series. We still have plenty of work to do. However, we look forward to ushering in a new era of innovation fueled by technological advancements.

Learn more about our HPC advancements.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

All product plans and roadmaps are subject to change without notice.

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.

Trish Damkroger

About Trish Damkroger

Patricia (Trish) A. Damkroger is vice president and general manager of the High Performance Computing organization in the Data Platforms Group at Intel Corporation. She leads Intel’s global technical and high-performance computing (HPC) business and is responsible for developing and executing strategy, building customer relationships and defining a leading product portfolio for technical computing workloads, including emerging areas such as high-performance data analytics, HPC in the cloud and artificial intelligence. An expert in the HPC field, Damkroger has more than 27 years of technical and managerial expertise both in the private and public sectors. Prior to joining Intel in 2016, she was the associate director of computation at the U.S. Department of Energy’s Lawrence Livermore National Laboratory where she led a 1,000-member group comprised of world-leading supercomputing and scientific experts. Since 2006, Damkroger has been a leader of the annual Supercomputing Conference (SC) series, the premier international meeting for high performance computing. She served as general chair of the SC’s international conference in 2014 and has held many other committee positions within industry organizations. Damkroger holds a bachelor’s degree in electrical engineering from California Polytechnic State University, San Luis Obispo, and a master’s degree in electrical engineering from Stanford University. She was recognized on HPC Wire’s “People to Watch” list in 2014 and 2018.