Steps to Scaling Your HPC Environment for AI Workloads

High performance computing (HPC) has a long history of solving complex problems. Today’s advancements in artificial intelligence (AI), combined with HPC, make it possible to address unique and challenging workloads faster than ever before. Unlike rule-based analytics and data processing applications, AI systems ‘learn’ from data, which lets them derive deeper insights from the same data sets. For these reasons, academic researchers and government agencies increasingly embrace the robust combination of HPC and AI.

Bringing AI-based workflows into an HPC environment is no easy feat. To help kick-start the planning process, we offer five important considerations below. For more detailed information, you can also read our eGuide focused on bringing AI into HPC environments.

1. Think holistically about your HPC needs and solution. To provide your stakeholders with the ideal AI-enabled HPC environment, your software, hardware, and human skills all weigh into the equation. Academic and government environments depend on HPC systems that support multiple users with unique workloads, so system flexibility is key.

2. Software selection. By first choosing the necessary software for your intended workflows, you can more easily plan for and optimize the physical HPC infrastructure that supports it. HPC systems enabling research through AI, visualization, simulation, and modeling workflows benefit from software offered by Intel, the open source community, and independent software vendors (ISVs).

3. HPC applications and development environment. If available applications cannot address your unique HPC usage scenarios, developers must create new software or modify existing code. While the HPC community offers libraries to assist in this endeavor, developers coding applications for HPC and AI may require specialized skills, such as optimizing for parallel computing (see the sketch below). Intel’s HPC interoperable framework assists developers with tools to modernize applications for advanced workloads and support for development languages like Python*, C++*, and Fortran*. For more technical information about languages and frameworks, please check out the eGuide.
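To make that skills requirement concrete, here is a minimal sketch, in plain Python with NumPy rather than any Intel-specific framework, of two staples of parallel optimization work: vectorizing an inner computation and spreading independent chunks of work across cores. The function and variable names are purely illustrative.

```python
# Minimal sketch: vectorization plus process-based parallelism.
# Names (score_chunk, data, chunks) are illustrative, not from any real codebase.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def score_chunk(chunk: np.ndarray) -> float:
    # One vectorized NumPy expression replaces a per-element Python loop.
    return float(np.sqrt(chunk ** 2 + 1.0).sum())

def main() -> None:
    data = np.random.default_rng(seed=0).random(1_000_000)
    chunks = np.array_split(data, 8)  # eight independent work units

    # Processes (not threads) sidestep the GIL for CPU-bound work.
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(score_chunk, chunks))
    print(f"total score: {total:,.2f}")

if __name__ == "__main__":
    main()
```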

4. Physical infrastructure. Wherever possible, make the best use of your existing HPC infrastructure. By evaluating system elements like processors, storage, fabric, and memory against your users’ software requirements, you can more effectively identify potential bottlenecks (a simple example follows below). If current hardware impedes performance, upgrades may be needed. Planning and budgeting for updated system infrastructure will help maximize return on investment (ROI) and avoid overprovisioning.
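As one small example of this kind of evaluation, the sketch below runs a STREAM-style “triad” loop to estimate effective memory bandwidth, a common first check on whether memory rather than compute is limiting a workload. The array size and the bandwidth arithmetic are illustrative assumptions, not a calibrated benchmark.

```python
# Minimal sketch of a STREAM-style triad microbenchmark.
# Array size and timing method are illustrative assumptions.
import time
import numpy as np

N = 20_000_000                 # ~160 MB per float64 array
a = np.zeros(N)
b = np.random.random(N)
c = np.random.random(N)

start = time.perf_counter()
a[:] = b + 2.5 * c             # triad: two reads and one write per element
elapsed = time.perf_counter() - start

# Three 8-byte values move through memory per element.
gbytes = 3 * N * 8 / 1e9
print(f"effective bandwidth: {gbytes / elapsed:.1f} GB/s")
```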

5. Validate your HPC technology first. Organizations lacking in-house HPC experts should consider support from Intel, a consultant, or an original equipment manufacturer (OEM) to accelerate system upgrades and deployment. Before a full-scale rollout, it pays to validate a test system both for performance and for the value of the data insights its applications provide (a simple pass/fail sketch follows below). Once the validation process demonstrates delivery of the needed outcomes, prepare for deployment at scale, plus ongoing administration and maintenance.
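In its simplest form, performance validation boils down to timing a representative workload on the test system and comparing the result against a target agreed with stakeholders. The sketch below shows that pattern; the workload, threshold, and names are all hypothetical.

```python
# Minimal sketch of a pass/fail performance check on a test system.
# The workload (a dense matrix multiply) and the 5-second target are
# hypothetical stand-ins for a real application and acceptance criterion.
import time
import numpy as np

TARGET_SECONDS = 5.0

def representative_workload() -> None:
    x = np.random.random((2000, 2000))
    _ = x @ x  # stand-in for a real application kernel

start = time.perf_counter()
representative_workload()
elapsed = time.perf_counter() - start

status = "PASS" if elapsed <= TARGET_SECONDS else "FAIL"
print(f"{status}: workload finished in {elapsed:.2f}s (target {TARGET_SECONDS:.1f}s)")
```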

To find out how Intel’s HPC technologies can ready your organization for AI, talk to your preferred system provider, or learn more at intel.com/hpc. Please also see the links below for helpful information:

eGuide: Bringing AI Into Your Existing HPC Environment, and Scaling It Up

Intel® Xeon® Scalable Processor for HPC

Accelerating AI with Intel Omni-Path Architecture


Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at http://www.intel.com.


About Trish Damkroger

Patricia (Trish) A. Damkroger is vice president and general manager of the High Performance Computing organization in the Data Platforms Group at Intel Corporation. She leads Intel’s global technical and high-performance computing (HPC) business and is responsible for developing and executing strategy, building customer relationships, and defining a leading product portfolio for technical computing workloads, including emerging areas such as high-performance data analytics, HPC in the cloud, and artificial intelligence. An expert in the HPC field, Damkroger has more than 27 years of technical and managerial experience in both the private and public sectors. Prior to joining Intel in 2016, she was the associate director of computation at the U.S. Department of Energy’s Lawrence Livermore National Laboratory, where she led a 1,000-member group composed of world-leading supercomputing and scientific experts. Since 2006, Damkroger has been a leader of the annual Supercomputing Conference (SC) series, the premier international meeting for high performance computing. She served as general chair of the SC conference in 2014 and has held many other committee positions within industry organizations. Damkroger holds a bachelor’s degree in electrical engineering from California Polytechnic State University, San Luis Obispo, and a master’s degree in electrical engineering from Stanford University. She was recognized on HPCwire’s “People to Watch” list in 2014 and 2018.