Steps to Scaling Your HPC Environment for AI Workloads

High performance computing (HPC) has a long history of solving complex problems. Today’s advancements in artificial intelligence (AI), combined with HPC, make it possible to address unique and challenging workloads faster than ever before. Because AI models learn patterns from data rather than relying solely on predefined rules, they can derive deeper insights than rule-based analytics and data processing applications. For these reasons, academic researchers and government agencies increasingly embrace the robust combination of HPC and AI.

Bringing AI-based workflows into an HPC environment is no easy feat. To help kick-start the planning process, we offer five important considerations below. For more detailed information, you can also read our eGuide focused on bringing AI into HPC environments.

Think holistically about your HPC needs and solution. To provide your stakeholders with the ideal AI-enabled HPC environment, weigh software, hardware, and human skills together. Academic and government environments depend on HPC systems that support multiple users with unique workloads, so system flexibility is key.

Software selection. By first choosing the necessary software for intended workflows, you can more easily plan for and optimize physical HPC infrastructure to support it. HPC systems enabling research through AI, visualization, simulation, and modeling workflows benefit from software offered by Intel, the open source community, and independent software vendors (ISVs).

HPC applications and development environment. If available applications cannot address your unique HPC usage scenarios, developers must create new applications or modify existing ones. While the HPC community offers libraries to assist in this endeavor, developers coding applications for HPC and AI may require specialized skills, such as optimizing for parallel computing. Intel’s HPC interoperable framework assists developers with tools to modernize applications for advanced workloads and support for development languages like Python*, C++*, and Fortran*. For more technical information about languages and frameworks, please check out the eGuide.
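To illustrate what optimizing for parallel computing can look like at its simplest, here is a minimal, hypothetical Python sketch that distributes independent units of work across processes. The `simulate` function is a stand-in for real scientific work; production HPC codes would typically use MPI (for example, via mpi4py) or optimized numerical libraries instead.

```python
# Minimal sketch of an embarrassingly parallel workload in Python.
# simulate() is a hypothetical stand-in for an independent unit of
# scientific work; real HPC applications would use MPI or tuned libraries.
from multiprocessing import Pool


def simulate(seed: int) -> float:
    """Deterministic stand-in for one independent simulation task."""
    total = 0.0
    for i in range(1, 1001):
        total += ((seed + i) % 7) / i
    return total


if __name__ == "__main__":
    seeds = range(8)

    # Serial baseline: one task after another.
    serial = [simulate(s) for s in seeds]

    # Parallel version: the same tasks spread across worker processes.
    with Pool(processes=4) as pool:
        parallel = pool.map(simulate, seeds)

    # Identical results, computed concurrently.
    assert serial == parallel
```

The same pattern, independent tasks mapped onto many workers, scales from a single node to a cluster; the harder optimization work usually lies in tasks that must communicate or share memory.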

Physical infrastructure. Wherever possible, make the best use of your existing HPC infrastructure. By evaluating system elements like processors, storage, fabric, and memory against your users’ software requirements, you can more effectively identify potential bottlenecks. If current hardware impedes performance, upgrades may be needed. Planning and budgeting for updated system infrastructure helps maximize return on investment (ROI) while avoiding overprovisioning.
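One concrete input to that bottleneck analysis is a quick microbenchmark run on existing nodes. The following is a rough, hypothetical sketch that estimates effective memory-copy bandwidth with NumPy; it is only a coarse indicator, and dedicated tools such as the STREAM benchmark give more reliable numbers.

```python
# Rough sketch: estimate effective memory-copy bandwidth on one node.
# A coarse indicator only; dedicated benchmarks (e.g., STREAM) are
# more reliable for real capacity planning.
import time

import numpy as np


def measure_copy_bandwidth(n_bytes: int = 256 * 1024 * 1024) -> float:
    """Return approximate copy bandwidth in GB/s for an n_bytes buffer."""
    src = np.ones(n_bytes // 8, dtype=np.float64)
    dst = np.empty_like(src)

    start = time.perf_counter()
    np.copyto(dst, src)
    elapsed = time.perf_counter() - start

    # One read plus one write of n_bytes each.
    return (2 * n_bytes) / elapsed / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {measure_copy_bandwidth():.1f} GB/s")
```

Comparing such measurements against what your users’ applications actually demand (bandwidth-bound simulations versus compute-bound training, for example) helps decide which components truly need upgrading.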

Validate your HPC technology first. Organizations lacking in-house HPC experts should consider support from Intel, a consultant, or an original equipment manufacturer (OEM) to accelerate system upgrades and deployment. Before a full-scale rollout, validate a test system for performance and for the value of the data insights its applications provide. Once the validation process demonstrates delivery of the needed outcomes, prepare for deployment at scale, plus ongoing administration and maintenance.

To find out how Intel’s HPC technologies can ready your organization for AI, talk to your preferred system provider, or learn more at intel.com/hpc. Please also see the links below for helpful information:

eGuide: Bringing AI Into Your Existing HPC Environment, and Scaling It Up

Intel® Xeon® Scalable Processor for HPC

Accelerating AI with Intel Omni-Path Architecture


Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at http://www.intel.com.


About Trish Damkroger

Trish Damkroger is Vice President and General Manager of the Technical Computing Initiative (TCI) in Intel’s Data Center Group. She leads Intel’s global Technical Computing business and is responsible for developing and executing Intel’s strategy, building customer relationships, and defining a leading product portfolio for Technical Computing workloads, including emerging areas such as high performance analytics, HPC in the cloud, and artificial intelligence. Her Technical Computing portfolio includes traditional HPC platforms, workstations, and processors, as well as all aspects of solutions, including industry-leading compute, storage, network, and software products. Ms. Damkroger has more than 27 years of experience in technical and managerial roles in both the private sector and the United States Department of Energy, where she was the Associate Director of Computation at Lawrence Livermore National Laboratory, leading a 1,000-person organization that is among the world’s leaders in supercomputing and scientific computing. Since 2006, she has been a leader of the annual Supercomputing Conference series, the premier international meeting for high performance computing; she served as General Chair of the Supercomputing Conference in 2014, has been nominated as Vice-Chair for the upcoming Supercomputing Conference in 2018, and has held many other committee positions. Ms. Damkroger has a master’s degree in electrical engineering from Stanford University. She was named to HPCwire’s People to Watch list in 2014 and again in March 2018.