HPC and AI Converge Under a Common Architecture

High performance computing (HPC), once confined to scientific and government supercomputers, has expanded to a range of workloads, including visualization, analytics, and artificial intelligence (AI). With so many different types of highly demanding tasks now open for enterprises to pursue, Intel has created a family of Intel® Select Solutions based on a common foundation of hardware and software to address today’s HPC workloads.

All of the solutions share a common foundation, utilizing the Intel® HPC Platform Specification to deliver consistency and compatibility with a wide range of applications. Each solution adds capabilities that tailor it to a specific use case, without compromising compatibility.

Foundational Workloads

Existing HPC solutions are targeted at three foundational workloads:

  • Simulation & Modeling: Designed for scientists and engineers in a variety of fields who rely on HPC simulation and modeling to improve performance and productivity, Intel® Select Solutions for Simulation & Modeling are compatible with industry-standard software from leading vendors like ANSYS*, COMSOL*, and Dassault*. Organizations in fields from aerodynamics to manufacturing can streamline deployment with the solution and start simulation and modeling workloads quickly.
  • Professional Visualization: Visualization is critical to analyzing and gaining insights from modeling and simulation results. With Intel® Select Solutions for Professional Visualization, organizations can realize the benefits of software-defined visualization faster than by building their own systems. Utilizing the Intel® Rendering Framework (comprised of OpenSWR*, Embree*, and OSPRay*) to perform in-situ simulation and visualization, the solution gives organizations the convenience of using familiar analysis tools, like ParaView*, on top of trusted Intel infrastructure. The popular application VisIt* is also included in the refreshed solution and allows scientists and engineers to quickly generate visualizations, animate them through time, and save them for presentations. The solutions offer the ability to run larger datasets, achieve faster time-to-insight by avoiding data movement I/O bottlenecks, and reduce the costs of having to move simulation data to disk for post-processing.
  • Genomics Analytics: Intel® Select Solutions for Genomics Analytics build on the common infrastructure of the other HPC solutions and add the specialization of BIGstack*, an integrated hardware and software stack designed to run the Broad Institute's Genome Analysis Toolkit (GATK) more quickly, at a larger scale, and with easier deployment.

These three workload-optimized solutions take advantage of the newly released 2nd Generation Intel® Xeon® Scalable processors and deliver higher generation-over-generation performance, improved price/performance, and enhanced security capabilities. We anticipate that a number of companies will offer these updated solutions, including Advantech, Atipa, Fujitsu, Megware, Nor-Tech, and RSC.

Additional capabilities will be available in upcoming v2 releases of Intel Select Solutions for Simulation & Modeling, Simulation & Visualization (an upgrade of the Professional Visualization solution), and Genomics Analytics, including enhancements to take advantage of Intel® Optane™ DC persistent memory, Intel® Deep Learning Boost (Intel® DL Boost), Intel® SSDs, Intel® Ethernet, software, and accelerators.

A New Era of Convergence

The new Intel® Select Solutions for HPC & AI Converged Clusters join these three existing HPC solutions, expanding the scope of HPC workloads beyond simulation & modeling to include both analytics and AI workloads and integrated workflows. The new solutions leverage the low-latency, high-performance features of 2nd Generation Intel Xeon Scalable processors to offer new capabilities and performance while minimizing data movement, delivering breakthrough capabilities via a converged platform that supports all three workloads.

AI is having a huge impact on data analytics throughout the HPC ecosystem, with neural networks often being used to accelerate discovery and innovation. Organizations are quickly realizing the opportunity to converge both AI and traditional modeling and simulation workloads on a common infrastructure. As AI joins traditional simulation & modeling workloads on HPC systems, there’s a need for higher performance compute, memory, storage, and networking capabilities, along with optimized software tools and libraries.

Building on the simulation and modeling foundation and adding analytics workloads like Apache Spark* and AI workloads like TensorFlow*, the newly launched Intel Select Solutions support integrated workflows that previously had to run on specialized systems. With Intel Select Solutions for HPC & AI Converged Clusters, organizations can use AI to reach better scientific results faster or add analytics to perform in-situ visualization at scale, all on a common, flexible system that minimizes storage duplication and data movement and maximizes infrastructure flexibility and utilization.

Intel is releasing two solution architectures for the new Intel Select Solutions for HPC & AI Converged Clusters, both of which focus on augmenting resource managers to support broader workloads. The first is based on the community project Magpie*, which automates the process of generating interfaces between analytics frameworks like Spark and AI frameworks like TensorFlow, so that they can run seamlessly without any modifications to a traditional HPC resource manager such as Slurm*.

The second is a more integrated solution that builds on the work of Univa Grid Engine* and their Universal Resource Broker, an engine that sits alongside a traditional HPC batch scheduler and can interface into resource manager plugins created with an Apache Mesos* framework. Both solutions allow workload coexistence and workflow convergence across simulation & modeling, analytics, and AI. We will be evaluating and updating additional features and workloads for the new Intel Select Solution for HPC & AI Converged Clusters in the coming months, specifically examining the benefits of Intel Optane Memory for large data sets.
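To make the first architecture more concrete, here is a minimal Python sketch of the general idea of layering an analytics framework on top of an existing batch allocation. The nodes granted by the resource manager are split into one master and a set of workers, and a Spark standalone master URL is derived from the result. This is an illustration only and not Magpie's actual code. The function names and host names are hypothetical, and the node list is assumed to have been obtained already (on Slurm, for example, by expanding the SLURM_JOB_NODELIST environment variable).

```python
from typing import Dict, List


def assign_roles(hostnames: List[str]) -> Dict[str, List[str]]:
    """Split one batch allocation into a master node and worker nodes.

    Hypothetical helper: the first allocated node becomes the master and
    the remaining nodes become workers. A single-node allocation runs
    both roles on the same host.
    """
    if not hostnames:
        raise ValueError("allocation contains no nodes")
    # Remove duplicate entries while preserving allocation order.
    unique = list(dict.fromkeys(hostnames))
    master = unique[:1]
    workers = unique[1:] or master
    return {"master": master, "workers": workers}


def spark_master_url(roles: Dict[str, List[str]], port: int = 7077) -> str:
    """Build a Spark standalone master URL (7077 is Spark's default port)."""
    return f"spark://{roles['master'][0]}:{port}"


if __name__ == "__main__":
    # Placeholder host names standing in for a real allocation.
    allocation = ["node01", "node02", "node03", "node04"]
    roles = assign_roles(allocation)
    print(roles)
    print(spark_master_url(roles))
```

Because the framework is started inside the allocation, the batch scheduler itself needs no changes, which is the property the first architecture relies on.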

Accelerated Benefits

When analytics and AI workloads are brought into HPC infrastructure designed to support simulation and modeling, additional speed benefits cascade through the stack. For example, Apache Spark* and TensorFlow can run faster when connected to an HPC fabric. The new 2nd Generation Intel Xeon Scalable processors also deliver 4x higher inference throughput (images/second) with solutions optimized using Intel DL Boost technology.1
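The inference comparison cited above is between INT8 and FP32 arithmetic. As a rough illustration of why lower precision can still give useful results, the following Python sketch quantizes two small vectors to signed 8-bit integers, accumulates their dot product in integer arithmetic, and rescales the result so it can be compared with the floating-point answer. It is a conceptual sketch under simple assumptions (symmetric per-tensor scaling, plain Python integers), not how TensorFlow or Intel DL Boost implements quantization. The function names and input values are made up for the example.

```python
from typing import List, Tuple


def quantize(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def float_dot(a: List[float], b: List[float]) -> float:
    """Reference dot product in floating point."""
    return sum(x * y for x, y in zip(a, b))


def int8_dot(a: List[float], b: List[float]) -> float:
    """Dot product computed on quantized integers, then rescaled."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    # Integer multiply-accumulate; hardware typically uses a wider
    # accumulator than the 8-bit inputs to avoid overflow.
    accumulator = sum(x * y for x, y in zip(qa, qb))
    return accumulator * scale_a * scale_b


if __name__ == "__main__":
    # Arbitrary illustrative inputs, not benchmark data.
    activations = [0.5, -1.0, 0.25, 0.75]
    weights = [1.0, 0.5, -0.5, 0.25]
    print("float result:", float_dot(activations, weights))
    print("int8 result: ", int8_dot(activations, weights))
```

The two results differ only by a small quantization error, while the integer version works on operands a quarter the width of 32-bit floats. That narrower data type is what allows more values to be processed per instruction.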

Advania Data Center is planning to offer the new Intel Select Solution for HPC & AI Converged Clusters later this year. Their customer Gimix wants to run mixed workloads in their HPC environment and is looking to this solution to meet their needs.

Intel Select Solutions for HPC offer higher performance and enhanced capabilities in a single environment, eliminating the burden of data transfer between systems. A host of commercial applications in the Intel® HPC Application Catalog are verified and interoperable with all other Intel Select Solutions for HPC, ensuring even greater flexibility.

A complete hardware and software recipe for advancing product innovation, Intel Select Solutions represent a proven set of configurations utilizing Intel architecture building blocks that the ecosystem can innovate on and take to market more quickly. For more information, and to see the full range of Intel Select Solutions, visit intel.com/selectsolutions. For more about Intel’s work in HPC, visit intel.com/hpc. For more information on how to accelerate your data insights and build your infrastructure, visit intel.com/yourdata.

1 The solution was tested with TensorFlow / ResNet50 for Inference (comparing INT8 and FP32 tests) and TensorFlow / ResNet50 for Training on March 28, 2019 with the following hardware and software configuration:

Base configuration: 4 Nodes, 2x Intel® Xeon® Gold 6252; 1x Intel® Server Board S2600WFT; Total Memory 192 GB, 12 slots/16 GB/2666 MT/s DDR4 RDIMM; HyperThreading: Enable; Turbo: Enable; Storage(boot): Intel® 800GB SSD OS Drive, Storage(capacity): 2x 750GB Intel® Optane SSD DC P4800X PCIe; NIC: 1x Intel XC710, PCH: Intel C621; OS/Software: CentOS Linux release 7.6.1810 (Core) with Kernel 3.10.0-957.el7.x86_64; BIOS CPU microcode 0x400000a

Framework version: TensorFlow 1.13.1; Dataset: Synthetic from benchmark tool; Model topology: ResNet 50 v1; Batch Size: 128

Bill Magro

About Bill Magro

Bill Magro, Intel Fellow & Chief Technologist, High-Performance Computing, serves as an HPC strategist, provides HPC software requirements into Intel product roadmaps, and leads Intel’s efforts in HPC Solutions, including HPC in the Cloud. He has worked in the field of HPC for 30 years. He joined Intel in 2000 with the acquisition of Kuck & Associates Inc. (KAI), where he served as product and consulting manager for KAI’s parallel computing tools. Prior to KAI, Bill worked at two NSF-funded supercomputing centers, the Cornell Theory Center and NCSA. He has authored numerous articles published in technical and academic journals and holds nine patents. He is the co-chair of the InfiniBand Trade Association Technical Working Group. He holds a bachelor's degree in applied and engineering physics from Cornell University and a Ph.D. in computational physics from the University of Illinois at Urbana-Champaign.