HPC in the Cloud

While the cloud has been viable for some time for a number of HPC workloads and use cases, it hasn’t been a practical option for most users. However, that is rapidly changing.

Cloud Service Providers and their partners are now offering HPC as a Service (HPCaaS) via fully orchestrated services that provision a familiar, compatible, and fully elastic HPC cluster in the cloud. AWS ParallelCluster eases cluster creation and management, while AWS Batch is a fully managed, MPI-capable HPC service. Microsoft's CycleCloud provides automated configuration, along with additional services to link on-premise systems to the Azure cloud. In addition to a base HPCaaS platform offering, Rescale offers popular HPC applications on demand. These are just a few examples; several other cloud providers have launched HPCaaS offerings, including familiar names like Oracle and IBM.
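To give a concrete sense of what "fully managed" means in practice, here is a minimal, hypothetical sketch of submitting a job to AWS Batch using the boto3 Python SDK. The queue and job-definition names are placeholders, and the example assumes the corresponding Batch compute environment, job queue, and job definition have already been created.

```python
# Hypothetical sketch: submit a job to AWS Batch with boto3 and check its status.
# Queue and job-definition names below are placeholders, not real resources.
import boto3

batch = boto3.client("batch", region_name="us-east-1")

response = batch.submit_job(
    jobName="cfd-simulation-001",        # placeholder job name
    jobQueue="hpc-on-demand-queue",      # placeholder, pre-created job queue
    jobDefinition="openfoam-mpi:1",      # placeholder, pre-created job definition
    containerOverrides={
        # Pass case-specific settings to the container at submission time.
        "environment": [{"name": "CASE_NAME", "value": "wing_v2"}],
    },
)

job_id = response["jobId"]
print(f"Submitted job {job_id}")

# Poll the job's current state (SUBMITTED, RUNNABLE, RUNNING, SUCCEEDED, ...).
status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
print(f"Current status: {status}")
```

The point of the sketch is simply that the scheduler, provisioning, and scaling are handled by the service; the user's interaction reduces to defining a job and submitting it.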

We see several top reasons people are looking to the cloud for HPC. First, existing HPC users have become constrained by the fixed capacity of on-premise systems. The cloud allows these users to augment those valuable systems in new and novel ways. Second, people who have not traditionally had access to HPC now have a natural “on ramp” via the cloud, with pay-as-you-go services that address the barrier of up-front capital investment. Finally, HPC users are finding the cloud provides a practical way to access the latest technologies as needed, without committing to long-term ownership.

Unlocking Productivity and Innovation

On-premise HPC systems have long been an essential resource for commercial, academic, and government institutions. They are also a great value, delivering an estimated $463 in revenue for each dollar invested.1 However valuable, they have one major drawback: a given HPC system’s capacity is fixed, while an organization’s demand is highly variable and generally exceeds that capacity. Hyperion found that average demand exceeds capacity by 24% or more in 45% of surveyed HPC centers.2 With such a compelling ROI, unmet demand represents a significant opportunity cost in the form of reduced productivity, less optimized product designs, and delayed discoveries or time to market. Existing HPC users are beginning to use the cloud to supplement their on-premise systems and scale on demand. By accessing HPC in the Cloud, organizations can offer their users faster turnaround than waiting in a queue, unlocking their productivity. Further, the results achieved through the cloud can measurably demonstrate the value of increased access and inform future HPC capacity planning.

The cloud’s ability to offer vast resources on demand is also enabling innovative new approaches to HPC. In one recent example, Western Digital used Amazon Web Services to dramatically accelerate the exploration of a complex design space, speeding product design decisions and time to market. For global organizations, HPC in the Cloud eases collaboration for teams working on shared data sets and projects. It also provides ways to develop and test new methods and ideas without impacting production usage.

Expanding HPC Access

Existing HPC users aren’t the only ones benefiting from the cloud. Entirely new businesses that depend on HPC can launch quickly without having to design their own data centers or hire full-time administrators, helping them get to market faster. Boom Supersonic is one such example. The company is working to bring back commercial supersonic travel following the demise of the Concorde, and it requires HPC simulation and modeling to design its future 55-passenger airplane.

Finally, there is a large population of technical workstation users who are becoming constrained by locally-available compute power. Increasingly detailed designs and product requirements are driving substantial increases in simulation time. Many organizations using Microsoft Windows-based workstations lack the capability, desire, or capital required to install a typical Linux HPC cluster. Solutions like Altair PBS Works provide seamless access from workstation to cloud, improving workstation user productivity by shortening simulation turnaround time.

Cutting-Edge Technology

The cloud often offers HPC users the earliest access to new technologies. For example, Google Cloud Platform was first to offer 2nd Generation Intel® Xeon® Scalable processors and Intel® Optane™ DC persistent memory. Amazon Web Services offers HPC-oriented C5 and C5n instances based on Intel Xeon Scalable processors and recently introduced the Elastic Fabric Adapter for improved MPI communications. Microsoft Azure recently launched HC instances targeted at HPC and connected with InfiniBand, also for improved MPI performance. The cloud provides a great option for accessing technologies not available on premise.

If you’ll be attending ISC 2019 in Frankfurt, I encourage you to attend some of the tutorials we’re offering with the leading cloud service providers to see how their HPC services might benefit your organization. I also invite you to stop by the Intel booth on Monday and Tuesday to hear the panel discussions I will be hosting with major cloud providers and Intel partners who are helping organizations tap into the benefits of HPC in the Cloud. All our talks, panels, and tutorials can be found in this mobile agenda.


1 Hyperion Research 2018 HPC ROI Research Update: Economic Models for Financial ROI And Innovation From HPC Investments. https://www.hpcuserforum.com/ROI/downloads/HyperionResearchPowerPoint.zip
2 Hyperion Research 2018 HPC Multi-Client Study: The Use of Public/External Clouds for HPC Workloads, Trends, and Drivers


About Bill Magro

Bill Magro, Intel Fellow & Chief Technologist, High-Performance Computing, serves as an HPC strategist, provides HPC software requirements into Intel product roadmaps, and leads Intel’s efforts in HPC Solutions, including HPC in the Cloud. He has worked in the field of HPC for 30 years. He joined Intel in 2000 with the acquisition of Kuck & Associates Inc. (KAI), where he served as product and consulting manager for KAI’s parallel computing tools. Prior to KAI, Bill worked at two NSF-funded supercomputing centers, the Cornell Theory Center and NCSA. He has authored numerous articles published in technical and academic journals and holds nine patents. He is the co-chair of the InfiniBand Trade Association Technical Working Group. He holds a bachelor's degree in applied and engineering physics from Cornell University and a Ph.D. in computational physics from the University of Illinois at Urbana-Champaign.