Drive AI Innovation with Your Evolved HPC Infrastructure

At Intel Labs, we are excited and inspired by the convergence of HPC with emerging workloads. One of the ripest areas of exploration and innovation for us is artificial intelligence (AI).

AI workloads are among the most compute-hungry running today, and they’re one of the key drivers of the demand for exascale. This means AI is forming a symbiotic relationship with HPC, where the two drive each other forward. AI brings massive data to the table (which has been less of a concern for traditional, simulation-dominated HPC), while HPC delivers the high compute performance needed to manipulate it.

Clearly there’s a lot of opportunity here, but there are new challenges as well, beyond the massive scale of both data and compute. A third key consideration is that AI is bringing to HPC a new class of experts: domain specialists who know their data science very well but are less comfortable with programming. They’re looking for unprecedented levels of performance, yet their willingness to program for it is often low or even non-existent. Providing these new users with the performance they need without compromising their productivity can be a big challenge.

Considerations for building a converged infrastructure

We’ve been working hard on solving this issue, starting from the premise that improving productivity must be an end-to-end exercise. Because the majority of time spent on any project goes into preparing the data before it even reaches compute, focusing only on improving a kernel for a deep learning subset, for example, misses much of the bigger picture.

A critical part of the solution is data-level parallelism—the ability to process millions (even billions) of unique data points simultaneously. However, achieving this today can be a rather convoluted process. Typically, you’d start by developing a ‘model’ while still prototyping different algorithms in a language like Python* or R*. Then, when you come to deploy it, you’d need to discard this code and re-develop it in something like C with MPI*, which enables parallelism on a high-performance infrastructure. At Intel Labs, we’re working to bring parallelism to Python, R, and new languages like Julia*, so that the whole process, from prototyping to performance deployment, can be run within one high-productivity software infrastructure.
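
To make the idea concrete, here is a minimal sketch of what data-level parallelism expressed directly in Python can look like, so the prototype never has to be thrown away. It uses Numba’s parallel JIT purely as a stand-in; Numba is not named above and is not the specific Intel Labs toolchain, just one existing way to spread independent data points across cores without leaving Python.

```python
# Illustrative only: Numba stands in for "parallelism in Python";
# it is not the specific toolchain described above.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def normalize(rows):
    # Each row is an independent data point, so the outer loop
    # can run in parallel across all available cores.
    out = np.empty_like(rows)
    for i in prange(rows.shape[0]):
        row = rows[i]
        out[i] = (row - row.mean()) / (row.std() + 1e-12)
    return out

data = np.random.rand(1_000_000, 64)   # one million unique data points
normalized = normalize(data)           # same code from prototype to deployment
```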

Meanwhile, parallelism capabilities are evolving. We’re moving from traditional one-dimensional vector processing to two-dimensional matrix processors, which increase the volume of data that can move through the processor at one time. The next step is to make this 2D model more efficient by bringing in fine-grain configuration using field programmable gate arrays (FPGAs). There’s a lot going on in this space at the moment, both at Intel and across the industry.
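
As a rough software analogy for that jump from one-dimensional to two-dimensional processing (the hardware specifics are a separate matter), compare issuing many separate vector operations with issuing a single matrix operation that pushes the same data through in one call. The shapes below are arbitrary placeholders.

```python
import numpy as np

weights = np.random.rand(512, 256)      # one layer's weights
samples = np.random.rand(10_000, 256)   # 10,000 input vectors

# 1D style: one matrix-vector product per sample.
outputs_1d = np.stack([weights @ x for x in samples])

# 2D style: the whole batch as a single matrix-matrix product,
# moving far more data through each operation.
outputs_2d = samples @ weights.T

assert np.allclose(outputs_1d, outputs_2d)
```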

Looking beyond the processor itself, improvements can be made at an architectural level too. Native data structures like arrays are not supported across distributed infrastructures today, nor do they let you exploit the parallelism on offer from multi-core systems. We’re exploring ways to support these large arrays on the high-performance compute infrastructure that demanding AI workloads need. This can be done by distributing the arrays across the infrastructure and taking advantage of parallelism at every level—not just multiple nodes, but multiple cores inside each node—as well as support for caches and threads. This lets programmers write the simple code they are comfortable with while the rest of the software toolchain enables and feeds the parallelism they need.
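
For a flavor of what “simple array code, parallelism handled underneath” can look like today, here is a brief sketch using Dask arrays. Dask is only one example of the pattern, not the Intel Labs work described here, and the array size and chunking are placeholders.

```python
import dask.array as da

# A large array the programmer treats as a single object; under the hood
# it is split into chunks that can be scheduled across cores (or across
# nodes when a distributed scheduler is attached).
x = da.random.random((1_000_000, 1_000), chunks=(100_000, 1_000))

# Ordinary array expressions; the work on each chunk runs in parallel.
column_means = x.mean(axis=0)
result = column_means.compute()   # triggers the parallel execution
print(result.shape)               # (1000,)
```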

Another area that will become more important as datasets grow and AI becomes more complex is memory. To date, memory hasn’t caused massive issues for HPC users, as the volumes of data in use are relatively small and fit into existing memory capacity. However, imagine you are running a deep learning algorithm for image recognition, and rather than scoring each image against 1,000 object categories (for example, identifying a cat in a photo), you now want to analyze each image at the pixel level. Overnight, the number of parameters you’re working with—and so the volume of data you are processing—skyrockets. Having a clear strategy in place for ramping up your memory capabilities in line with compute, fabric, and software is therefore crucial.
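
A rough back-of-envelope comparison shows the scale of that jump. The image size, class count, and precision below are assumptions chosen only for illustration.

```python
# Hypothetical numbers: a 1024x1024 image, 1,000 classes, 32-bit floats.
height, width, num_classes, bytes_per_value = 1024, 1024, 1000, 4

# Whole-image classification: one score per class per image.
classification_bytes = num_classes * bytes_per_value               # ~4 KB

# Pixel-level analysis: one score per class per pixel.
per_pixel_bytes = height * width * num_classes * bytes_per_value   # ~4 GB

print(per_pixel_bytes // classification_bytes)   # roughly a million times more output data
```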

Your next steps to data-centric productivity

The journey to efficient, productive AI/HPC convergence is an important one for organizations wishing to remain relevant and competitive in the data-centric world. While it may take time, and will undoubtedly encounter obstacles along the way, it will be worthwhile. Luckily, the two disciplines are complementary and have the potential to add greater value combined than they could separately.

And you can start your journey today, using your existing Intel® Xeon® Scalable processor-based infrastructure, or even your cloud environment, as your jumping-off point. Learn more by reading the solution brief: Optimizing HPC Architecture for AI Convergence, or hear more about my work at Intel Labs on the convergence of AI and HPC.