The artificial intelligence (AI) taxonomy spans capabilities that enable systems to sense, reason, act, and adapt. These capabilities stem from machine learning technologies, including deep learning and classic machine learning, as well as technologies for reasoning systems. In this post, I will focus on the hardware technologies for machine learning (ML)—the area of AI that enables algorithms to learn from experience and improve their performance over time.
The machine learning algorithms at the heart of many AI solutions bring a unique set of technical challenges. For starters, these algorithms have high arithmetic density. Training a model with a billion parameters (a moderately complex network) can take days unless properly optimized and scaled. Further, this process often needs to be repeated to experiment with different topologies to reach the desired level of inference accuracy. All of this requires a huge amount of computational power.
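To get a feel for why training takes days, consider a rough back-of-envelope estimate. The sketch below assumes a common rule of thumb of about 6 floating-point operations per parameter per training example (forward plus backward pass) and a hypothetical sustained throughput; both figures are illustrative assumptions, not measured numbers.

```python
# Back-of-envelope training-time estimate. The 6-FLOPs-per-parameter-
# per-example rule of thumb and the sustained throughput figure are
# illustrative assumptions, not benchmarks.

def training_days(params, examples, epochs, sustained_flops):
    """Estimate wall-clock days to train from model and dataset size."""
    total_flops = 6 * params * examples * epochs  # forward + backward pass
    seconds = total_flops / sustained_flops
    return seconds / 86_400  # seconds per day

# Hypothetical scenario: a 1-billion-parameter model, 10 million training
# examples, 50 epochs, on hardware sustaining 10 TFLOP/s for this workload.
days = training_days(params=1e9, examples=10e6, epochs=50, sustained_flops=10e12)
print(f"Estimated training time: {days:.1f} days")
```

Even under these generous assumptions the estimate lands in the multi-day range, and every topology experiment repeats the bill.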
And then there is the data to think about. When you train a model, performance scales with the amount of data you feed into it. For example, the performance of a speech recognition algorithm might improve greatly—to near human-level performance—if it is fed enough data, up to a point. Of course, this too requires a significant amount of memory and computing capacity.
Until recently, these challenges were a major roadblock on the path to neural networks. While the mathematical concepts behind neural networks have been around for decades, we lacked the combination of technologies required to accelerate the adoption of deep learning. That combination has two key components: enough compute to build sufficiently expressive networks, and enough data to train generalizable networks.
Today, thanks to Moore’s Law, the rapid digitization of content around us, and the accelerating pace of algorithmic innovation, we have overcome both of these challenges. We now have the compute and data we need for neural networks. A variety of processing platforms exist for executing deep learning workloads at the speeds required for AI solutions, even as the datasets consumed by the models grow larger and larger.
Solutions for Every Business Need
With recent acquisitions, Intel now offers four platforms for AI solutions. People sometimes ask me why we would need four platforms for AI. The answer is that different AI use cases have different platform requirements:
Intel® Xeon® processors
With machine learning and deep learning solutions, most of the processing time involves data management, such as bringing data into the system and cleaning it up. The compute time is a smaller part of the problem. This mix of needs—heavy on management, less so on compute—is best done on the Intel Xeon processor platform, the world’s most widely deployed machine learning platform. Intel Xeon processors are optimized for a wide variety of data center workloads, enabling flexible data center infrastructure.
Intel® Xeon Phi™ processors
As you move forward into more demanding machine learning algorithms where models are built and trained and then retrained over and over, you need a different platform balance that enables a shorter time to train. Intel Xeon Phi processors are a great platform choice for these higher-performance general-purpose machine learning solutions. They are optimized for HPC and scale-out, highly parallel, memory-intensive applications. With the integrated Intel® Omni-Path Fabric, these processors offer direct access to up to 400 GB of memory with no PCIe performance lag. They enable near linear scaling efficiency, resulting in lower time to train.
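Near-linear scaling efficiency translates directly into lower time to train. The sketch below shows that relationship under a simple linear-speedup model; the 90% per-node efficiency and the 72-hour single-node baseline are illustrative assumptions, not benchmarks of any particular processor.

```python
# Sketch of time-to-train under near-linear scaling. The 90% scaling
# efficiency and 72-hour single-node baseline are illustrative
# assumptions, not measured results.

def scaled_time(base_hours, nodes, efficiency=0.9):
    """Estimated hours to train on `nodes` machines, given single-node time."""
    speedup = 1 + (nodes - 1) * efficiency  # near-linear scaling model
    return base_hours / speedup

for nodes in (1, 8, 32):
    print(f"{nodes:>3} nodes -> {scaled_time(72, nodes):.1f} hours")
```

The closer efficiency stays to 1.0 as the node count grows, the closer time-to-train tracks the ideal 1/N curve—which is the point of a high-bandwidth, low-latency fabric.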
Future generation processors
When you advance into the deep learning subset of machine learning, your workloads will have different requirements. For the fastest performance, you need a platform that is optimized for training deep learning algorithms that involve massive amounts of data. Our upcoming Intel® Nervana™ platform (codenamed Lake Crest) has been developed specifically for this use case. This platform will deliver the first instance of the Nervana Engine coupled with the Intel Xeon processor. With its unprecedented compute density and high-bandwidth interconnect, this new platform will offer best-in-class neural network performance. We’re talking about an order of magnitude more raw computing power compared to today’s state-of-the-art GPUs.
Intel® Xeon® processors + FPGA
Once you have trained your models, you need a platform that can run inference very efficiently on those trained neural networks. For example, you might have an application that classifies images by recognizing the objects in them—such as different types of animals. The combination of Intel Xeon processors + FPGA (field-programmable gate array) accelerators is uniquely suited for these sorts of inference workloads. It’s a customizable and programmable platform that offers low latency and flexible precision with high performance-per-watt for machine learning inference.
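At its core, inference is just a forward pass through already-trained parameters—no backpropagation, no weight updates. The toy sketch below shows the idea with a single dense layer and an argmax over class scores; the weights and labels are made-up stand-ins for a trained model, not real parameters.

```python
# Minimal inference sketch: one dense layer plus argmax over class
# scores. The weights, biases, and labels are toy values standing in
# for an already-trained model.

def classify(features, weights, biases, labels):
    """Score each class as a dot product plus bias; return the best label."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return labels[scores.index(max(scores))]

labels = ["cat", "dog"]
weights = [[1.0, -0.5],   # toy "cat" weights
           [-0.3, 0.8]]   # toy "dog" weights
biases = [0.1, 0.0]

print(classify([0.9, 0.2], weights, biases, labels))  # prints "cat"
```

Because the weights are fixed at inference time, this workload maps naturally onto hardware like FPGAs, where the computation can be specialized to the network and run at reduced numeric precision for better performance-per-watt.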
Just the Beginning of AI
Here’s the bottom line: if the competitiveness of your organization depends on your ability to leverage a wide range of AI solutions, you need more than a one-size-fits-all processing platform. You need all four of these Intel platforms for an end-to-end solution.
Let’s close with a look to the future. While we have clearly made huge strides in advancing AI, we still have a long way to go. We need to raise the performance of machine learning algorithms to unprecedented levels. At Intel, we are firmly committed to this goal. Over the next three years, Intel aims to reduce the time required to train deep learning models by 100x compared to today’s GPU solutions. This goal was spelled out in a recent news release in which Intel unveiled its AI strategy.
In the meantime, while Intel pushes onward and upward, you can explore our Artificial Intelligence site for more information on the technologies outlined here.