Accelerating Deep Learning Workloads

At the International Supercomputing Conference (ISC) this week in Frankfurt, there's a lot of excitement about new technologies that are fueling the momentum of artificial intelligence and deep learning, which is now used across a wide range of industries. Intel offers a broad portfolio for all AI segments, including our upcoming Intel® Xeon® Processor Scalable Family and Intel field programmable gate arrays (FPGAs). This excitement extends to the upcoming Intel® Xeon Phi™ processor, code-named Knights Mill, which will take deep learning systems to a new level.

This new extension of the Intel Xeon Phi family, expected to be in production in the 4th quarter of 2017, is specifically optimized for deep learning training. The processor is designed to meet the unique needs of data scientists, engineers, and others focused on the application of machine learning technologies. In particular, Knights Mill is designed to reduce the time to train deep learning models by taking advantage of low precision computing.

So why does low precision matter? In simple terms, data scientists need hardware that can accelerate convergence when training models. In the past, a single training run of a deep learning model could take days or weeks to converge. That makes it very difficult for data scientists to iterate on their research in a reasonable amount of time.

Today, hardware can reduce this training time to a matter of hours by using lower-precision computing, which equates to faster computing. As long as the hardware meets the accuracy requirements of the deep learning framework, what matters is how fast the hardware can train the model. This is why lower precision is usable for deep learning workloads and is the preferred computing method for them, in contrast to traditional HPC workloads, which typically require single- or double-precision performance.
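To make the tradeoff concrete, here is a minimal NumPy sketch (an illustration, not Knights Mill code): the same matrix multiply is run in single precision (fp32) and in half precision (fp16), and the relative error between the two results is measured. The point is that the low-precision result stays close to the reference, while halving memory traffic and, on hardware built for it, roughly doubling throughput.

```python
import numpy as np

# Simulate one layer's matrix multiply at two precisions.
rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)

ref = a @ b  # single precision (fp32) reference
low = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

# The relative error is small -- often acceptable for training convergence,
# which is what makes lower precision attractive for deep learning.
rel_err = float(np.abs(low - ref).max() / np.abs(ref).max())
print(f"max relative error at fp16: {rel_err:.4f}")
```

The exact error depends on the matrix sizes and value distributions, but it is the framework's accuracy requirement, not bit-for-bit equality, that the hardware has to meet.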

So what’s the difference between Knights Mill and the current Intel Xeon Phi processor, formerly code-named Knights Landing? This is a question we hear often from customers who are focused on HPC, AI, and machine learning.

Knights Mill uses the same overarching architecture and package as Knights Landing. Both CPUs are second-generation Intel Xeon Phi processors and use the same platform. The difference is that Knights Mill uses different instruction sets to improve lower-precision performance at the expense of the double-precision performance that is important for many traditional HPC workloads. This means Knights Mill is targeted at deep learning workloads, while Knights Landing is better suited to HPC workloads and other workloads that require higher precision.

As for those different instruction sets, they are called Quad Fused Multiply Add (QFMA) and Quad Virtual Neural Network Instruction (QVNNI). QFMA doubles the single-precision performance that Knights Mill can deliver over Knights Landing. QVNNI lowers the precision even further while still meeting the accuracy requirements of deep learning frameworks. The net effect of doubling single-precision performance and reducing precision further enables Knights Mill to deliver significantly more performance for deep learning workloads than Knights Landing. Frequency, power, and efficiency enhancements also contribute to this performance increase, but the instruction set changes account for the largest share of it.
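The core idea behind VNNI-style instructions can be sketched in NumPy: store operands at a narrow precision, but accumulate the products at a wider precision so the running sum doesn't overflow or lose accuracy. This is a conceptual sketch only; the function name and dtypes here are illustrative assumptions, not the actual Knights Mill instruction semantics.

```python
import numpy as np

def vnni_style_dot(x: np.ndarray, w: np.ndarray) -> int:
    """Conceptual VNNI-style dot product: narrow inputs, wide accumulator."""
    # Operands are stored at low precision (int16 here, illustratively).
    x16 = x.astype(np.int16)
    w16 = w.astype(np.int16)
    # Widen before multiplying so partial products don't overflow int16,
    # then accumulate the sum in int32 -- the "wide accumulator" idea.
    return int(np.sum(x16.astype(np.int32) * w16.astype(np.int32),
                      dtype=np.int32))

x = np.array([120, -45, 300, 7])
w = np.array([2, 10, -4, 100])
print(vnni_style_dot(x, w))  # 240 - 450 - 1200 + 700 = -710
```

Done naively in int16 throughout, the intermediate products above would overflow; the wide accumulator is what lets the narrow representation stay accurate enough for deep learning frameworks.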

To take a step back, the Knights Mill processor isn’t only about accelerating deep learning workloads. It’s about gaining new processing capabilities within the context of your existing Intel-based environment. Intel Xeon Phi processor platforms are binary-compatible with Intel® Xeon® processors. Almost all workloads that run on Intel Xeon processors will run on Intel Xeon Phi processors. This makes it easier to share your software investments across Intel platforms.

At another level, Intel is unifying the way forward for deep learning practitioners to use deep learning frameworks across hardware platforms. These are among the benefits of Intel® Nervana™ Graph, which brings advanced capabilities to deep learning frameworks. This computational and execution graph for neural networks automatically applies optimizations across multiple hardware targets, letting users share their software investments across different Intel platforms.

If you’re at the ISC conference this week, you can learn more about Knights Mill and Intel’s AI portfolio in the Intel booth. And for a deeper dive into the processors and technologies that will power new artificial intelligence and machine learning solutions, visit

Published in High Performance Computing
Barry Davis

About Barry Davis

Barry Davis has over 28 years of experience in the computing and telecommunications industries. While at Intel Corporation, Mr. Davis built multiple businesses from the ground up, creating Intel's I/O Storage and Wireless Networking groups. Mr. Davis was one of the original people behind the worldwide success of Wi-Fi, including the introduction of Intel Centrino™ Mobile PCs. He has received numerous industry awards and holds 11 U.S. and 2 international patents. Mr. Davis has been at the forefront of Intel's recent fabric plans and was the lead on creating the strategies, plans, and products that have resulted in the Intel® Omni-Path Architecture. Barry is currently General Manager of the Accelerated Workload Group, part of the Enterprise & Government Group (E&G) in Intel's Data Center Group. He holds a B.S. in Electrical Engineering from Lehigh University.