Driving Higher Performance for AI Systems—Today & Tomorrow

Some of the brightest minds in machine learning and deep learning are gathered in Barcelona this week for the annual Neural Information Processing Systems (NIPS) conference. This is the 30th year of the NIPS conference, and the main event sold out well in advance of opening day. That says a lot about the importance of this event for machine learning researchers and data scientists, and about the industry's growing focus on the field.

Committing to the Evolution of AI

For the first time, Intel is a sponsor of this machine learning and deep learning conference. We are showcasing Intel technologies and initiatives that advance artificial intelligence (AI), all of which should make it clear that Intel is taking AI very seriously. For example, we are working to optimize the software frameworks used in AI systems, with the goal of enabling optimal CPU performance and making a compelling case for more organizations to put their substantial existing CPU investments to work building and deploying AI systems.

And there is good news on this front. While early CPU implementations within deep learning frameworks were used primarily for debugging, Intel recently announced that deep learning framework performance on Intel® architecture has improved by up to 400x[1] with the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), an open source performance library for deep learning. Intel MKL-DNN provides a library of DNN performance primitives optimized for Intel architectures. This set of highly optimized building blocks accelerates the compute-intensive parts of deep learning applications within DNN frameworks such as Nervana neon (coming soon), Caffe*, TensorFlow* (coming soon), Theano*, and Torch*. This huge leap forward is just the beginning of our progress toward the goal of delivering up to a 100x reduction in the time to train deep learning models over the next three years, compared to today's GPU-based solutions. For details, see our Intel AI Day news release.
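To make the idea of a "performance primitive" concrete, here is a minimal NumPy sketch of a direct forward convolution, the kind of compute-intensive building block a framework hands off to a library like Intel MKL-DNN, which produces the same result with vectorized, cache-blocked kernels tuned for Intel architecture. The function name, shapes, and data layout are illustrative assumptions, not part of the MKL-DNN API.

```python
import numpy as np

def conv2d_forward(x, w):
    """Naive direct convolution (NCHW input, OIHW weights), stride 1, no padding.

    This is the reference computation a framework would otherwise express in
    loops; an optimized primitive library computes the same result with
    vectorized, cache-blocked kernels.
    """
    n, c, h, wd = x.shape
    o, _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    y = np.zeros((n, o, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, :, i:i + kh, j:j + kw]  # (n, c, kh, kw) window
            # Contract channel and kernel dims against the weights -> (n, o)
            y[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return y

# Small example shapes to keep the demo fast (not AlexNet-sized)
x = np.random.rand(1, 3, 32, 32).astype(np.float32)
w = np.random.rand(16, 3, 5, 5).astype(np.float32)
print(conv2d_forward(x, w).shape)  # (1, 16, 28, 28)
```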

Optimizing AI for Performance

In another AI software initiative, Intel is unifying the path for framework optimizations across hardware platforms with the Intel® Nervana™ Graph, which brings advanced capabilities to deep learning frameworks. This computational and execution graph for neural networks, which we will be discussing this week at NIPS, is designed for the automatic application of optimizations across multiple hardware targets. Optimizations (current and future) include efficient buffer allocation, optimizations specific to training versus inference, efficient scaling across multiple nodes, efficient partitioning of sub-graphs, and hardware-specific compounding of ops. The Intel Nervana Graph will make life a lot easier for hardware developers and data scientists alike: hardware developers define their optimizations once for use across the panoply of DL frameworks, while data scientists and ML researchers get state-of-the-art performance while continuing to use their framework of choice.
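As a rough illustration of the idea, here is a hypothetical Python sketch of a hardware-independent op graph with per-target optimization passes. The class and pass names are invented for exposition; this is not the Intel Nervana Graph API.

```python
# Hypothetical sketch: a framework-neutral op graph plus per-target
# optimization passes. Names are invented for illustration only.

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, list(inputs)

def fuse_conv_relu(nodes):
    """Hardware-specific 'compounding of ops': fold conv+relu into one node."""
    consumed, out = set(), []
    for n in nodes:
        if n.op == "relu" and n.inputs and n.inputs[0].op == "conv":
            consumed.add(id(n.inputs[0]))  # the conv is absorbed into the fused op
            out.append(Node("conv_relu", *n.inputs[0].inputs))
        else:
            out.append(n)
    return [n for n in out if id(n) not in consumed]

def plan_buffers(nodes):
    """Toy 'efficient buffer allocation': ping-pong between two buffers."""
    return {n.op: f"buf{i % 2}" for i, n in enumerate(nodes)}

PASSES = {
    "cpu_avx512": [fuse_conv_relu],  # compound ops where the target allows it
    "generic": [],                   # leave the graph as written
}

def compile_graph(nodes, target):
    for optimization_pass in PASSES[target]:
        nodes = optimization_pass(nodes)
    return nodes, plan_buffers(nodes)

# The same conv -> relu graph compiled for two targets
x = Node("input")
graph = [x, Node("conv", x)]
graph.append(Node("relu", graph[1]))
for target in ("generic", "cpu_avx512"):
    ops, bufs = compile_graph(graph, target)
    print(target, [n.op for n in ops], bufs)
```

The point of the design is visible even in this toy: the graph is written once, and each hardware target contributes its own pass list, so a new backend plugs in without touching the frameworks that emit the graph.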

On the hardware front, Intel plans to usher in the industry's most comprehensive portfolio for AI: the Intel Nervana platform. This portfolio includes the Intel Nervana Engine (codenamed Lake Crest), a fast ASIC that will work seamlessly with the Intel® Xeon® processor family. Intel expects the Lake Crest roadmap to deliver substantially better performance for deep learning workloads than competitive roadmaps. I think it's safe to say that the combination of Lake Crest and the Intel Xeon processor family will take deep learning and AI solutions to an all-new level.

Intel is committed to democratizing AI knowledge and tools to benefit the widest audience. To that end, Intel recently launched the Intel Nervana AI Academy. This portal provides a single point of access to frameworks, libraries, and resources that help data scientists and developers accelerate the development of solutions optimized for top performance on Intel hardware. Intel has also launched several related initiatives, including a Kaggle competition to improve cervical cancer screening and an upcoming deep learning course with Coursera.

Help Shape the Future

If you're attending the NIPS conference this week, I hope you will bring your questions about our AI initiatives to the Intel booth. If you're interested in deep learning and in joining our team, we'd love to hear from you, in person or online. You can check out the current deep learning job postings on our career opportunities site, and you can always learn more about the Intel AI portfolio at www.intel.com/ai.

[1] Configuration details: BASELINE: Caffe out of the box, Intel® Xeon Phi™ processor 7250 (68 cores, 1.4 GHz, 16GB MCDRAM: cache mode), 96GB memory, CentOS 7.2 based on Red Hat* Enterprise Linux 7.2, BVLC Caffe: https://github.com/BVLC/caffe, with OpenBLAS, relative performance 1.0

NEW: Caffe: Intel® Xeon Phi™ processor 7250 (68 cores, 1.4 GHz, 16GB MCDRAM: cache mode), 96GB memory, CentOS 7.2 based on Red Hat* Enterprise Linux 7.2, Intel® Caffe: https://github.com/intel/caffe based on BVLC Caffe as of Jul 16, 2016, MKL GOLD UPDATE1, relative performance up to 400x

AlexNet used for both configurations, as per https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf, Batch Size: 256
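For readers who want to sanity-check a relative measurement like this on their own systems, here is a hedged Python sketch driving Caffe's built-in `caffe time` benchmark. The binary names, prototxt path, and exact log wording below are assumptions about a local setup, not part of the published configuration.

```python
import re
import subprocess

def avg_forward_backward_ms(caffe_bin, prototxt, iterations=50):
    """Run Caffe's built-in benchmark and parse the mean iteration time.

    `caffe time -model <prototxt> -iterations <n>` is Caffe's standard
    layer-by-layer timing tool; the binary and prototxt names used below
    are placeholders for the two builds being compared.
    """
    result = subprocess.run(
        [caffe_bin, "time", "-model", prototxt, "-iterations", str(iterations)],
        capture_output=True, text=True, check=True,
    )
    # Caffe logs (to stderr via glog) a summary line such as:
    #   "Average Forward-Backward: 123.4 ms."
    match = re.search(r"Average Forward-Backward:\s*([\d.]+)\s*ms", result.stderr)
    return float(match.group(1))

baseline = avg_forward_backward_ms("caffe_bvlc", "alexnet_train_val.prototxt")
optimized = avg_forward_backward_ms("caffe_intel", "alexnet_train_val.prototxt")
print(f"Speedup relative to baseline: {baseline / optimized:.1f}x")
```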

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance. Source: Intel measured as of November 2016