Monetizing AI: How to Get Ready for ‘Inference at Scale’

Our conservative prediction is that by 2020, the ratio of training deep learning models to applying them in the real world (inference) will shift to potentially well over 1:5, versus the 1:1 we see today1. This impending shift to ‘inference at scale’ marks deep learning’s coming of age moment.

Why? Because the moment a model is deployed is the moment it allows enterprises to monetize AI, launching new applications or services by applying their trained models to new data sets.

With this in mind, it makes sense for enterprises to start preparing for inference at scale now, by selecting the hardware and software infrastructure best-suited for their AI applications.

Infrastructure considerations

AI is a complicated mix of getting raw data ready to use, creating and then fine-tuning models, and deploying solutions at scale in the real world, where they must continually be refined.

Most AI today happens in data centers or the cloud. As billions of devices connect to the internet and our need for real-time intelligence grows, more AI inference will also move to the edge of the network, avoiding the need to transfer data to the cloud.

This means that architecting infrastructure for AI requires a brand-new approach, including creating flexible data centers capable of pooling vast on-demand compute, storage and connectivity resources. However, memory, power and data movement can all create bottlenecks that drive down utilization and drive up cost. Here’s where Intel can help.

Intel’s hardware and storage innovation

Enhanced specifically to run performance-hungry AI applications alongside the data center and cloud applications they already run, 2nd Generation Intel® Xeon® Platinum 8200 series processors with new Intel® Deep Learning Boost improve inference throughput by up to 14x compared with the previous generation2, and Intel® Xeon® Platinum 9200 series products improve inference performance on image classification by up to an astonishing 30x compared with an Intel® Xeon® Platinum 8180 processor measured at its July 2017 launch3. Intel optimizes deep learning performance within each generation by partnering with popular frameworks such as TensorFlow*, MXNet*, PyTorch* and PaddlePaddle*, and with each subsequent generation it will deliver further enhancements to Intel® Deep Learning Boost. The first of these, available in this latest generation of processors, is a set of embedded accelerators called Vector Neural Network Instructions (VNNI), which accomplish in a single instruction what formerly required three, speeding up the dense computations characteristic of convolutional neural networks (CNNs) and deep neural networks (DNNs).
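To make concrete what VNNI changes at the instruction level, here is a minimal, illustrative sketch (not code from any Intel library) of the INT8 multiply-accumulate step at the heart of CNN inference, written with AVX-512 intrinsics: once as the older three-instruction sequence and once as the single VNNI instruction. The function names and compiler flags are our own hypothetical choices.

    // Illustrative sketch: INT8 dot-product step, pre-VNNI vs. with Intel DL Boost (VNNI).
    // Assumes a CPU with AVX512_VNNI (e.g. 2nd Gen Xeon Scalable) and a build such as:
    //   g++ -O2 -mavx512f -mavx512bw -mavx512vnni vnni_sketch.cpp
    #include <immintrin.h>

    // Pre-VNNI: three instructions per step (VPMADDUBSW + VPMADDWD + VPADDD).
    inline __m512i dot_u8s8_legacy(__m512i acc, __m512i a_u8, __m512i b_s8) {
        const __m512i ones = _mm512_set1_epi16(1);
        __m512i t16 = _mm512_maddubs_epi16(a_u8, b_s8); // u8*s8 pairs summed to s16
        __m512i t32 = _mm512_madd_epi16(t16, ones);     // widen s16 pairs to s32
        return _mm512_add_epi32(acc, t32);              // accumulate into 32-bit lanes
    }

    // With VNNI: the same multiply-accumulate collapses into one instruction (VPDPBUSD).
    inline __m512i dot_u8s8_vnni(__m512i acc, __m512i a_u8, __m512i b_s8) {
        return _mm512_dpbusd_epi32(acc, a_u8, b_s8);
    }

Needing fewer instructions per fused multiply-accumulate is what helps INT8 inference kernels, such as those in Intel® Optimization for Caffe used in the benchmarks cited above, reach higher throughput.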

Built into the 2nd Generation Intel Xeon Scalable processor, support for Intel® Optane™ DC persistent memory puts more memory closer to the CPU and allows data to persist through power cycles or system maintenance. Lower latency also allows enterprises to keep larger working data sets in memory, making it possible to extract more value from significantly larger data sets more cost-effectively.
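The paragraph above describes the capability rather than a programming model, but to illustrate the persistence property in practice, here is a minimal sketch using the open-source Persistent Memory Development Kit (PMDK) libpmem library, one common way applications address persistent memory in App Direct mode. The DAX mount path, file name and buffer size are hypothetical.

    // Minimal sketch: persisting a buffer on persistent memory with PMDK libpmem.
    // Assumes PMDK is installed and a DAX filesystem is mounted at /mnt/pmem0 (hypothetical).
    // Build (assumed): g++ -O2 pmem_sketch.cpp -lpmem
    #include <libpmem.h>
    #include <cstring>
    #include <cstdio>

    int main() {
        size_t mapped_len = 0;
        int is_pmem = 0;
        // Create and map a file on the persistent-memory-aware filesystem.
        char *addr = static_cast<char *>(pmem_map_file(
            "/mnt/pmem0/working_set", 64ULL << 20,   // hypothetical path, 64 MB region
            PMEM_FILE_CREATE, 0666, &mapped_len, &is_pmem));
        if (addr == nullptr) { std::perror("pmem_map_file"); return 1; }

        const char msg[] = "working data survives power cycles";
        if (is_pmem) {
            // Copy and flush straight from CPU caches to persistent media.
            pmem_memcpy_persist(addr, msg, sizeof(msg));
        } else {
            // Fall back to msync when the mapping is not real persistent memory.
            std::memcpy(addr, msg, sizeof(msg));
            pmem_msync(addr, sizeof(msg));
        }
        pmem_unmap(addr, mapped_len);
        return 0;
    }

After a reboot or maintenance window, remapping the same file returns the bytes written before, which is the persistence the paragraph above describes.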

Intel® Optane™ Solid State Drives (SSDs) help enterprises break through storage bottlenecks by allowing data centers to deploy bigger data sets more affordably, accelerate applications and take advantage of the enterprise-level insights that come from working with larger memory pools. This means Intel® Optane™ technology can add value to both the training and inference stages of deep learning. At the training stage, a larger data set and optimized batch training mean AI solutions can get smarter, faster. At the inference stage, larger data sets expand the coverage of what can be inferred, in both real-time and batch inference workloads.

Taking AI to the edge, closer to where data is generated and consumed

Intel also offers hardware and software tools to help enterprises extend AI to the edge—on network edge devices, on-premise servers and gateways, and smart and connected endpoints. For example, Intel® Movidius™ vision processing units enable deep neural network (DNN) inferencing directly on low-power devices like cameras and drones, and the OpenVINO™ software toolkit makes it easier to deploy computer vision applications across multiple Intel® architectures, from devices to clouds.
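As a concrete illustration of that portability, here is a minimal sketch that loads and runs a model with the OpenVINO™ toolkit’s classic Inference Engine C++ API (newer OpenVINO releases expose ov::Core and compile_model instead). The model file names and target device string are hypothetical, and input preprocessing is omitted.

    // Minimal sketch: running one inference with the OpenVINO Inference Engine C++ API.
    // Build details depend on the OpenVINO release; the classic umbrella header is assumed.
    #include <inference_engine.hpp>
    #include <iostream>

    int main() {
        namespace IE = InferenceEngine;
        IE::Core core;

        // Load an IR model produced by the OpenVINO Model Optimizer (hypothetical file names).
        IE::CNNNetwork network = core.ReadNetwork("person-detection.xml", "person-detection.bin");

        // The same network can target a Xeon host ("CPU") or a Movidius VPU ("MYRIAD").
        IE::ExecutableNetwork exec = core.LoadNetwork(network, "MYRIAD");

        IE::InferRequest request = exec.CreateInferRequest();
        // A real application would fill the input blob (request.GetBlob(...)) with a
        // preprocessed camera frame before calling Infer().
        request.Infer();

        std::cout << "Inference complete on MYRIAD" << std::endl;
        return 0;
    }

Switching the device string from "MYRIAD" to "CPU" or "GPU" retargets the same application across Intel® architectures, which is the portability the toolkit is designed to provide.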

AI’s impact on technology and society is still in its infancy, but we already know that inference is where enterprises will derive real business value. Those who lay the infrastructure groundwork now will be best positioned to take full advantage of inference at scale.

To read more about how Intel can support you in getting ready for inference at scale, as well as stories from customers already on the path to AI readiness, download our eGuide: Exploring the Path from AI Theory to Real Business Value.


Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

1 https://www.nextplatform.com/2018/10/18/deep-learning-is-coming-of-age/

2 14x inference throughput improvement on Intel® Xeon® Platinum 8280 processor with Intel® DL Boost: Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2 GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB, Deep Learning Framework: Intel® Optimization for Caffe version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a), ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, synthetic data, 4 instance/2 socket, Datatype: INT8 vs. Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).

Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

3 30x inference throughput improvement on Intel® Xeon® Platinum 9282 processor with Intel® DL Boost: Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282 (56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, no datalayer, synthetic data: 3x224x224, 56 instance/2 socket, Datatype: INT8 vs. Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).

Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice Revision #20110804

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at Intel.com

Intel, the Intel logo, Xeon, Optane, Movidius, and OpenVINO are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© Intel Corporation