Monetizing AI: How to Get Ready for ‘Inference at Scale’

By 2020, the ratio of training deep learning models to inference within enterprises will rapidly shift to 1:5 [1], versus the 1:1 we see today. Intel refers to this impending shift as ‘inference at scale’, and it marks deep learning’s coming-of-age moment.

Why? Because whereas training focuses on feeding data into a model, inference is how applications run. It allows enterprises to monetize AI, launching new applications or services by applying their trained models to new data sets.

With this in mind, it makes sense for enterprises to start preparing for inference at scale now by selecting the hardware and software infrastructure best suited to their AI applications.

Infrastructure considerations

AI is a complicated mix of getting raw data ready to use, creating and then fine-tuning models, and deploying solutions at scale in the real world, where they must continually be refined.

Most AI today happens in data centers, or the cloud. As billions of devices get connected to the internet and our need for real-time intelligence grows, more AI inference will also move to the edge of the network to avoid the need for data transfer to the cloud.

This means that architecting infrastructure for AI requires a brand-new approach, including creating flexible data centers capable of pooling huge resources of on-demand compute, storage and connectivity. However, memory, power and data movement can all create bottlenecks that drive down utilization and raise costs. Here’s where Intel can help.

Intel’s hardware and storage innovation

Enhanced specifically to run high-performance AI applications alongside the data center and cloud applications they already run, 2nd Generation Intel® Xeon® Scalable processors improve inference performance by up to 277X compared to the processor’s initial launch in July 2017 [2]. Intel is optimizing deep learning performance within generations by partnering with popular frameworks like TensorFlow* and MXNet*. In addition, with each subsequent generation, Intel will deliver enhancements to Intel® Deep Learning Boost, beginning with a new set of embedded accelerators called Vector Neural Network Instructions (VNNI). VNNI accomplishes in a single instruction what formerly required three, speeding up the dense computations characteristic of convolutional neural networks (CNNs) and deep neural networks (DNNs).
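To make the "one instruction instead of three" point concrete, here is a minimal pure-Python sketch of the arithmetic on a single 32-bit lane. It is an illustration, not Intel's implementation. The instruction mnemonics in the comments (VPMADDUBSW, VPMADDWD, VPADDD and VPDPBUSD) are not named in this article; they are our assumption about the sequence being referred to. The function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def saturate_int16(value):
    """Clamp a value to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def three_step_lane(acc, u8, s8):
    """Emulate the assumed pre-VNNI sequence on one lane (4 byte pairs)."""
    # Step 1 (VPMADDUBSW): unsigned-by-signed byte products, adjacent
    # pairs summed into 16-bit values with signed saturation.
    pair_lo = saturate_int16(u8[0] * s8[0] + u8[1] * s8[1])
    pair_hi = saturate_int16(u8[2] * s8[2] + u8[3] * s8[3])
    # Step 2 (VPMADDWD against a vector of ones): widen to 32 bits
    # and sum the two 16-bit partial results.
    widened = pair_lo * 1 + pair_hi * 1
    # Step 3 (VPADDD): add into the 32-bit accumulator (wraps on overflow).
    return to_signed(acc + widened, 32)


def vnni_lane(acc, u8, s8):
    """Emulate the assumed fused instruction (VPDPBUSD) on one lane."""
    dot = sum(a * b for a, b in zip(u8, s8))
    return to_signed(acc + dot, 32)


# Typical case: both paths agree.
activations = [1, 2, 3, 4]      # unsigned 8-bit inputs
weights = [-1, 5, -7, 8]        # signed 8-bit weights
print(three_step_lane(10, activations, weights))  # 30
print(vnni_lane(10, activations, weights))        # 30
```

In this model the two paths differ only when the 16-bit intermediate in step 1 saturates. For example, with inputs `[255, 255, 0, 0]` and weights `[127, 127, 0, 0]`, the three-step path clamps to 32767 while the fused path returns 64770.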

Built into the 2nd Generation Intel Xeon Scalable processor, support for Intel® Optane™ DC persistent memory places more memory closer to the CPU and allows data to persist through power cycles and system maintenance. Lower latency also allows enterprises to keep larger working data sets in memory, meaning it’s possible to extract more value from significantly larger data sets more cost-effectively.

Intel® Optane™ Solid State Drives (SSDs) help enterprises break through storage bottlenecks by allowing data centers to deploy bigger, more affordable data sets, accelerate applications and take advantage of the enterprise-level insights that come from working with larger memory pools. This means Intel® Optane™ technology can add value to both the training and inference aspects of deep learning. At the training stage, a larger data set means AI solutions can get smarter, faster. At the inference stage, larger data sets expand the coverage of what can be inferred.

Taking AI to the edge with Intel

Intel also offers hardware and software tools to help enterprises extend AI to the edge—on network edge devices, on-premise servers and gateways, and smart and connected endpoints. For example, Intel® Movidius™ vision processing units enable deep neural network (DNN) inferencing directly on low-power devices like cameras and drones, and the OpenVINO™ software toolkit makes it easier to deploy computer vision applications across multiple Intel® architectures, from devices to clouds.

AI’s impact on technology and society is still in its infancy, but we already know that inference is where enterprises will derive real business value. Those who lay the infrastructure groundwork now will be best positioned to take full advantage of inference at scale.

To read more about how Intel can support you in getting ready for inference at scale, as well as stories from customers already on the path to AI readiness, download our eGuide: Exploring the Path from AI Theory to Real Business Value.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit

[2] INFERENCE using FP32 Batch Size Caffe* GoogLeNet* v1 128 AlexNet* 256. Configurations for inference throughput: Platform: 2-socket Intel® Xeon® Platinum 8180 processor @ 2.50 GHz/28 cores HT ON; turbo: ON, total memory 376.28 GB (12 slots/32 GB/2666 MHz), four instances of the framework, CentOS* Linux*-7.3.1611-Core, SSD sda RS3WC080 HDD 744.1 GB, sdb RS3WC080 HDD 1.5 TB, sdc RS3WC080 HDD 5.5 TB, deep learning framework Caffe* version: a3d5b022fe026e9092fc7abc7654b1162ab9940d; topology: GoogLeNet* v1 BIOS*: SE5C620.86B.00.01.0004.071220170215 MKLDNN: version: 464c268e544bae26f9b85a2acb9122c766a4c396; NoDataLayer. Measured: 1449 imgs/sec vs. Platform: 2S Intel® Xeon® processor E5-2699 v3 @ 2.30 GHz (18 cores), HT enabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 64 GB DDR4-2133 ECC RAM. BIOS: SE5C610.86B.01.01.0024.021320181901, CentOS Linux-7.5.1804 (Core) kernel 3.10.0-862.3.2.el7.x86_64, SSD sdb INTEL SSDSC2BW24 SSD 223.6 GB. Framework: BVLC-Caffe:, inference and training measured with “Caffe time” command. For “ConvNet” topologies, dummy data set was used. For other topologies, data was stored on local storage and cached in memory before training. BVLC Caffe (, revision 2a1c552b66f026c7508d390b526f2495ed3be594.

Performance results are based on testing as of June 7, 2018 (Intel Xeon Platinum 8180 processor) and June 15, 2018 (2S Intel Xeon processor E5-2699) and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice Revision #20110804