Artificial Intelligence (AI) workloads are a relatively new challenge for enterprises. Much of the Deep Learning focus in AI has been on training—teaching neural networks to create the best model for solving a particular problem by processing massive data sets. As teams succeed in training effective models, inferencing—putting those trained models to work inside actual products—will become increasingly important.
Inference puts different demands on hardware than training does, and it has different business requirements as well. Some end users run inference in a datacenter, but many need to deploy at the network edge: on a dedicated server, a PC, a mobile device, or even an embedded system. The rapid progress of AI over the last few years makes hardware selection even more challenging: “What if my product needs to use an entirely new type of AI next year?”
Product managers and architects need hardware with high throughput, low latency, and a high degree of flexibility as insurance against new developments in AI. Analyst firm McKinsey noted, “inference at the edge will become increasingly common for applications where latency in the order of microseconds is mission critical.” We are entering an era in which it is just as crucial for a system to react to real-time input and make quick decisions as it was for that system to be trained effectively. General-purpose CPUs are today’s primary tool for inference because they provide low latency and high throughput while also handling other kinds of workloads. A report from Morningstar reached similar conclusions, and last year Taboola told Reuters that running inference on its already established servers was proving more cost-effective than shuffling data between different systems.
To empower enterprises to deploy efficient AI inferencing algorithms on a CPU-based system, Intel has created new Intel® Select Solutions for AI Inferencing that leverage the low-latency, high-performance features of 2nd Generation Intel® Xeon® Scalable processors. Sugon will be one of the first original equipment manufacturers (OEMs) to offer Intel Select Solutions for AI Inferencing to their customers.
The new solution joins a number of existing Intel Xeon Scalable processor-based solutions that offer higher performance and improved capabilities. Inspur, for example, will be offering an enhanced version of the Intel® Select Solution for BigDL on Apache Spark* utilizing 2nd Generation Intel Xeon Scalable processors. Additionally, the Intel® Select Solution for HPC & AI Converged Clusters expands the scope of HPC workloads to help organizations use AI to accelerate results on a common, flexible system that minimizes duplicated storage and maximizes infrastructure flexibility and utilization.
Upgraded AI Tools
One of the most critical features of the new 2nd Generation Intel Xeon Scalable processors is Intel® Deep Learning Boost (Intel® DL Boost). Intel DL Boost accelerates DL inference with new Vector Neural Network Instructions (VNNI), performing in a single instruction the inferencing calculations that previously required multiple instructions. The performance gains are significant. It’s as if our hardware engineers built a moving van that could double as a drag racer. More concretely, a 3.7x faster images/second result was achieved with inferencing solutions optimized using Intel DL Boost technology.1 Symbolically, it’s also a big change: deep neural networks are now a “normal” part of the software engineering toolkit, and Intel is marking that transition by supporting them in the instruction set of a mainstream CPU.
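To make the VNNI idea concrete, here is a minimal sketch in Python/NumPy of what one 32-bit lane of the fused multiply-accumulate does: four unsigned 8-bit activations are multiplied by four signed 8-bit weights, the products are summed, and the sum is added to a 32-bit accumulator. The function name `vnni_lane_dot` is my own label for illustration, not an Intel API; the real instruction operates on many such lanes per register in one cycle.

```python
import numpy as np

def vnni_lane_dot(acc, activations_u8, weights_s8):
    """Emulate one 32-bit lane of a VNNI-style fused operation:
    multiply 4 unsigned-int8 activations by 4 signed-int8 weights,
    sum the four products, and add the sum to a 32-bit accumulator.
    (Previously this took a sequence of separate multiply/add
    instructions; VNNI collapses it into one.)"""
    products = activations_u8.astype(np.int32) * weights_s8.astype(np.int32)
    return np.int32(acc + products.sum())

# One lane's worth of data: 4 activations and 4 weights.
a = np.array([10, 20, 30, 40], dtype=np.uint8)
w = np.array([1, -2, 3, -4], dtype=np.int8)
result = vnni_lane_dot(np.int32(0), a, w)
# 10*1 + 20*(-2) + 30*3 + 40*(-4) = -100
print(result)
```

The key point is the widening: 8-bit inputs are accumulated in 32 bits, so the low precision of the operands does not cause overflow during the long dot products at the heart of convolution and matrix-multiply layers.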
Intel Select Solutions for AI Inferencing uses the Intel® Distribution of OpenVINO™ toolkit, a developer suite that accelerates high-performance deep learning (DL) inference deployments. The toolkit quantizes DL models, a process that transforms models from using large, high-precision 32-bit floating-point numbers (vital to training) to using 8-bit integers. Swapping out floating-point numbers for integers sounds counterintuitive (I’m old enough to remember just how much better graphics looked as they went from 4-bit to 32-bit color), but it leads to significantly faster AI inference with almost identical accuracy.2 (I trusted the science, but it was still a “wow” moment the first time my team validated this with our own model.) The OpenVINO toolkit can convert and execute models built in a variety of frameworks, including TensorFlow*, MXNet*, and any framework that supports ONNX*. (Yes, that includes you, PyTorch user!)
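To see why int8 can stand in for float32, here is a simplified sketch of per-tensor symmetric quantization in Python/NumPy. This is illustrative only, not OpenVINO’s actual algorithm (production toolkits use calibration data and typically per-channel scales): each float weight is mapped to the int8 range via a single scale factor, and dequantizing recovers the original values to within half a quantization step.

```python
import numpy as np

def quantize_int8(weights):
    """Per-tensor symmetric quantization (simplified sketch):
    map float32 weights onto [-127, 127] using one scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.50, 0.33, 0.01], dtype=np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)
# Each recovered value differs from the original by at most scale/2,
# which is why int8 inference loses almost no accuracy in practice.
```

The bandwidth story follows directly: int8 tensors are a quarter the size of float32 tensors, so four times as many values move through caches and memory per transfer, on top of the VNNI compute speedup.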
The new workload-optimized solutions also take full advantage of Intel® SSDs and 25 GbE Intel® Ethernet to deliver generation-over-generation performance increases.
Ready to Deploy
Getting your models to production quality is hard enough. When you are ready to move to production, we want to provide a hardware and software combination that makes it easy to let your product shine, whether in the datacenter or at the edge.
Intel Select Solutions are composed of verified Intel® architecture building blocks that enterprises can innovate on and take to market. When organizations choose Intel Select Solutions for AI Inferencing, they get optimized, pre-tuned, and tested configurations that are proven to scale on general-purpose hardware and can be deployed quickly and easily.
For more information, and to see the full range of Intel Select Solutions, visit intel.com/selectsolutions. For more about Intel’s advancements in AI, visit intel.com/ai. For more information on how to accelerate your data insights and building your infrastructure, visit intel.com/yourdata.
1 The solution was tested with TensorFlow*/ResNet50 for inference (comparing Int8 and FP32 tests) on 03-07-2019 with the following hardware and software configuration:
Base configuration: 1 node, 2x Intel® Xeon® Gold 6248; 1x Intel® Server Board S2600WFT; Total memory: 192 GB, 12 slots/16 GB/2666 MT/s DDR4 RDIMM; Hyper-Threading: Enabled; Turbo: Enabled; Storage (boot): Intel® SSD DC P4101; Storage (capacity): At least 2 TB Intel® SSD DC P4610 PCIe NVMe; OS/Software: CentOS Linux release 7.6.1810 (Core) with kernel 3.10.0-957.el7.x86_64; Framework version: intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl; Dataset: Synthetic from benchmark tool; Model topology: ResNet 50 v1; Batch size: 80