Why In-Database Machine Learning?
By 2021, the average Internet user will generate about 2 GB of data per day.[i] While that's about twice what the average user produces today, it's nothing compared with the volume of machine-to-machine data on the way. For example, each self-driving vehicle will produce about 5,500 GB (5.5 TB) of data per day.[ii] A connected airplane will produce 40 TB of data per day.[iii] And within just a couple of years, an average smart factory could produce as much as 1,000 TB (1 PB) of data per day.[iii]
But, to be honest, all that data isn't very interesting on its own. What's valuable are the insights it can provide. And to get insights from data at the scale that is beginning to emerge, you need predictive analytics tools like machine learning woven into your business processes.
Like other forms of predictive analytics, machine learning predicts the likelihood of future events from statistical relationships in historical data. What sets machine learning apart from other forms of analytics is that it lets your business build applications whose decision logic does not have to be explicitly coded. Machine learning models capture important decision logic and enable your applications to adapt: as models retrain on new data, the decision logic in your applications updates automatically and stays current. Put another way, the predictions your applications make can become more accurate over time without anyone rewriting code.
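A minimal sketch can make that idea concrete. The example below is illustrative only (it uses a toy averaging model, not SAP HANA APIs): the application's decision function is written once, and its behavior changes purely because the model behind it is retrained on new data.

```python
# Illustrative sketch only (not SAP HANA code): the application's decision
# logic is fixed, while retraining the model changes the decisions it makes.

class MeanPredictor:
    """A toy model that predicts the historical average outcome."""
    def __init__(self):
        self.history = []

    def retrain(self, new_observations):
        # "Retraining" here simply folds new data into the model's state.
        self.history.extend(new_observations)

    def predict(self):
        return sum(self.history) / len(self.history)

# Application decision logic: written once, never rewritten.
def should_restock(model, threshold=100.0):
    return model.predict() > threshold

model = MeanPredictor()
model.retrain([80, 90, 95])      # initial historical data
print(should_restock(model))     # False: average demand is about 88.3

model.retrain([150, 160, 170])   # new data arrives; the model retrains
print(should_restock(model))     # True: average demand is now about 124.2
```

The same `should_restock` code yields a different (and more current) decision after retraining; that separation of decision code from decision logic is the point of the paragraph above.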
How Machine Learning Fits In
Until now, actually using machine learning in business processes has been challenging. Building models could require scarce data-science talent. The model deployment and management pipeline could be difficult to administer. Most challenging was performance: many machine learning models simply couldn't run fast enough to keep up with real-time business needs. This is where the decade-plus collaboration between Intel and SAP can pay off for businesses.
The combination of in-database machine learning on the SAP HANA* platform and Intel® Xeon® Scalable processors addresses these challenges. Running machine learning algorithms directly in the database eliminates the latency and other delays that occur when data must be moved to a separate environment for analysis. And as more applications are deployed around your data stores, the data those applications generate becomes more valuable as a whole.
Beyond running locally with your data, SAP HANA in-database machine learning encompasses features such as the SAP Automated Predictive Library (APL)* and the SAP Predictive Analysis Library (PAL)*, which help non-specialists build effective machine learning models and help data scientists produce models optimized to run in the database. SAP HANA in-database machine learning can also make effective use of streaming data and of data spread across multiple database nodes, and it supports open-source models built with R* and TensorFlow*.
Intel Processors Underpin Machine Learning and Artificial Intelligence
What does Intel contribute to SAP HANA in-database machine learning? Intel® Xeon® Platinum processors, part of the Intel® Xeon® Scalable processor family, provide more memory, more cores, and more threads than previous generations of Intel Xeon processors. All of these improvements result in faster machine learning performance. With support for up to 1.5 TB of memory, 28 cores, and 56 threads per socket, Intel Xeon Platinum processors provide the hardware for faster model training and faster model scoring in production.[iv]
Intel Xeon Platinum processors also provide features optimized for computationally demanding SAP HANA workloads like machine learning. One important feature is Intel® Advanced Vector Extensions 512 (Intel® AVX-512), a set of 512-bit instructions that can further accelerate machine learning performance. With vector registers twice as wide as those of Intel AVX and Intel AVX2, Intel AVX-512 can process twice the number of data elements per instruction.
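The "twice the data elements" claim comes straight from the register-width arithmetic, which the small sketch below works through for 32-bit (single-precision) floats, the common data type in machine learning workloads:

```python
# Vector-width arithmetic behind the AVX-512 claim: an AVX/AVX2 register is
# 256 bits wide, an AVX-512 register is 512 bits wide, so each instruction
# operates on a fixed number of 32-bit floating-point lanes.
BITS_PER_FP32 = 32

avx2_lanes = 256 // BITS_PER_FP32    # 8 single-precision floats per register
avx512_lanes = 512 // BITS_PER_FP32  # 16 single-precision floats per register

print(avx2_lanes, avx512_lanes, avx512_lanes // avx2_lanes)  # 8 16 2
```

So a single AVX-512 instruction can, for example, multiply 16 pairs of floats where an AVX2 instruction handles 8, which is where the speedup for vectorizable machine learning kernels comes from.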
Another feature is Intel® Ultra Path Interconnect (Intel® UPI), the successor to Intel® QuickPath Interconnect (Intel® QPI). Intel UPI provides up to three channels per processor to connect Intel Xeon processors across a high-speed, low-latency path. This feature increases scalability to as many as eight sockets and improves bandwidth for input/output (I/O)-intensive workloads.
Intel also brings to the table the Intel® Nervana™ Neural Network Processor (NNP), a new offering tailored specifically for artificial intelligence (AI) workloads. The Intel Nervana NNP is a purpose-built class of hardware for deep learning. The goal of the architecture is to provide the needed flexibility to support deep learning primitives while also freeing the Intel Nervana NNP from the architectural limitations faced by existing hardware, which wasn’t explicitly designed for AI.
You can read more about these technologies and the synergies between Intel Xeon Scalable processors and SAP HANA in-database machine learning in our “SAP HANA In-Database Machine Learning on Intel Hardware for Production Deployment” <<link to paper>> solution brief. And don’t forget to follow me and my growing #TechTim community on Twitter: @TimIntel.
[i] Cisco. “By the numbers: Projecting the future of digital transformation (2016–2021).” 2017. cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html.
[ii] Datafloq. “Self-driving Cars Will Create 2 Petabytes Of Data, What Are The Big Data Opportunities For The Car Industry?” https://datafloq.com/read/self-driving-cars-create-2-petabytes-data-annually/172.
[iii] Cisco. “Cisco Global Cloud Index: Forecast and Methodology, 2015–2020.” 2016. cisco.com/c/dam/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.pdf.
[iv] Select Intel® Xeon® Platinum processor stock keeping units (SKUs) provide up to 1.5 TB memory capacity. All others provide 768 GB.