Bridging Advanced Analytics and Artificial Intelligence with BigDL

Co-authored by Radhika Rangarajan, Senior Engineering Program Manager, Big Data Technologies at Intel

Artificial intelligence (AI) plays a central role in today’s smart and connected world and is continuously driving the need for scalable, distributed big data analytics with deep learning capabilities. While interest surges in AI, organizations with solutions based on Big Data and Analytics are looking to leverage their existing investment to drive deeper insight. This is now technically possible thanks to an Intel open source effort called BigDL.

BigDL (Deep Learning) is a new open source project from Intel’s Software and Services Group whose vision is to merge two paths of learning capabilities for those building on Apache Spark on the Intel platform. The two paths are traditional distributed analytics and optimized deep learning; bringing them together gives organizations more ways to derive insights and build solutions. First, we’ll discuss the paths; then we’ll explain what the bridge is and how it’s changing things.

The Big Data path has largely been about storing and accessing massive amounts of data. The attendant technologies distribute the data across many low-cost systems and provide a mechanism to access it a chunk at a time and consolidate the results. Hadoop*, with its distributed file system, and Apache Spark*, which enables fast in-memory processing across a large cluster of systems, are widely applied by data scientists to store and access data for analysis.
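The "chunk at a time, then consolidate" pattern described above can be illustrated with a minimal sketch. This is plain Python that simulates the idea on one machine; it does not use the Hadoop or Spark APIs, and the function names (split_into_chunks, process_chunk, consolidate) and the sample data are hypothetical, chosen only for illustration.

```python
from functools import reduce


def split_into_chunks(records, num_chunks):
    """Split records into roughly equal chunks, one per simulated node."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_chunk(chunk):
    """Work done locally on one node: return (count, total) for its chunk."""
    return (len(chunk), sum(chunk))


def consolidate(partials):
    """Combine the per-chunk results into one global (count, total)."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials, (0, 0))


records = list(range(1, 101))            # stand-in data set: 1..100
chunks = split_into_chunks(records, 4)   # pretend the cluster has 4 nodes
partials = [process_chunk(c) for c in chunks]
count, total = consolidate(partials)
mean = total / count
print(count, total, mean)
```

Each chunk is processed independently, so in a real cluster the per-chunk step could run on the node that already holds that chunk; only the small partial results need to travel to be combined.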

Data also drives machine learning. Applications are trained by having them ingest mountains of data to detect patterns based on models created to predict outcomes. But the focus has been on processor-intensive machine learning apps that run on large, specialized systems in the data center.

BigDL (see Figure 1), a newly released open source project from Intel, is a distributed deep learning framework for Apache Spark that ties distributed data access and processing together with emerging deep learning solutions. It lets developers write deep learning applications as standard Spark programs that run on top of existing Spark or Hadoop clusters, putting deep learning workloads more directly in touch with the data they use. BigDL uses the Intel® Math Kernel Library and multi-threaded programming in each Spark task. It can run orders of magnitude faster than out-of-the-box Caffe*, Torch*, or TensorFlow* models on a single-node Intel® Xeon® processor. That’s performance comparable with mainstream GPUs, but scalable across many general-purpose systems.

Figure 1. BigDL implementation: distributed deep learning on Apache Spark and Intel® Xeon® processors

So what can you do with BigDL?

  • Leverage existing Spark and Hadoop clusters to run machine learning workloads. In addition to running apps developed with the BigDL framework, BigDL lets you load pre-trained models developed with Caffe or Torch into Spark programs, so you can develop and train in Caffe or Torch and deploy on Spark.
  • Dramatically improve the scalability of machine learning apps by distributing the processing across many more nodes and putting more of the processing where the data is.
  • Add deep learning functions to analytics applications or workflows to enable them to do even more.

Together, these capabilities make new breakthrough applications possible. For example, advanced analytics and AI are already enabling faster, more accurate medical diagnoses. Think how we might improve on that when we can add machine learning from millions of medical images. There are similar opportunities in almost every field to take what’s being achieved with analytics and use machine learning to extend and enhance it.

It’s not just about sharing technology: BigDL also brings AI expertise to data scientists now working across thousands of applications in hundreds of fields. And BigDL is the latest example of Intel’s commitment to open source as the way to democratize AI and enable more people to apply it in more applications.

If you’re looking to apply BigDL in your business or organization, or just want to learn more, visit the BigDL GitHub repository. In addition to the software and documentation, we’re providing a set of tutorials and examples that can help get projects off the ground quickly. And stay tuned: we’ll soon have BigDL information and training incorporated into the Intel® Nervana™ AI Academy.

For all those who wisely committed to building on Apache Spark on the Intel platform, BigDL now extends the possibilities into deep learning, enabling more flexibility in driving insights.