Big Data Analytics Accelerates Insight and Innovation, Shaping our World

Throughout human history, insights and innovation have fueled scientific discovery, economic growth, and social progress. That process is accelerating like never before, thanks to big data analytics—a new field that is truly shaping our world.

Data scientists use big data analytics solutions to efficiently capture, process, analyze, and store vast amounts of data of all types. Together with Intel® architecture platforms, open-standards-based software contributions from Intel, like BigDL, support the most ambitious analytics-driven initiatives.

Intel is a leading upstream contributor to the Apache* Spark* project, the leading open source software engine for large-scale data processing. From June 5-7, 2017, more than 3,000 developers, engineers, data scientists, researchers, and business professionals gathered to learn and network at the Spark Summit in San Francisco, the world’s largest event for the Spark community. I was honored to deliver an invited keynote, “Unleashing Data Intelligence with Intel and Apache Spark.”

Michael Greene keynote at Spark Summit West 2017

Unleashing BigDL at Spark Summit

During the keynote, I discussed Intel® software innovations that further accelerate big data analytics and the pace of insight and innovation, such as BigDL, our open source distributed deep learning framework (open sourced on Dec 30, 2016). Our Spark developers initiated the BigDL project in 2015 after seeing an emerging trend in the data center: Deep Learning (DL) workloads that run both training and inference. With the Intel® Xeon® processor as the incumbent in the data center and Apache Spark as the prevalent big data platform, we identified a lack of good deep learning capabilities in Apache Spark. Our team quickly developed a DL library on top of Apache Spark that offers feature parity with all popular DL frameworks and high single-node Xeon performance by leveraging the Intel® Math Kernel Library. This open source project has received strong community support to date, along with growing cloud and enterprise adoption among top CSPs such as AWS*, Azure*, AliCloud*, and Databricks*, and enterprises such as Cloudera* and Cray*, to name a few.

I highlighted some of BigDL’s newly released features: Python language support, one of the features most requested by the BigDL user community; notebook integration, in which systems such as Jupyter notebooks distributed across the cluster combine Python libraries, Spark SQL and DataFrames, MLlib, deep learning models in BigDL, and interactive visualization tools; and TensorBoard support, which helps data scientists visualize and understand the behavior of BigDL programs. I also previewed some of our plans for expanding the BigDL ecosystem. For example, our Free Compute for BigDL program will make infrastructure available to researchers, data scientists, and deep learning explorers who are ready to scale out deep learning algorithms on Apache Spark.

New Optimized Analytics Package for Spark

I introduced the Optimized Analytics Package for Spark (OAP for Spark), which accelerates Online Analytical Processing (OLAP). OAP for Spark enables customers to use Spark for their ad-hoc query workloads, making full use of their memory and CPU power. This is a new open source project that is now available to the community at OAP Code. Lin Xiaodong, Director of the Baidu* Infrastructure Department, commented on Baidu’s use of OAP: “OAP for Spark is quite fit for Baidu’s data analytics requirements, and brings 1.5X-5X performance gain for ad-hoc query. We’d like to dive into the OAP open source community with Intel for more significant acceleration in the future releases, to unleash the power of new hardware platforms.”

Spark Summit 2017 was an inspiring and memorable experience. For more information, check out my keynote slides and the BigDL video, and watch my interview on SiliconAngle Cube TV.

Artificial Intelligence Will Usher In a Better World

We also demonstrated resources and technologies for data scientists and framework developers, including: how the Intel® Nervana™ AI Academy sharpens data scientists’ and developers’ machine learning skills; a comprehensive scheduling solution for Apache* Spark* on Intel® Xeon® + FPGA, which provides Spark with an API for Intel FPGA resource discovery, configuration, management, and intelligent scheduling; “An ad-hoc SQL query engine on top of Spark SQL,” which gave attendees a close look at the Spinach project’s user scenarios, architecture, performance, and real-world adoption; and “Deep Learning to Big Data Analytics on Apache Spark Using BigDL,” which demonstrated speech recognition and object detection applications we built on BigDL.

Other Intel sessions are also good references: BigDL: Bringing Ease of Use of Deep Learning for Apache Spark by Jason Dai & Radhika Rangarajan; Accelerating SparkML Workloads on the Intel® Xeon®+FPGA Platform by Zhankun Tang & Zhongyue Nah; Optimized Analytics Package for Spark by Daoyuan Wang & Yuanjian Li (Baidu); A Predictive Analytics Workflow on DICOM Images using Apache Spark by Anahita Bhiwandiwalla & Karthik Vadla; Deep Learning to Big Data Analytics on Apache Spark Using BigDL by Xianyan Jia & Yuhao Yang; and Distributed End-to-End Drug Similarity Analytics and Visualization Workflow by Anahita Bhiwandiwalla & Dina Suehiro.

Follow Michael Greene on Twitter @greene1of5 for the latest on Intel, BigDL, and more.

Michael Greene

About Michael Greene

Michael Greene is Intel Vice President and General Manager of System Technologies & Optimization in Intel’s Software and Services Group. Greene leads a worldwide organization responsible for a broad range of development, enabling, architecture analysis, and optimization efforts, including system firmware, virtual platforms, modeling and simulation solutions, power analysis, and client/server and big data software stack optimizations for a “Best in Class” user experience. Greene joined Intel in 1990 after graduating from the Massachusetts Institute of Technology, and has managed several new product developments, research efforts, and engineering groups. He has served as Intel’s initiative owner for power efficiency and pre-silicon software development, and has driven new technology benchmarking throughout his career. Michael is also the Marketing Vice President on the National GEM Consortium’s (GEM) Executive Committee. GEM is a national non-profit providing programming and full fellowships to increase the number of under-represented individuals who pursue a master’s or doctorate degree in science or engineering.