AI, Deep Learning with BigDL, Apache Spark, and BlueData

At Intel, we’re seeing Artificial Intelligence (AI) transform the way that businesses operate and how people engage with the world.

AI adoption has grown significantly in the last five years, driven by the availability of large data sources, growth in distributed computing systems, and the development of modern algorithms based on neural networks. Machine learning and deep learning are propelling AI into all parts of modern life, with uses ranging from computer vision to identification and classification, and from natural language processing to forecasting. These base-level tasks help to optimize decision-making in many areas of life.

As noted by data scientist and deep learning pioneer Andrew Ng in “AI is the next electricity”: “Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.”

Deep Learning and BigDL Drive AI Capabilities

Here at Intel, we’re assembling a broad set of technology options to drive AI capabilities in everything from smart factories and drones to sports, fraud detection, and autonomous cars.

One of our AI technology initiatives is the open-source BigDL, a distributed deep learning library for the Apache Spark* open-source cluster-computing framework, that Intel introduced earlier this year. Deep learning represents a major driver and enabling technology for AI, and the BigDL deep learning library is part of Intel’s strategy for enabling state-of-the-art AI in the industry.

Deep learning is becoming an increasingly important component of many organizations’ big data and data science initiatives. In fact, Gartner recently predicted that 80 percent of data scientists will have deep learning in their toolkits by 2018. But as Gartner points out, most organizations lack the necessary data science skills for deep learning.

To address this challenge, BigDL is an efficient, large-scale distributed deep learning library built on the Spark architecture that makes deep learning more accessible to big data users and data scientists. BigDL helps bring AI expertise to data scientists now working across thousands of applications in hundreds of fields.

BigDL lets developers write deep learning applications as standard Spark programs that run on top of existing Spark or Apache Hadoop* clusters, putting deep learning workloads more directly in touch with the data they use. It provides rich deep learning support, efficient scale-out, and extremely high performance by using Intel® Math Kernel Library (Intel® MKL) and multithreaded programming in each Spark task.
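To make the idea of training as a set of parallel tasks more concrete, here is a minimal sketch of data-parallel gradient descent in plain Python. It is not BigDL's API and has no Spark dependency: the list of partitions only stands in for the data held by separate Spark tasks, and the names `partition_gradient` and `train` are hypothetical, chosen for illustration. Each partition computes a local gradient (the "map" step), and the results are averaged into one update (the "reduce" step).

```python
import random


def partition_gradient(rows, w, b):
    """Gradient of mean squared error for y ~ w*x + b over one partition."""
    n = len(rows)
    gw = sum(2 * (w * x + b - y) * x for x, y in rows) / n
    gb = sum(2 * (w * x + b - y) for x, y in rows) / n
    return gw, gb, n


def train(partitions, epochs=500, lr=0.5):
    """Synchronous data-parallel gradient descent over a list of partitions."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # "Map" step: every partition computes its gradient independently.
        # In a real cluster this is the part that would run in parallel tasks.
        grads = [partition_gradient(p, w, b) for p in partitions]
        # "Reduce" step: average the gradients, weighted by partition size.
        total = sum(n for _, _, n in grads)
        gw = sum(g * n for g, _, n in grads) / total
        gb = sum(g * n for _, g, n in grads) / total
        # One shared parameter update per iteration.
        w -= lr * gw
        b -= lr * gb
    return w, b


# Synthetic data for y = 3x + 1, split into four partitions.
random.seed(0)
data = [(i / 100.0, 3.0 * (i / 100.0) + 1.0) for i in range(100)]
random.shuffle(data)
partitions = [data[i::4] for i in range(4)]

w, b = train(partitions)
print(f"w={w:.3f}, b={b:.3f}")
```

Because the per-partition work is independent within each iteration, this pattern scales out by adding partitions, which is the general property the paragraph above describes.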

We’re also bringing together an ecosystem of partners that support BigDL. The latest partner to join that ecosystem is BlueData, a leading provider of software for Big-Data-as-a-Service. Intel has a business collaboration agreement with BlueData, and we’ve worked with them at many enterprise customers, with Nasdaq as one recent example. We’ve also run benchmark tests to validate the performance of big data workloads on their software platform, which uses Docker* containers.

The BlueData software platform can greatly simplify and accelerate the deployment of Hadoop, Spark, and other big data workloads running on Intel® Xeon® architecture. And now they’re bringing that simplicity, agility, flexibility, and cost-efficiency to deep learning environments using BigDL and Spark.

Deep Learning with BigDL and Spark on BlueData

BlueData’s Big-Data-as-a-Service platform delivers on-demand, elastic, and secure multi-tenant environments for a wide range of big data analytics and data science use cases – whether on-premises, in the public cloud, or in a hybrid architecture.

The BlueData software platform now includes a pre-integrated Docker-based application image for Intel’s BigDL. This means that BlueData customers can easily spin up instant containerized Spark clusters with BigDL for distributed deep learning – either on-premises or in the public cloud – just as they do today for other big data analytics, data science, and machine learning environments. We’re pleased to welcome BlueData to the growing BigDL ecosystem of partners.

To learn more about BlueData support for BigDL, refer to the following BlueData blog post: Deep Learning with BigDL and Apache Spark on Docker.

And to learn more about BigDL, you can refer to this Intel blog post: BigDL: Distributed Deep Learning on Apache Spark.

Michael Greene

About Michael Greene

Michael Greene is Intel Vice President and General Manager of System Technologies & Optimization in Intel’s Software and Services Group. Greene leads a worldwide organization responsible for a broad range of development, enabling, architecture analysis, and optimization efforts, including system firmware, virtual platforms, modeling and simulation solutions, power analysis, and client/server and big data software stack optimizations for a “Best in Class” user experience. Greene joined Intel in 1990 after graduating from the Massachusetts Institute of Technology, and has managed several new product developments, research efforts, and engineering groups. He has served as Intel’s initiative owner for power efficiency and pre-silicon software development, and has driven new technology benchmarking throughout his career. Michael is also the Marketing Vice President on the National GEM Consortium’s (GEM) Executive Committee. GEM is a national non-profit providing programming and full fellowships to increase the number of under-represented individuals who pursue a master’s or doctorate degree in science or engineering.