Start Your Data Analytics Journey Today With Open Source and Transfer Learning

To jumpstart a data analytics program or artificial intelligence solution that will help solve your biggest business challenges, you don’t necessarily need a big upfront investment in equipment, personnel training, or massive data sets. There are some shortcuts you can use to score a quick win and rally your organization around new analytics capabilities, even if your colleagues are initially skeptical about the investment.

“Transfer learning” is an AI practice that takes data, deep learning recipes, and models developed for one task and re-applies them to a different, but similar, task. Intel’s neon™ framework makes it easy to modify an existing model with just a two-line code change. Transfer learning, open source deep learning models (often published in a “Model Zoo”), and standard servers based on Intel® Xeon® processors can fast-track your initial AI programs and shorten the time it takes to realize value from your investment.

  • In manufacturing, we see transfer learning used to quickly produce models that can monitor the assembly line to identify noncompliant parts and make necessary adjustments. ImageNet computer vision models are a great starting point for this problem.
  • To create a system that can classify the sentiment of customer data in your business (for example, customer feedback on call center interactions), start with an LSTM sentiment classifier model. Updating the model with a modest set of labeled examples from your own business will result in a more accurate sentiment classifier customized for your specific problem.

A business like Google has millions of training samples. You probably don’t.

Vast quantities of readily available data are great, but they aren’t a prerequisite for success. With modern machine learning and deep learning techniques, knowledge acquired by a machine working on one task can be transferred to a new task if the two are somewhat related.

For example, it’s true that the best machine vision models have been trained on millions of samples, but it is also true that those models can be adapted to new vision problems with very modest amounts of data, perhaps only hundreds of samples. You can do this with data and models that are available as open source (like the ImageNet dataset, and models trained on it) to build your data analytics program.
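To make the idea concrete, here is a minimal sketch of the "freeze the existing model, train a small new piece" flavor of transfer learning. It is a toy in pure Python, not code from neon or any other framework: PRETRAINED_WEIGHTS is a hypothetical stand-in for a model-zoo network, and the 200 labeled samples are made up. The point is only that the "pretrained" part is never updated, while a small new classifier is trained on a modest data set.

```python
import math
import random

# Hypothetical stand-in for a pretrained model: fixed weights that are never
# updated. In practice these would come from a network in a model zoo that
# was trained on a large data set.
PRETRAINED_WEIGHTS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, -1.0],
]


def frozen_features(x):
    """Map a raw input to features using the frozen 'pretrained' layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    feats = [frozen_features(x) for x in samples]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(bias + sum(w * fi for w, fi in zip(weights, f)))
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * fi for w, fi in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(head, x):
    """Probability that x belongs to class 1 under the trained head."""
    weights, bias = head
    f = frozen_features(x)
    return sigmoid(bias + sum(w * fi for w, fi in zip(weights, f)))


# A deliberately small, made-up labeled set for the "new" task.
random.seed(0)
samples = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
labels = [1 if x[0] + x[1] > 0 else 0 for x in samples]

head = train_head(samples, labels)
accuracy = sum((predict(head, x) > 0.5) == bool(y)
               for x, y in zip(samples, labels)) / len(samples)
print(f"training accuracy with a frozen base: {accuracy:.2f}")
```

In a real project the frozen layer would be a full pretrained network and the new head would be trained by your deep learning framework, but the division of labor is the same: reuse what was already learned, and spend your small data set only on the part that is specific to your problem.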

A small data set can do big things

Intel has worked with Thorn, an organization that leverages technology to fight child sex trafficking, to apply transfer learning to their huge data challenge. More than 465,676 missing children were reported to the FBI in 2016 alone, more than 100,000 escort advertisements are posted online every day, and, according to the National Center for Missing and Exploited Children (NCMEC), one in six children reported missing is a possible victim of sex trafficking. The challenge is to match the images of children in the online escort ads with the pictures of known missing children.

Intel helped Thorn take open source models trained on general images of adults and reuse them to recognize and match images of trafficking victims. To further improve Thorn’s ability to find trafficking victims, Intel will use transfer learning on Intel® Xeon® processors to retrain the model. Using a small data set of a thousand victims, we will take what the algorithm can already do (match general images of adults) and repurpose it for the new problem.

The end result will be a powerful system that solves the problem with far more accuracy than if we had built it from the ground up. This will take us hours, not days or weeks, and give us a powerful example of what’s possible with data analytics.

Another thing to keep in mind is that Thorn had a very specific, clearly defined problem to solve. It wasn’t only that they needed to catch the bad guys or to determine whether an image shows a known missing child. They were very clear that they needed an AI capability that can tell whether two images of a child show the same person, regardless of age differences or other variations. Make sure you define a very specific problem to get the most value from your efforts.
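A tightly scoped question like "do these two images show the same person?" is also easy to express in code. One common way to frame it (an assumption for illustration, not a description of how Thorn's system works) is to have a model turn each image into a list of numbers, often called an embedding, and then compare the two lists. The sketch below uses made-up three-number embeddings and a hypothetical threshold of 0.8; a real system would use much longer embeddings produced by a trained model and a threshold tuned on labeled pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def same_person(embedding_a, embedding_b, threshold=0.8):
    """Decide 'same person' when the embeddings are similar enough.

    The threshold is a hypothetical value chosen for this example.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold


# Made-up embeddings standing in for the output of an image model.
query_photo = [0.9, 0.1, 0.4]
reference_photo = [0.8, 0.2, 0.5]
unrelated_photo = [-0.2, 0.9, 0.1]

print(same_person(query_photo, reference_photo))   # similar vectors
print(same_person(query_photo, unrelated_photo))   # dissimilar vectors
```

Framing the problem this narrowly is what makes a small data set useful: the retraining effort goes into making embeddings of the same person land close together, rather than into a vague goal that is hard to measure.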

Use familiar Intel Xeon platforms for AI

Purpose-built AI processors, such as the recently disclosed Intel® Nervana™ Neural Network Processor, will be fantastic accelerators for future AI deployments, but today, the vast majority of AI runs on familiar, general-purpose Intel Xeon processors. There’s no reason not to get started today, either on the modern infrastructure you already own or on new Intel Xeon Scalable platforms with optimized software such as TensorFlow* optimizations for Intel® architecture.

It’s all about the data, but not necessarily your data

What we can learn from the Thorn example is that it’s all about the data, but you don’t need to have huge amounts of your own training data. If you take an open source model and retrain it to fit your new challenge, you can quickly have a data analytics solution for your problem without a big upfront capital expenditure, all made possible through transfer learning.

The Intel® Nervana™ AI Academy is a great place to start, with tools for deep learning training on your existing Intel infrastructure, as well as Intel-optimized frameworks available as open source, including BigDL, Intel® Optimization of Caffe* and TensorFlow*.

AI and advanced analytics can sound overwhelming, but with transfer learning techniques, open source models and data, and off-the-shelf Intel Xeon Scalable platforms, you can jump-start your project, score a quick win, and build from there.