Innovations in Deep Learning at the International Conference on Learning Representations

Dinner time in a foreign country is always an adventure for me. "Je lutte contre produits laitiers" (roughly, "I struggle against dairy products"). My waiter looked at me first with confusion and, finally, with concern. It was my first evening in Sanary-sur-Mer, where I was staying for the International Conference on Learning Representations (ICLR), hosted in nearby Toulon, France. Successfully using Google Translate for my first few interactions had perhaps given me a sense of linguistic hubris, and my attempt to communicate a dairy allergy to my waiter had almost certainly failed. The sound of laughter from the back of the café confirmed my suspicions. Looking up, I could see my waiter and his colleague laughing with each other, the latter miming boxing motions at a nearby brie.

Machine translation technology has provided something of a lifeline for my travels. Complex inquiries that, delivered incorrectly, would at best lead to embarrassment and, at worst, to disaster have generally gone off without incident:

Me: Where is the nearest grocery store—Pouvez-vous me dire où se trouve l'épicerie la plus proche?

Me: Is it safe to go for a run in this neighborhood— Est-il sécuritaire de faire une course dans ce quartier?

Me: I need an airport taxi at 4AM—Pouvez-vous m'aider à commander un taxi de l'aéroport pour 4 heures du matin?

Hotel Receptionist: 4AM? C'est tôt!
Me: Yes! It’s quite tôt!

Sipping my espresso in the restaurant, I wondered how many more of these failed interactions I would have had without the innovations realized by the technological community gathering in Toulon. Indeed, many of the presentations at ICLR 2017 outlined advances that will revolutionize the next generation of intelligent machines. Machine translation, image recognition, text classification, machines that can navigate complex environments, systems that can generate caricatures of people: the deep learning work presented here was as varied as its potential applications.

How about some background: machine learning, the discipline behind much of what is popularly called artificial intelligence, is the field in computer science and statistics concerned with developing systems that can learn from data, whether from expert-labeled information (called supervised learning), from unlabeled data (called unsupervised learning), or from reward signals received from their environment (called reinforcement learning). Deep learning, a set of techniques spanning each of these areas, maps data through layered networks of transformations to approximate some desired output.
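
To make that last idea a bit more concrete, here is a minimal, purely illustrative sketch in Python with NumPy (a toy example of my own, not code from any ICLR paper): a "network" of two stacked transformations that maps a batch of inputs to predictions.

    import numpy as np

    # A toy deep network: two stacked transformations (layers), each a
    # linear map, the first followed by a nonlinearity.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # layer 1 parameters
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # layer 2 parameters

    def forward(x):
        h = np.maximum(0, x @ W1 + b1)   # hidden representation (ReLU)
        return h @ W2 + b2               # the network's approximation of the output

    x = rng.normal(size=(5, 4))          # a small batch of 4-dimensional inputs
    print(forward(x).shape)              # (5, 1): one prediction per example

In the supervised setting, training amounts to nudging W1, b1, W2, and b2 until those predictions match the labeled examples.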

Although deep learning is a diverse field, most of its techniques share a common trait: they involve computationally intensive procedures that have only recently been made possible by advances in processor technology. As such, hardware companies have a big presence at conferences like ICLR. Companies like Apple, Google, and Intel all have a vested interest in understanding the workloads that make things like Siri, Google Translate, and self-driving cars possible, and in how to run them faster and more efficiently.

One of the major themes at ICLR this year was how to make training deep learning networks easier. Neural networks are trained iteratively, by adjusting the model parameters after measuring their prediction error relative to a cost function. What makes this tricky with network models is that these cost functions tend to have multiple solutions that appear to fit well but may not be the best fit overall (we call these non-convex functions, in contrast to convex functions, which have a single best solution). I'll level with you: practitioners of deep learning find this whole "local but not global" best-solution situation deeply annoying. We frequently use a method called stochastic gradient descent (SGD) to solve these problems, but it's always difficult to be sure we've found a good enough solution for our models.
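
For readers who like to see the mechanics, here is a minimal sketch of the basic SGD recipe, applied to a simple least-squares problem (which, unlike a deep network's cost function, is convex). It is purely illustrative and assumes nothing beyond NumPy; the point is only the update rule: look at one example, measure its error, and nudge the parameters against the gradient.

    import numpy as np

    # Stochastic gradient descent on a least-squares cost (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.1 * rng.normal(size=1000)   # synthetic labeled data

    w = np.zeros(10)                               # model parameters to learn
    lr = 0.05                                      # learning rate (step size)

    for step in range(2000):
        i = rng.integers(0, len(X))                # pick one example at random
        err = X[i] @ w - y[i]                      # its prediction error
        w -= lr * err * X[i]                       # step against the gradient

    print(np.linalg.norm(w - true_w))              # small: w ends up near the true parameters

On a convex problem like this one, SGD reliably homes in on the single best solution; on the non-convex cost functions of deep networks, the same procedure can stall in one of those "local but not global" solutions, which is part of why training remains an active research topic.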

Because of this, major areas of research at Intel include not only the training of neural networks in general, but also the relationship between network architecture and how difficult a network is to train with SGD, methods for increasing the likelihood of finding the best overall solution, and, of course, ways to speed the whole process up. In Intel's artificial intelligence and labs divisions, we spend a lot of time thinking about these problems and about how to make training deep neural networks as efficient as possible on Intel® architecture. Two of Intel's presentations at ICLR stood out to me as examples of this work:

  • Jongsoo Park and colleagues in Pradeep Dubey's group at Intel Labs presented a method for reducing the number of parameters, and therefore the memory consumption, of convolutional neural networks (CNNs), an architecture frequently used for computer vision tasks like object recognition. Using their technique, they reported speedups of over 7x on Intel Atom®, Intel® Xeon®, and Intel® Xeon Phi™ processors.
  • Aojun Zhou, Anbang Yao, and Yiwen Guo, out of Intel Labs China, presented a method for speeding up the training of CNNs. They observed that while CNNs are remarkably accurate at computer vision and object recognition tasks, some of that success can be attributed to the millions of parameters these models contain, which can at times present a prohibitive memory burden. In contrast to Park and colleagues' approach, their technique, designed to excel on Intel field-programmable gate array (FPGA) hardware, represents the parameters of a CNN at low precision while not only maintaining deep learning performance but improving upon it (a rough sketch of the general idea follows this list).
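
Both papers go much deeper than I can here, but the flavor of the low-precision idea in the second bullet is easy to convey. Below is a rough, purely illustrative sketch (my own toy example, not the authors' algorithm) of rounding a layer's weights to a small set of discrete levels so that each one can be stored in a few bits instead of 32:

    import numpy as np

    # Toy uniform quantization of one layer's weights (illustrative only).
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.1, size=(256, 256))    # a hypothetical layer

    bits = 4                                            # store each weight in 4 bits
    levels = 2 ** bits                                  # 16 representable values
    scale = np.abs(weights).max() / (levels // 2)       # map the weight range onto them
    codes = np.clip(np.round(weights / scale),
                    -levels // 2, levels // 2 - 1)      # integer codes in [-8, 7]

    approx = codes * scale                              # what the network would compute with
    print(np.abs(weights - approx).max())               # worst-case rounding error

The real techniques are far more sophisticated about preserving accuracy, but the memory arithmetic is the appeal: 4 bits per weight instead of 32 is an 8x reduction for those parameters, which matters enormously on memory-constrained hardware.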

To the outside observer, much of the content at ICLR likely appears theoretical in nature. I think there's something to this view. But, in contrast with many other academic conferences I attend, the development cycle that moves theory to product here is extremely compressed. At Intel, we're already working to incorporate new findings into the next generations of Intel hardware and into software optimizations for frequently used frameworks like Caffe*, TensorFlow*, and our own neon™.

It's an exciting time to be a part of the deep learning community! From providing improved translational lifelines to travelers, to developing faster object recognition methods, I'm excited to see what applications our work at Intel will enable. For more information on Intel and artificial intelligence, check out intel.com/ai, and if you have any follow-up questions about our deep learning work at Intel, laissez un commentaire ci-dessous (leave a comment below)!

 


About Kyle Ambert

Kyle is a Senior Deep Learning Data Scientist at Intel Nervana, where he uses machine learning and deep learning methods to solve real-world analytical problems. He has a B.A. in Biological Psychology from Wheaton College and a Ph.D. in Bioinformatics from Oregon Health & Science University, where his research focused on scalable machine learning-based methods for unstructured data curation and the application of artificial intelligence in the neurosciences. At Intel Nervana, his team creates deep learning solution prototypes and researches optimization strategies for deep learning networks for text analytics, natural language processing, and image recognition.