The Oral History of AI

We can’t listen to Newton walk us through the early days of physics, or to Darwin chatting about the origins of “On the Origin of Species.” However, we can hear about the early years of artificial intelligence (AI) from one of its creators: Yann LeCun. Yann is a distinguished professor at NYU, the Chief AI Scientist at Facebook, and a winner of the Association for Computing Machinery A. M. Turing Award, widely considered the “Nobel Prize of Computing.” The award was presented to him alongside fellow winners Geoffrey Hinton, a pioneer in artificial neural networks who currently divides his time between Google Brain and the University of Toronto, and Yoshua Bengio, one of the most respected researchers in deep learning and Scientific Director of the Montreal Institute for Learning Algorithms. The three have been dubbed the “Godfathers of AI” by the media. So when Intel’s AI Tech Evangelist and New York Times best-selling author Abigail Hing Wen interviewed Yann on a recent episode of Intel on AI, I was excited to dive into the history of the field as seen through the eyes of someone who built the very foundation I and others continue to work on today.

Yann is best known for his work in computer vision using convolutional neural networks, work he did during his time at the legendary Bell Labs. That work has enabled banks to read cheques, phones to unlock with facial recognition, modern automobiles to brake early in an emergency, doctors to detect tumors (and “covid lung”) in medical imagery, smartphone cameras to identify plant and animal species, speech recognition to let my kids say “Hey Google!” and much, much more.

"I don't believe in the concept of artificial general intelligence. I don't think there is such a thing as general intelligence. I think every intelligence is somewhat specialized, including human intelligence, even if we'd like to think that's not the case."

-Yann LeCun

1940s-1950s: The Birth of AI

In the podcast, Yann says that the idea of machine intelligence goes back, of course, to the work of Alan Turing in the 1940s and 1950s, something I’ve written about before when covering the issues of ethics and AI. Turing paved the way for what has become traditional computer science, laying the theoretical underpinnings for general-purpose computers (we actually describe them as “Turing machines”). In 1943 the first artificial neuron was proposed by Warren McCulloch, a neuroscientist, and Walter Pitts, a logician. The pair proposed that the type of activity that takes place in neurons can be seen as computation, and therefore that circuits of neurons could do logical reasoning.

Simultaneously, researchers in the late 1940s were working on what came to be known as cybernetics, defined by Norbert Wiener in 1948 as essentially the science of how parts of a system communicate with each other, which brings forth the ideas of autopoiesis (a system capable of reproducing and maintaining itself), regulation, learning, and adaptation. Together, this research created a wave of interest in the field.
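
To make the McCulloch and Pitts idea concrete, here is a minimal sketch of a threshold unit and a few logic gates built from it. This is my own simplified illustration, not their original formulation: the function names are hypothetical, and I model inhibition with a negative weight, which is an assumption made for brevity.

```python
def mp_neuron(inputs, weights, threshold):
    """Threshold unit: fire (1) if the weighted sum of inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Single units acting as logic gates (weights and thresholds chosen by hand).
def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def NOT(a):
    # Assumption: inhibition is modeled as a negative weight.
    return mp_neuron([a], [-1], threshold=0)

def XOR(a, b):
    # A small "circuit" of units: (a OR b) AND NOT (a AND b).
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "XOR:", XOR(a, b))
```

Note that nothing here is learned: every weight and threshold is set by hand, which is exactly the gap that later learning algorithms set out to close.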

In 1956, Marvin Minsky, who later co-founded the Massachusetts Institute of Technology's AI laboratory, and John McCarthy, who later helped establish the Stanford AI Laboratory, organized a conference at Dartmouth College with the help of Nathaniel Rochester of IBM and Claude Shannon of Bell Labs. At the conference Allen Newell and Herbert A. Simon debuted their computer program Logic Theorist, which was deliberately engineered to perform automated reasoning. The conference is also where the term "artificial intelligence" was born, along with the “great schism,” with symbolic reasoning gaining increasing primacy.

1960s-1980s: Learning the Limitations of AI

As Yann tells the history, for roughly twenty years the academic community was split into two camps: one that took inspiration from biology and the human brain, and one that drew inspiration from mathematics, creating symbolic reasoning systems (cast your mind back to high school maths and theorem proving). When Yann started his career in the early 1980s, he says that essentially no one was working on what we would consider machine learning today (symbolic reasoning was in the ascendant).

In 1986, the field gained renewed interest due to a paper in the journal Nature titled "Learning representations by back-propagating errors" by David E. Rumelhart (UC San Diego), Geoffrey Hinton (then at Carnegie Mellon), and Ronald J. Williams (UC San Diego), which showed the potential for successful applications of neural networks and the backpropagation learning algorithm. However, this excitement was somewhat dampened as researchers soon found that the number of applications that could be solved was relatively small, because such systems require a lot of data to be trained properly. At the time, data was expensive: it couldn’t be gathered quickly from vast internet archives or open-source data sets as it can today.

1990s-2010s: The Dark Years

Yann describes the next ten years as a “black period” in the field, saying neural nets were not only ignored but mocked. He even quit working on neural nets between 1996 and 2002, joking with his future fellow Turing Award winners that their “deep learning conspiracy” would someday be accepted by the wider research community, all while working on projects like the DjVu image compression format with contemporary researcher Léon Bottou during their time at AT&T Labs.

“People like me were sometimes seen as marginal crazy people, who still clung to neural nets.”

-Yann LeCun

Abigail asks Yann a question which has long puzzled me: why didn’t he do the sensible thing, give up, and join the mainstream? Yann says that during this period he clung to what he describes as “a sort of heuristic belief” that neural networks would yet be vindicated. This sounds rather like faith, but even in his wilderness years, he could take comfort from empirical evidence. He notes that the best results on MNIST (a benchmark data set of handwritten digits) were always achieved using convolutional neural nets, even though support vector machines were coming very close at the time. In his mind, the limitation of traditional computational methods is their reliance on hand-engineering a “front-end” or a “feature extractor” designed to capture the salient elements of an image or speech signal. By contrast, deep learning, especially with convolutional neural nets, can train the system end-to-end, with the algorithm shaping the “feature extractors” to be optimal for a given task.

Think of it like this: hand-engineered features are brilliantly designed by very clever people, rather like an artist painting a very realistic portrait in oils, but LeCun’s backpropagation algorithm allows the data to shape the “feature extractors” to fit the task, sort of like how the softness of a beanbag allows it to mold itself to one’s body more perfectly than the most brilliantly conceived designer furniture.

“This realization” (that features learned through an optimization procedure shaped by data from a specific problem can beat expertly designed features), says Yann, “seemed like an obvious idea, and it is now, but it took about 20 years to convince the community that that was a good idea.”
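
For readers who like to see the beanbag in action, here is a toy sketch of data shaping a feature extractor. It is only an illustration under my own assumptions: a single two-tap filter fitted by plain gradient descent on a made-up random signal, not backpropagation through a deep network and not anything from Yann's work. The function names and the "hand-designed" edge filter are hypothetical.

```python
import random

def apply_filter(signal, kernel):
    """Slide a length-2 kernel along the signal (a tiny 1-D 'feature extractor')."""
    return [kernel[0] * signal[i] + kernel[1] * signal[i + 1]
            for i in range(len(signal) - 1)]

def learn_filter(signal, targets, steps=500, lr=0.1):
    """Fit the two kernel weights by gradient descent on mean squared error."""
    kernel = [0.0, 0.0]
    n = len(targets)
    for _ in range(steps):
        outputs = apply_filter(signal, kernel)
        errors = [o - t for o, t in zip(outputs, targets)]
        # Gradient of the mean squared error with respect to each weight.
        grad0 = 2 * sum(e * signal[i] for i, e in enumerate(errors)) / n
        grad1 = 2 * sum(e * signal[i + 1] for i, e in enumerate(errors)) / n
        kernel = [kernel[0] - lr * grad0, kernel[1] - lr * grad1]
    return kernel

random.seed(0)
signal = [random.uniform(-1, 1) for _ in range(200)]  # made-up data

# An expert's hand-designed "edge detector": difference of neighbouring samples.
hand_designed = [-1.0, 1.0]
targets = apply_filter(signal, hand_designed)

# The learner never sees the expert's filter, only inputs and desired outputs.
learned = learn_filter(signal, targets)
print([round(w, 3) for w in learned])
```

The learned weights should land very close to the expert's filter. In this toy case the data can only rediscover what the expert already knew; on real problems, where nobody knows the ideal filter, the same procedure is free to find something better.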

Why did it take so long? Designing feature extractors can work quite well when little data or compute power is available; there’s also the satisfaction of pitting your wits against the problem directly, as opposed to relegating oneself to collecting and cleaning data, and allowing the machine to do the “clever bit.” I should know: I used to be one of those people carefully hand-crafting features for machine vision systems. Relegating myself to a sort of “machine coach” was humbling at the time, but with hindsight it was a very solid decision.

Another way of thinking about the long-term trends in AI is that we began by focusing on tasks associated with very intelligent people (symbolic reasoning, theorem proving, and so on), and that these methods failed not in that they didn’t work, but in that they proved successful only within the abstract world in which they were conceived. Intelligence that we can use in the world needs, to some degree, to be of the world, shaped by empirical evidence (data!), not only emerging fully formed from a cleverly chosen set of axioms. As I’ve written previously, the most interesting outstanding problems in AI have more to do with matching the “common sense” and learning ability of a toddler than with creating synthetic versions of chess grandmasters.

2012: AI’s Big Breakthrough

In late 2009, the use of deep feedforward, non-recurrent networks for speech recognition was introduced by Geoffrey Hinton (by then at the University of Toronto) and Li Deng, former Chief Scientist of AI at Microsoft. By October 2012 neural networks were once again in the academic spotlight thanks to impressive benchmarks coming from AlexNet and other submissions to the PASCAL Visual Object Classes Challenge and the ImageNet Large Scale Visual Recognition Challenge at the European Conference on Computer Vision. That same year, Google Fellow Jeff Dean and former Intel on AI guest Andrew Ng were programming a computer cluster to train itself to automatically recognize images. By the fall, the New York Times was citing Dr. Richard F. Rashid’s Mandarin translation presentation with Microsoft as proof of deep learning’s potential, quoting his statement that such work marked “the most dramatic change in accuracy since 1979.”

Yann notes that this build-up to neural networks being embraced dates back several years, thanks to the work of several pioneers, including fellow Turing Award winner Yoshua Bengio's text prediction work in the early 2000s, such as “A Neural Probabilistic Language Model,” and Pascal Vincent’s work with denoising autoencoders, along with Ronan Collobert and Jason Weston’s work at the NEC Research Institute in Princeton, such as their 2011 paper “Natural Language Processing (Almost) from Scratch.”

2013-2017: An Avalanche of Advancement

After the pivotal year of 2012, advancements in AI started to snowball. In 2013, Tomas Mikolov and his colleagues at Google created Word2vec, a clever technique for learning a feature representation of words which does not require labelled data and allows NLP systems to look past spelling and focus on semantics. This also made it comparatively easy to train multilingual systems that don’t care whether users write “dog” as “perro,” “chien,” or “hund,” which is particularly useful for giving speakers of “low-resource” languages (for practical purposes, anything that isn’t English) access to volumes of information that we anglophones take for granted.
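
How can a system learn about words without labelled data? In the skip-gram flavour of Word2vec, the text supplies its own training examples: each word is paired with its neighbours, and the model learns to predict one from the other. Below is a minimal sketch of just that pair-generation step. The function name and example sentence are my own, and a real implementation adds refinements such as subsampling and negative sampling, which I leave out here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs taken from raw, unlabelled text."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the dog chased the ball".split()
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Words that keep turning up in similar company end up with similar vectors, which is where the focus on semantics rather than spelling comes from.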

In 2014, Ilya Sutskever published “Sequence to Sequence Learning with Neural Networks,” describing a method of using a multilayered Long Short-Term Memory (LSTM) system suited to tasks like automated translation and summarization, and in 2015, Dzmitry Bahdanau published “Neural Machine Translation by Jointly Learning to Align and Translate.” Within just a few months, Google, Facebook, Microsoft, et al. had translation systems based on recurrent neural networks. In 2017, researchers at Google proposed a new, simple network architecture based solely on attention mechanisms in their paper “Attention Is All You Need.” (Jokey titles are all the rage in AI research, with many recent papers making plays on the names of Sesame Street characters.)
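
For the curious, the core operation in “Attention Is All You Need” is scaled dot-product attention: each query is compared against every key, the scores are turned into weights with a softmax, and the output is the weighted average of the values. The sketch below is a bare-bones, single-head version in plain Python with made-up toy vectors; it omits the learned projections, multiple heads, and masking of the full architecture.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy data: the query matches the first and third keys, not the second.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
outputs, weights = attention(queries, keys, values)
print(weights[0])
print(outputs[0])
```

The query "attends" most to the keys it resembles, so the output leans toward their values. Unlike a recurrent network, every position can be compared with every other in one step.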

After decades during which gifted researchers explored very disparate approaches, often focused on very narrow application areas or on completely abstract problems, we’ve reached a point where modern, data-driven approaches are showing their value in all sorts of practical domains. We still need the AI theorists, but using data from the real world—in all its messy complexity and for all its flaws—has been the key ingredient in getting results we can apply to real problems.

2020 and Beyond: Open Research

Today, neural networks continue to be refined and used in new and exciting ways. Yann notes the work of Guillaume Lample and François Charton, fellow Facebook colleagues, and their recent paper showing how the systems are surprisingly good at mathematics. This is exciting not so much because Facebook’s users are clamoring for new theorem provers, but because problems like these have traditionally been seen as a particular area of weakness for neural networks.

As I wrote about in the blog covering how Facebook uses AI as described by Jerome Pesenti, one of the most important advancements in AI isn’t in the research itself, but in how the research is conducted: instead of being kept internally as trade secrets, it is perfectly normal for industrial research labs to publish papers accompanied by the code (and often the data) necessary to reproduce the results. I love this aspect of the AI field; I’m certain that it creates enormous “public good,” but my machine learning friends and I have puzzled for years as to why companies give away so many valuable things. In this podcast, Yann gives us a really simple explanation, at least in the case of the Facebook AI Research (FAIR) department: to accelerate the progress of AI as a whole to the point where, “you’ll see Google publishes a technique, and then within three months Facebook has an improvement on it. And within three months of that, Google has yet another improvement.” This new cultural norm, that a paper should come with code and data in the name of “reproducible research,” is spreading across the sciences.

Like others in the field of AI, such as roboticist Pieter Abbeel, Yann is excited about the future of self-supervised learning (FAIR released yet another paper just after the podcast, with results that were stronger, simpler and much less computationally demanding than related work only a few months old) and, like World Bank’s Ed Hsu, about the ability of AI to transform health care and improve the world. As usual, my blog is but a pale shadow of the original podcast: Abigail has produced a really fascinating interview, and you really should listen to the full episode.

Interested in learning from Yann himself? His NYU course with Alfredo Canziani about deep learning is available on GitHub (naturally) for free!

Intel, of course, does a good deal of open source AI research as well, examples of which you can find at:

To listen to more Intel on AI podcast episodes, including an upcoming episode with guests from Intel Labs discussing their cutting-edge AI work, visit: