We motivate why recurrent neural networks are important for dealing with sequence data and review the LSTM and GRU (gated recurrent unit) architectures. A GRU is a simplified LSTM. Notes: BPTT (backpropagation through time).
Decades-old discoveries are now electrifying the computing industry and will soon transform corporate America.
Over the past four years, readers have doubtlessly noticed quantum leaps in the quality of a wide range of everyday technologies.
Most obviously, the speech-recognition functions on our smartphones work much better than they used to. When we use a voice command to call our spouses, we reach them now. We aren’t connected to Amtrak or an angry ex.
In fact, we are increasingly interacting with our computers by just talking to them, whether it’s Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, or the many voice-responsive features of Google. Chinese search giant Baidu says customers have tripled their use of its speech interfaces in the past 18 months.
Machine translation and other forms of language processing have also become far more convincing, with Google, Microsoft, Facebook, and Baidu unveiling new tricks every month. Google Translate now renders spoken sentences in one language into spoken sentences in another for 32 pairs of languages, while offering text translations for 103 tongues, including Cebuano, Igbo, and Zulu. Google’s Inbox app offers three ready-made replies for many incoming emails.
Then there are the advances in image recognition. The same four companies all have features that let you search or automatically organize collections of photos with no identifying tags. You can ask to be shown, say, all the ones that have dogs in them, or snow, or even something fairly abstract like hugs. The companies all have prototypes in the works that generate sentence-long descriptions for the photos in seconds.
Think about that. To gather up dog pictures, the app must identify anything from a Chihuahua to a German shepherd and not be tripped up if the pup is upside down or partially obscured, at the right of the frame or the left, in fog or snow, sun or shade. At the same time it needs to exclude wolves and cats. Using pixels alone. How is that possible?
The advances in image recognition extend far beyond cool social apps. Medical startups claim they’ll soon be able to use computers to read X-rays, MRIs, and CT scans more rapidly and accurately than radiologists, to diagnose cancer earlier and less invasively, and to accelerate the search for life-saving pharmaceuticals. Better image recognition is crucial to unleashing improvements in robotics, autonomous drones, and, of course, self-driving cars, a development so momentous that we made it a cover story in June. Ford, Tesla, Uber, Baidu, and Google parent Alphabet are all testing prototypes of self-piloting vehicles on public roads today.
But what most people don’t realize is that all these breakthroughs are, in essence, the same breakthrough. They’ve all been made possible by a family of artificial intelligence (AI) techniques popularly known as deep learning, though most scientists still prefer to call them by their original academic designation: deep neural networks.
The most remarkable thing about neural nets is that no human being has programmed a computer to perform any of the stunts described above. In fact, no human could. Programmers have, rather, fed the computer a learning algorithm, exposed it to terabytes of data—hundreds of thousands of images or years’ worth of speech samples—to train it, and have then allowed the computer to figure out for itself how to recognize the desired objects, words, or sentences.
In short, such computers can now teach themselves. “You essentially have software writing software,” says Jen-Hsun Huang, CEO of graphics processing leader Nvidia, which began placing a massive bet on deep learning about five years ago.
Neural nets aren’t new. The concept dates back to the 1950s, and many of the key algorithmic breakthroughs occurred in the 1980s and 1990s. What’s changed is that today computer scientists have finally harnessed both the vast computational power and the enormous storehouses of data—images, video, audio, and text files strewn across the Internet—that, it turns out, are essential to making neural nets work well. “This is deep learning’s Cambrian explosion,” says Frank Chen, a partner at the Andreessen Horowitz venture capital firm, alluding to the geological era when most higher animal species suddenly burst onto the scene.
That dramatic progress has sparked a burst of activity. Equity funding of AI-focused startups reached an all-time high last quarter of more than $1 billion, according to the CB Insights research firm. There were 121 funding rounds for such startups in the second quarter of 2016, compared with 21 in the equivalent quarter of 2011, that group says. More than $7.5 billion in total investments have been made during that stretch—with more than $6 billion of that coming since 2014. (In late September, five corporate AI leaders—Amazon, Facebook, Google, IBM, and Microsoft—formed the nonprofit Partnership on AI to advance public understanding of the subject and conduct research on ethics and best practices.)
Google had two deep-learning projects underway in 2012. Today it is pursuing more than 1,000, according to a spokesperson, in all its major product sectors, including search, Android, Gmail, translation, maps, YouTube, and self-driving cars. IBM’s Watson system used AI, but not deep learning, when it beat two Jeopardy champions in 2011. Now, though, almost all of Watson’s 30 component services have been augmented by deep learning, according to Watson CTO Rob High.
Venture capitalists, who didn’t even know what deep learning was five years ago, today are wary of startups that don’t have it. “We’re now living in an age,” Chen observes, “where it’s going to be mandatory for people building sophisticated software applications.” People will soon demand, he says, “ ‘Where’s your natural-language processing version?’ ‘How do I talk to your app? Because I don’t want to have to click through menus.’ ”
Some companies are already integrating deep learning into their own day-to-day processes. Says Peter Lee, cohead of Microsoft Research: “Our sales teams are using neural nets to recommend which prospects to contact next or what kinds of product offerings to recommend.”
The hardware world is feeling the tremors. The increased computational power that is making all this possible derives not only from Moore’s law but also from the realization in the late 2000s that graphics processing units (GPUs) made by Nvidia—the powerful chips that were first designed to give gamers rich, 3D visual experiences—were 20 to 50 times more efficient than traditional central processing units (CPUs) for deep-learning computations. This past August, Nvidia announced that quarterly revenue for its data center segment had more than doubled year over year, to $151 million. Its chief financial officer told investors that “the vast majority of the growth comes from deep learning by far.” The term “deep learning” came up 81 times during the 83-minute earnings call.
For its part, Google revealed in May that for over a year it had been secretly using its own tailor-made chips, called tensor processing units, or TPUs, to implement applications trained by deep learning. (Tensors are arrays of numbers, like matrices, which are often multiplied against one another in deep-learning computations.)
Indeed, corporations just may have reached another inflection point. “In the past,” says Andrew Ng, chief scientist at Baidu Research, “a lot of S&P 500 CEOs wished they had started thinking sooner than they did about their Internet strategy. I think five years from now there will be a number of S&P 500 CEOs that will wish they’d started thinking earlier about their AI strategy.”
Even the Internet metaphor doesn’t do justice to what AI with deep learning will mean, in Ng’s view. “AI is the new electricity,” he says. “Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
Think of deep learning as a subset of a subset. “Artificial intelligence” encompasses a vast range of technologies—like traditional logic and rules-based systems—that enable computers and robots to solve problems in ways that at least superficially resemble thinking. Within that realm is a smaller category called machine learning, which is the name for a whole toolbox of arcane but important mathematical techniques that enable computers to improve at performing tasks with experience. Finally, within machine learning is the smaller subcategory called deep learning.
One way to think of what deep learning does is as “A to B mappings,” says Baidu’s Ng. “You can input an audio clip and output the transcript. That’s speech recognition.” As long as you have data to train the software, the possibilities are endless, he maintains. “You can input email, and the output could be: Is this spam or not?” Input loan applications, he says, and the output might be the likelihood a customer will repay it. Input usage patterns on a fleet of cars, and the output could advise where to send a car next.
Deep learning, in that vision, could transform almost any industry. “There are fundamental changes that will happen now that computer vision really works,” says Jeff Dean, who leads the Google Brain project. Or, as he unsettlingly rephrases his own sentence, “now that computers have opened their eyes.”
Does that mean it’s time to brace for “the singularity”—the hypothesized moment when superintelligent machines start improving themselves without human involvement, triggering a runaway cycle that leaves lowly humans ever further in the dust, with terrifying consequences?
Not just yet. Neural nets are good at recognizing patterns—sometimes as good as or better than we are at it. But they can’t reason.
The first sparks of the impending revolution began flickering in 2009. That summer Microsoft’s principal researcher Li Deng invited neural nets pioneer Geoffrey Hinton, of the University of Toronto, to visit. Impressed with his research, Deng’s group experimented with neural nets for speech recognition. “We were shocked by the results,” Lee says. “We were achieving more than 30% improvements in accuracy with the very first prototypes.”
In 2011, Microsoft introduced deep-learning technology into its commercial speech-recognition products, according to Lee. Google followed suit in August 2012.
But the real turning point came in October 2012. At a workshop in Florence, Italy, Fei-Fei Li, the head of the Stanford AI Lab and the founder of the prominent annual ImageNet computer-vision contest, announced that two of Hinton’s students had invented software that identified objects with almost twice the accuracy of the nearest competitor. “It was a spectacular result,” recounts Hinton, “and convinced lots and lots of people who had been very skeptical before.” (In last year’s contest a deep-learning entrant surpassed human performance.)
Cracking image recognition was the starting gun, and it kicked off a hiring race. Google landed Hinton and the two students who had won that contest. Facebook signed up French deep learning innovator Yann LeCun, who, in the 1980s and 1990s, had pioneered the type of algorithm that won the ImageNet contest. And Baidu snatched up Ng, a former head of the Stanford AI Lab, who had helped launch and lead the deep-learning-focused Google Brain project in 2011.
The hiring binge has only intensified since then. Today, says Microsoft’s Lee, there’s a “bloody war for talent in this space.” He says top-flight minds command offers “along the lines of NFL football players.”
Geoffrey Hinton, 68, first heard of neural networks in 1972 when he started his graduate work in artificial intelligence at the University of Edinburgh. Having studied experimental psychology as an undergraduate at Cambridge, Hinton was enthusiastic about neural nets, which were software constructs that took their inspiration from the way networks of neurons in the brain were thought to work. At the time, neural nets were out of favor. “Everybody thought they were crazy,” he recounts. But Hinton soldiered on.
Neural nets offered the prospect of computers’ learning the way children do—from experience—rather than through laborious instruction by programs tailor-made by humans. “Most of AI was inspired by logic back then,” he recalls. “But logic is something people do very late in life. Kids of 2 and 3 aren’t doing logic. So it seemed to me that neural nets were a much better paradigm for how intelligence would work than logic was.” (Logic, as it happens, is one of the Hinton family trades. He comes from a long line of eminent scientists and is the great-great-grandson of 19th-century mathematician George Boole, after whom Boolean searches, logic, and algebra are named.)
During the 1950s and ’60s, neural networks were in vogue among computer scientists. In 1958, Cornell research psychologist Frank Rosenblatt, in a Navy-backed project, built a prototype neural net, which he called the Perceptron, at a lab in Buffalo. It used a punch-card computer that filled an entire room. After 50 trials it learned to distinguish between cards marked on the left and cards marked on the right. Reporting on the event, the New York Times wrote, “The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”
The Perceptron, whose software had only one layer of neuron-like nodes, proved limited. But researchers believed that more could be accomplished with multilayer—or deep—neural networks.
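A single layer of this kind can be sketched in a few lines of Python. What follows is only an illustration of the classic perceptron learning rule, not a reconstruction of Rosenblatt’s machine: the four-cell “cards,” the learning rate, and the function names are all assumptions made up for the example.

```python
def predict(weights, bias, card):
    """Fire (1) if the weighted sum of the card's cells is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, card))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=50, lr=0.1):
    """Perceptron rule: nudge the weights only when a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for card, label in samples:
            error = label - predict(weights, bias, card)  # -1, 0, or +1
            weights = [w + lr * error * x for w, x in zip(weights, card)]
            bias += lr * error
    return weights, bias


# Hypothetical toy data: each card has four cells, left to right (1 = marked).
# Label 0 means "marked on the left", label 1 means "marked on the right".
SAMPLES = [
    ([1, 0, 0, 0], 0),
    ([0, 1, 0, 0], 0),
    ([1, 1, 0, 0], 0),
    ([0, 0, 1, 0], 1),
    ([0, 0, 0, 1], 1),
    ([0, 0, 1, 1], 1),
]

weights, bias = train_perceptron(SAMPLES)
for card, label in SAMPLES:
    print(card, "->", predict(weights, bias, card), "(expected", label, ")")
```

Because one layer can only draw a single straight dividing line through its inputs, it handles left-versus-right cards easily but fails on patterns that are not linearly separable, which is one way to see why the single-layer design proved limited.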
Hinton explains the basic idea this way. Suppose a neural net is interpreting photographic images, some of which show birds. “So the input would come in, say, pixels, and then the first layer of units would detect little edges. Dark one side, bright the other side.” The next level of neurons, analyzing data sent from the first layer, would learn to detect “things like corners, where two edges join at an angle,” he says. One of these neurons might respond strongly to the angle of a bird’s beak, for instance.
The next level “might find more complicated configurations, like a bunch of edges arranged in a circle.” A neuron at this level might respond to the head of the bird. At a still higher level a neuron might detect the recurring juxtaposition of beaklike angles near headlike circles. “And that’s a pretty good cue that it might be the head of a bird,” says Hinton. The neurons of each higher layer respond to concepts of greater complexity and abstraction, until one at the top level corresponds to our concept of “bird.”
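The layered idea Hinton describes can be mimicked with a hand-written toy in Python. This is a sketch under strong assumptions: in a real neural net the detectors are learned from data, whereas here the “edge” and “corner” units, the 4-by-4 image, and the threshold are all invented for illustration.

```python
def edge_layer(image):
    """First layer: for every 2x2 patch, measure how brightness changes
    from left to right and from top to bottom (dark one side, bright the other)."""
    responses = {}
    for r in range(len(image) - 1):
        for c in range(len(image[0]) - 1):
            top_left, top_right = image[r][c], image[r][c + 1]
            bottom_left, bottom_right = image[r + 1][c], image[r + 1][c + 1]
            left_to_right = (top_right + bottom_right) - (top_left + bottom_left)
            top_to_bottom = (bottom_left + bottom_right) - (top_left + top_right)
            responses[(r, c)] = (left_to_right, top_to_bottom)
    return responses


def corner_layer(edge_responses, threshold=5):
    """Second layer: a corner is a place where two edges meet at an angle,
    so fire only where both edge detectors respond strongly."""
    return sorted(
        position
        for position, (left_to_right, top_to_bottom) in edge_responses.items()
        if abs(left_to_right) > threshold and abs(top_to_bottom) > threshold
    )


# Hypothetical image: a bright square (9s) on a dark background (0s).
IMAGE = [
    [0, 0, 0, 0],
    [0, 9, 9, 9],
    [0, 9, 9, 9],
    [0, 9, 9, 9],
]

edges = edge_layer(IMAGE)
corners = corner_layer(edges)
print(corners)  # patches where a vertical and a horizontal edge meet
```

The second layer never looks at pixels at all. It works only with the first layer’s outputs, which is the sense in which each higher layer responds to something more abstract than the one below it.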
To learn, however, a deep neural net needed to do more than just send messages up through the layers in this fashion. It also needed a way to see if it was getting the right results at the top layer and, if not, send messages back down so that all the lower neuron-like units could retune their connection weights to improve the results. That’s where the learning would occur.
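That two-way traffic, signals going up and corrections coming back down, can be sketched in Python. This is a minimal illustration of gradient-based error correction (the technique usually called backpropagation) on a tiny two-layer net. The starting numbers, the single training example, the sigmoid units, and the learning rate are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Send the input up through the layers."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    output = sigmoid(sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"])
    return hidden, output


def loss(params, x, target):
    """Squared error between the top-layer output and the desired answer."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backward(params, x, target):
    """Send the error back down and work out how each number should change."""
    hidden, output = forward(params, x)
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(params["w2"], hidden)]
    return {
        "w1": [[d * xi for xi in x] for d in delta_hidden],
        "b1": delta_hidden,
        "w2": [delta_out * h for h in hidden],
        "b2": delta_out,
    }


def step(params, grads, lr=0.5):
    """Retune every adjustable number a little, against its gradient."""
    return {
        "w1": [[w - lr * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(params["w1"], grads["w1"])],
        "b1": [b - lr * g for b, g in zip(params["b1"], grads["b1"])],
        "w2": [w - lr * g for w, g in zip(params["w2"], grads["w2"])],
        "b2": params["b2"] - lr * grads["b2"],
    }


# Hypothetical starting point: 2 inputs, 2 hidden units, 1 output.
params = {"w1": [[0.1, -0.2], [0.4, 0.3]], "b1": [0.0, 0.0],
          "w2": [0.5, -0.5], "b2": 0.0}
x, target = [1.0, 0.0], 1.0

loss_before = loss(params, x, target)
for _ in range(100):
    params = step(params, backward(params, x, target))
loss_after = loss(params, x, target)
print(loss_before, loss_after)
```

Real systems repeat this loop over millions of examples and far larger layers, but the shape of the computation is the same: a forward pass, a measurement of the error at the top, and a backward pass that assigns each lower unit its share of the blame.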
In the early 1980s, Hinton was working on this problem. So was a French researcher named Yann LeCun, who was just starting his graduate work in Paris. LeCun stumbled on a 1983 paper by Hinton, which talked about multilayer neural nets. “It was not formulated in those terms,” LeCun recalls, “because it was very difficult at that time actually to publish a paper if you mentioned the word ‘neurons’ or ‘neural nets.’ So he wrote this paper in an obfuscated manner so it would pass the reviewers. But I thought the paper was super-interesting.” The two met two years later and hit it off.
In 1986, Hinton and two colleagues wrote a seminal paper offering an algorithmic solution to the error-correction problem. “His paper was basically the foundation of the second wave of neural nets,” says LeCun. It reignited interest in the field.
After a post-doc stint with Hinton, LeCun moved to AT&T’s Bell Labs in 1988, where during the next decade he did foundational work that is still being used today for most image-recognition tasks. In the 1990s, NCR, which was then a Bell Labs subsidiary, commercialized a neural-nets-powered device, widely used by banks, which could read handwritten digits on checks, according to LeCun. At the same time, two German researchers (Sepp Hochreiter, now at the University of Linz, and Jürgen Schmidhuber, codirector of a Swiss AI lab in Lugano) were independently pioneering a different type of algorithm that today, 20 years later, has become crucial for natural-language processing applications.
Despite all the strides, in the mid-1990s neural nets fell into disfavor again, eclipsed by what were, given the computational power of the times, more effective machine-learning tools. That situation persisted for almost a decade, until computing power increased another three to four orders of magnitude and researchers discovered GPU acceleration.
But one piece was still missing: data. Although the Internet was awash in it, most data—especially when it came to images—wasn’t labeled, and that’s what you needed to train neural nets. That’s where Fei-Fei Li, a Stanford AI professor, stepped in. “Our vision was that big data would change the way machine learning works,” she explains in an interview. “Data drives learning.”
In 2007 she launched ImageNet, assembling a free database of more than 14 million labeled images. It went live in 2009, and the next year she set up an annual contest to incentivize and publish computer-vision breakthroughs.
In October 2012, when two of Hinton’s students won that competition, it became clear to all that deep learning had arrived.
By then the general public had also heard about deep learning, though due to a different event. In June 2012, Google Brain published the results of a quirky project now known colloquially as the “cat experiment.” It struck a comic chord and went viral on social networks.
The project actually explored an important unsolved problem in deep learning called “unsupervised learning.” Almost every deep-learning product in commercial use today uses “supervised learning,” meaning that the neural net is trained with labeled data (like the images assembled by ImageNet). With “unsupervised learning,” by contrast, a neural net is shown unlabeled data and asked simply to look for recurring patterns. Researchers would love to master unsupervised learning one day because then machines could teach themselves about the world from vast stores of data that are unusable today—making sense of the world almost totally on their own, like infants.
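The difference between the two modes of learning shows up even in a toy far simpler than a neural net. The Python sketch below is an assumption-laden illustration, not how the Google Brain project worked: it uses six made-up measurements, a nearest-average classifier for the supervised case, and a two-cluster grouping routine (a bare-bones version of k-means) for the unsupervised case.

```python
def centroids_from_labels(values, labels):
    """Supervised: the labels say which group each value belongs to,
    so learning is just averaging each labeled group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def classify(value, centroids):
    """Assign a new value to the label whose average is nearest."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def two_means(values, rounds=10):
    """Unsupervised: no labels at all; look for two recurring clumps.
    Assumes the data contains at least two distinct values."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(rounds):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) for c in clusters]
    return clusters


# Hypothetical measurements and human-supplied labels.
SIZES = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
LABELS = ["small", "small", "small", "large", "large", "large"]

centroids = centroids_from_labels(SIZES, LABELS)
print(classify(4.0, centroids))  # uses the labels it was trained on

clusters = two_means(SIZES)
print(clusters)  # two groups found without any labels
```

The unsupervised routine finds the same two groups, but it has no idea that one is called “small” and the other “large.” That mirrors what the researchers ran into: patterns emerge from unlabeled data, yet naming them still falls to humans.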
In the cat experiment, researchers exposed a vast neural net—spread across 1,000 computers—to 10 million unlabeled images randomly taken from YouTube videos, and then just let the software do its thing. When the dust cleared, they checked the neurons of the highest layer and found, sure enough, that one of them responded powerfully to images of cats. “We also found a neuron that responded very strongly to human faces,” says Ng, who led the project while at Google Brain.
Yet the results were puzzling too. “We did not find a neuron that responded strongly to cars,” for instance, and “there were a lot of other neurons we couldn’t assign an English word to. So it’s difficult.”
The experiment created a sensation. But unsupervised learning remains uncracked—a challenge for the future.
Not surprisingly, most of the deep-learning applications that have been commercially deployed so far involve companies like Google, Microsoft, Facebook, Baidu, and Amazon—the companies with the vast stores of data needed for deep-learning computations. Many companies are trying to develop more realistic and helpful “chatbots”—automated customer-service representatives.
FOUR TECH GIANTS GET SERIOUS ABOUT DEEP LEARNING
Google launched the deep-learning-focused Google Brain project in 2011, introduced neural nets into its speech-recognition products in mid-2012, and retained neural nets pioneer Geoffrey Hinton in March 2013. It now has more than 1,000 deep-learning projects underway, it says, extending across search, Android, Gmail, photo, maps, translate, YouTube, and self-driving cars. In 2014 it bought DeepMind, whose deep reinforcement learning project, AlphaGo, defeated the world’s go champion, Lee Sedol, in March, achieving an artificial intelligence landmark.
Microsoft introduced deep learning into its commercial speech-recognition products, including Bing voice search and Xbox voice commands, during the first half of 2011. The company now uses neural nets for its search rankings, photo search, translation systems, and more. “It’s hard to convey the pervasive impact this has had,” says Lee. Last year it won the key image-recognition contest, and in September it scored a record low error rate on a speech-recognition benchmark: 6.3%.
In December 2013, Facebook hired French neural nets innovator Yann LeCun to direct its new AI research lab. Facebook uses neural nets to translate about 2 billion user posts per day in more than 40 languages, and says its translations are seen by 800 million users a day. (About half its community does not speak English.) Facebook also uses neural nets for photo search and photo organization, and it’s working on a feature that would generate spoken captions for untagged photos that could be used by the visually impaired.
In May 2014, Baidu hired Andrew Ng, who had earlier helped launch and lead the Google Brain project, to lead its research lab. China’s leading search and web services site, Baidu uses neural nets for speech recognition, translation, photo search, and a self-driving car project, among others. Speech recognition is key in China, a mobile-first society whose main language, Mandarin, is difficult to type into a device. The number of customers interfacing by speech has tripled in the past 18 months, Baidu says.
Companies like IBM and Microsoft are also helping business customers adapt deep-learning-powered applications, like speech-recognition interfaces and translation services, for their own businesses, while cloud services like Amazon Web Services provide cheap, GPU-driven deep-learning computation services for those who want to develop their own software. Plentiful open-source software (like Caffe, Google’s TensorFlow, and Amazon’s DSSTNE) has greased the innovation process, as has an open-publication ethic, whereby many researchers publish their results immediately on one database without awaiting peer-review approval.
Many of the most exciting new attempts to apply deep learning are in the medical realm (see sidebar). We already know that neural nets work well for image recognition, observes Vijay Pande, a Stanford professor who heads Andreessen Horowitz’s biological investments unit, and “so much of what doctors do is image recognition, whether we’re talking about radiology, dermatology, ophthalmology, or so many other ‘-ologies.’ ”
DEEP LEARNING AND MEDICINE
Startup Enlitic uses deep learning to analyze radiographs and CT and MRI scans. CEO Igor Barani, formerly a professor of radiation oncology at the University of California in San Francisco, says Enlitic’s algorithms outperformed four radiologists in detecting and classifying lung nodules as benign or malignant. (The work has not been peer reviewed, and the technology has not yet obtained FDA approval.)
Merck is trying to use deep learning to accelerate drug discovery, as is a San Francisco startup called Atomwise. Neural networks examine 3D images—thousands of molecules that might serve as drug candidates—and predict their suitability for blocking the mechanism of a pathogen. Such companies are using neural nets to try to improve what humans already do; others are trying to do things humans can’t do at all. Gabriel Otte, 27, who has a Ph.D. in computational biology, started Freenome, which aims to diagnose cancer from blood samples. It examines DNA fragments in the bloodstream that are spewed out by cells as they die. Using deep learning, he asks computers to find correlations between cell-free DNA and some cancers. “We’re seeing novel signatures that haven’t even been characterized by cancer biologists yet,” says Otte.
When Andreessen Horowitz was mulling an investment in Freenome, AH’s Pande sent Otte five blind samples—two normal and three cancerous. Otte got all five right, says Pande, whose firm decided to invest.
While a radiologist might see thousands of images in his life, a computer can be shown millions. “It’s not crazy to imagine that this image problem could be solved better by computers,” Pande says, “just because they can plow through so much more data than a human could ever do.”
The potential advantages are not just greater accuracy and faster analysis, but democratization of services. As the technology becomes standard, eventually every patient will benefit.
The greatest impacts of deep learning may well be felt when it is integrated into the whole toolbox of other artificial intelligence techniques in ways that haven’t been thought of yet. Google’s DeepMind, for instance, has already been accomplishing startling things by combining deep learning with a related technique called reinforcement learning. Using the two, it created AlphaGo, the system that, this past March, defeated the champion player of the ancient Chinese game of go—widely considered a landmark AI achievement. Unlike IBM’s Deep Blue, which defeated chess champion Garry Kasparov in 1997, AlphaGo was not programmed with decision trees, or equations on how to evaluate board positions, or with if-then rules. “AlphaGo learned how to play go essentially from self-play and from observing big professional games,” says Demis Hassabis, DeepMind’s CEO. (During training, AlphaGo played a million go games against itself.)
A game might seem like an artificial setting. But Hassabis thinks the same techniques can be applied to real-world problems. In July, in fact, Google reported that, by using approaches similar to those used by AlphaGo, DeepMind was able to increase the energy efficiency of Google’s data centers by 15%. “In the data centers there are maybe 120 different variables,” says Hassabis. “You can change the fans, open the windows, alter the computer systems, where the power goes. You’ve got data from the sensors, the temperature gauges, and all that. It’s like the go board. Through trial and error, you learn what the right moves are.
“So it’s great,” he continues. “You could save, say, tens of millions of dollars a year, and it’s also great for the environment. Data centers use a lot of power around the world. We’d like to roll it out on a bigger scale now. Even the national grid level.”
Chatbots are all well and good. But that would be a cool app.
A version of this article appears in the October 1, 2016 issue of Fortune with the headline “The Deep-Learning Revolution.” This version contains updated figures from the CB Insights research firm.