The history of deep learning

Deep learning only appears to be new, because:

  • it was relatively unpopular for several years preceding its current popularity
  • it has gone by many different names and has only recently come to be called “deep learning.” The field has been rebranded many times, reflecting the influence of different researchers and different perspectives.

There have been three waves of development of deep learning:

  • deep learning known as cybernetics in the 1940s–1960s
  • deep learning known as connectionism in the 1980s–1990s
  • the current resurgence under the name deep learning beginning in 2006.

The figure shows two of the three historical waves of artificial neural network research, as measured by the frequency of the phrases “cybernetics” and “connectionism” or “neural networks” in Google Books (the third wave is too recent to appear).

  • The first wave started with cybernetics in the 1940s–1960s, with the development of theories of biological learning (McCulloch and Pitts, 1943; Hebb, 1949) and implementations of the first models, such as the perceptron (Rosenblatt, 1958), which allowed the training of a single neuron.
  • The second wave started with the connectionist approach of the 1980–1995 period, which used back-propagation (Rumelhart et al., 1986a) to train a neural network with one or two hidden layers.
  • The current and third wave, deep learning, started around 2006 (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al., 2007a), and is just now appearing in book form as of 2016.
    The other two waves similarly appeared in book form much later than the corresponding scientific activity occurred.
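
To make the phrase "training of a single neuron" from the first wave concrete, here is a minimal sketch of a perceptron-style learning rule. It is an illustration under stated assumptions, not Rosenblatt's original formulation: the function names (`predict`, `train_perceptron`), the learning rate, the epoch limit, and the toy AND dataset are all hypothetical choices made for this example.

```python
# Minimal perceptron sketch (illustrative only; names and data are assumptions).

def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Adjust the weights of a single neuron whenever it misclassifies a sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            pred = predict(weights, bias, x)
            if pred != y:
                # Move the decision boundary toward the misclassified sample.
                step = lr * (y - pred)
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
                errors += 1
        if errors == 0:
            # Every sample is classified correctly; stop early.
            break
    return weights, bias


# Toy, linearly separable dataset: logical AND (an assumption for illustration).
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]

if __name__ == "__main__":
    w, b = train_perceptron(AND_INPUTS, AND_LABELS)
    for x in AND_INPUTS:
        print(x, predict(w, b, x))
```

A single unit of this kind can only separate classes with a linear boundary, which is one reason the later waves moved to networks with hidden layers.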

Some of the earliest learning algorithms we recognize today were intended to be computational models of biological learning, i.e. models of how learning happens or could happen in the brain.
As a result, one of the names that deep learning has gone by is artificial neural networks (ANN). The corresponding perspective on deep learning models is that they are engineered systems inspired by the biological brain (whether the human brain or the brain of another animal).

While the kinds of neural networks used for machine learning have sometimes been used to understand brain function (Hinton and Shallice, 1991), they are generally not designed to be realistic models of biological function.

The neural perspective on deep learning is motivated by two main ideas.

  • One idea is that the brain provides a proof by example that intelligent behavior is possible, and a conceptually straightforward path to building intelligence is to reverse engineer the computational principles behind the brain and duplicate its functionality.
  • The other idea is that it would be deeply interesting to understand the brain and the principles that underlie human intelligence, so machine learning models that shed light on these basic scientific questions are useful apart from their ability to solve engineering applications.

The modern term “deep learning” goes beyond the neuroscientific perspective on the current breed of machine learning models.
It appeals to a more general principle of learning multiple levels of composition, which can be applied in machine learning frameworks that are not necessarily neurally inspired.

Simple linear models: predecessors of modern deep learning

Why has the role of neuroscience in deep learning diminished?

What is «computational neuroscience»?

What is «connectionism»?

Deep neural networks from the mid-1990s to the mid-2000s

The third wave of neural network research

Increasing dataset sizes

Goodfellow, Bengio, Courville - «Deep Learning» (2016)