The earliest predecessors of modern deep learning were simple linear models motivated from a neuroscientific perspective.
These models were designed to take a set of $n$ input values $x_1, \ldots, x_n$ and associate them with an output $y$.
These models would learn a set of weights $w_1, \ldots, w_n$ and compute their output $f(\boldsymbol{x}, \boldsymbol{w}) = x_1 w_1 + \cdots + x_n w_n$.
This first wave of neural networks research was known as cybernetics.
The McCulloch-Pitts neuron (McCulloch and Pitts, 1943) was an early model of brain function. This linear model could recognize two different categories of inputs by testing whether $f(\boldsymbol{x}, \boldsymbol{w})$ is positive or negative.
Of course, for the model to correspond to the desired definition of the categories, the weights needed to be set correctly.
These weights could be set by the human operator.
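As a concrete illustration, the following is a minimal Python sketch of such a linear threshold model (the weights and input values are invented for the example, not taken from any historical system): the weighted sum $f(\boldsymbol{x}, \boldsymbol{w})$ is computed and the category is read off from its sign, with the weights fixed by hand.

```python
# Minimal sketch of a McCulloch-Pitts-style linear threshold unit.
# The weights are set by hand, as they were before learning algorithms existed.

def f(x, w):
    """Weighted sum f(x, w) = x1*w1 + ... + xn*wn."""
    return sum(xi * wi for xi, wi in zip(x, w))

def classify(x, w):
    """Assign one of two categories by testing whether f(x, w) is positive."""
    return 1 if f(x, w) > 0 else 0

# Hypothetical hand-chosen weights and an example input.
w = [0.5, -1.0, 0.25]
x = [1.0, 0.2, 2.0]
print(f(x, w))         # 0.8
print(classify(x, w))  # 1
```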
In the 1950s, the perceptron (Rosenblatt, 1958, 1962) became the first model that could learn the weights defining the categories given examples of inputs from each category.
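The sketch below illustrates one way such learning can proceed; it uses the classic perceptron update rule, applied to a small, linearly separable dataset invented for this illustration, and is meant only to show how weights can be learned from labeled examples.

```python
# Minimal sketch of the perceptron learning rule (Rosenblatt, 1958):
# the weights are adjusted whenever an example is misclassified.

def f(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def perceptron_train(examples, n, epochs=10, lr=1.0):
    """examples: list of (x, y) pairs with y in {+1, -1}."""
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            # Predict a category from the sign of f(x, w); update only on mistakes.
            prediction = 1 if f(x, w) > 0 else -1
            if prediction != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Toy, linearly separable data invented for this illustration.
examples = [([2.0, 1.0], +1), ([1.5, 2.0], +1),
            ([-1.0, -0.5], -1), ([-2.0, 1.0], -1)]
w = perceptron_train(examples, n=2)
print(w)  # weights that separate the two categories
```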
The adaptive linear element (ADALINE), which dates from about the same time, simply returned the value of $f(\boldsymbol{x})$ itself to predict a real number (Widrow and Hoff, 1960), and could also learn to predict these numbers from data.
These simple learning algorithms greatly affected the modern landscape of machine learning.
The training algorithm used to adapt the weights of the ADALINE was a special case of an algorithm called stochastic gradient descent.
Slightly modified versions of the stochastic gradient descent algorithm remain the dominant training algorithms for deep learning models today.
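The following is a rough sketch of the idea in plain Python (the squared-error loss, learning rate, and toy data are assumptions made for illustration, not the historical ADALINE implementation): stochastic gradient descent repeatedly nudges each weight in the direction that reduces the error on a single example.

```python
# Minimal sketch of stochastic gradient descent on a squared-error loss,
# in the ADALINE-style setting where f(x, w) predicts a real number directly.

def f(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def sgd_train(examples, n, epochs=100, lr=0.1):
    """examples: list of (x, y) pairs, where y is a real-valued target."""
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            error = f(x, w) - y
            # Gradient of 0.5 * error**2 with respect to weight wi is error * xi,
            # so each weight takes a small step against that gradient.
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w

# Toy data generated (by assumption) from the rule y = 3*x1 - 2*x2.
examples = [([1.0, 0.0], 3.0), ([0.0, 1.0], -2.0),
            ([1.0, 1.0], 1.0), ([2.0, 1.0], 4.0)]
w = sgd_train(examples, n=2)
print(w)  # approaches [3.0, -2.0]
```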
Models based on the $f(\boldsymbol{x}, \boldsymbol{w})$ used by the perceptron and ADALINE are called linear models.
These models remain some of the most widely used machine learning models, though in many cases they are trained in different ways than the original models were trained.