The earliest predecessors of modern deep learning were simple linear models motivated from a neuroscientific perspective.

These models were designed to take a set of $n$ input values $x_1, \dots, x_n$ and associate them with an output $y$.

These models would learn a set of weights $w_1, \dots, w_n$ and compute their output $f(x, w) = x_1 w_1 + \cdots + x_n w_n$.

### Neuron

This first wave of neural networks research was known as cybernetics.

The McCulloch-Pitts neuron (McCulloch and Pitts, 1943) was an early model of brain function. This linear model could recognize two different categories of inputs by testing whether $f(x, w)$ is positive or negative.
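As an illustration, this sign-based decision rule can be sketched in a few lines of Python. The weights here are hand-set, illustrative values (the original model had no learning rule):

```python
def f(x, w):
    """Linear model output: f(x, w) = x_1*w_1 + ... + x_n*w_n."""
    return sum(x_i * w_i for x_i, w_i in zip(x, w))

def classify(x, w):
    """McCulloch-Pitts-style decision: category given by the sign of f(x, w)."""
    return 1 if f(x, w) >= 0 else -1

# Hand-set weights, chosen for illustration only.
w = [1.0, -2.0]
print(classify([3.0, 1.0], w))   # f = 1.0  -> category 1
print(classify([1.0, 2.0], w))   # f = -3.0 -> category -1
```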

Of course, for the model to correspond to the desired definition of the categories, the weights needed to be set correctly. These weights could be set by the human operator.

### Perceptron

In the 1950s, the perceptron (Rosenblatt, 1958, 1962) became the first model that could learn the weights defining the categories, given examples of inputs from each category.
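A minimal sketch of how such weight learning can work, using the classic perceptron update rule (the specific rule, learning rate, and toy data below are illustrative assumptions, not details given in the text): whenever an example is misclassified, the weights are nudged toward the correct label.

```python
def perceptron_train(examples, n, epochs=10, lr=1.0):
    """Learn weights from labeled examples (x, y) with y in {-1, +1}."""
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(xi * wi for xi, wi in zip(x, w)) >= 0 else -1
            if pred != y:  # misclassified: move w toward y * x
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# Linearly separable toy data: the label matches the sign of the first input.
data = [([2.0, 1.0], 1), ([-1.5, 0.5], -1), ([3.0, -1.0], 1), ([-2.0, -1.0], -1)]
w = perceptron_train(data, n=2)
# After training, all four examples are classified correctly.
```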

### The adaptive linear element (ADALINE)

The adaptive linear element (ADALINE), which dates from about the same time, simply returned the value of $f(x)$ itself to predict a real number (Widrow and Hoff, 1960), and could also learn to predict these numbers from data.

These simple learning algorithms greatly affected the modern landscape of machine learning.

The training algorithm used to adapt the weights of the ADALINE was a special case of an algorithm called stochastic gradient descent.

Slightly modified versions of the stochastic gradient descent algorithm remain the dominant training algorithms for deep learning models today.
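A minimal sketch of this kind of training, assuming the standard least-mean-squares form of the ADALINE update, i.e., stochastic gradient descent on the squared error of each example (the learning rate and toy data are illustrative):

```python
def adaline_sgd(examples, n, epochs=200, lr=0.05):
    """Stochastic gradient descent on the squared error (1/2)*(f(x, w) - y)**2."""
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(xi * wi for xi, wi in zip(x, w))
            err = pred - y
            # Gradient of the squared error with respect to w_i is err * x_i.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# Toy regression data generated by y = 2*x1 - 1*x2 (the weights to recover).
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = adaline_sgd(data, n=2)
# w converges toward [2.0, -1.0]
```

Each update uses the gradient from a single example rather than the whole dataset, which is what makes the procedure "stochastic" gradient descent.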

Models based on the $f(x, w)$ used by the perceptron and ADALINE are called linear models.

These models remain some of the most widely used machine learning models, though in many cases they are trained in different ways than the original models were trained.