Increasing model sizes

Another key reason that neural networks are wildly successful today after enjoying comparatively little success since the 1980s is that we have the computational resources to run much larger models today.

One of the main insights of connectionism is that animals become intelligent when many of their neurons work together.
An individual neuron or small collection of neurons is not particularly useful.

Biological neurons are not especially densely connected.
As seen in the Connections per neuron topic, our machine learning models have had, for decades, a number of connections per neuron within an order of magnitude of that of even mammalian brains.

In terms of the total number of neurons, neural networks have been astonishingly small until quite recently, as shown in the article «The number of neurons in animals and artificial neural networks».

Since the introduction of hidden units, artificial neural networks have doubled in size roughly every 2.4 years.
This growth is driven by faster computers with larger memory and by the availability of larger datasets.

Larger networks are able to achieve higher accuracy on more complex tasks.
This trend looks set to continue for decades.

Unless new technologies allow faster scaling, artificial neural networks will not have the same number of neurons as the human brain until at least the 2050s.
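
As a rough sanity check on that projection, the following back-of-the-envelope sketch in Python extrapolates the 2.4-year doubling period. The starting sizes (on the order of 10^6 to 10^7 units around 2016) and the figure of roughly 10^11 neurons in the human brain are illustrative assumptions, not numbers taken from the text above; only the doubling period is.

import math

# Illustrative assumptions (not figures from the source): the largest networks
# circa 2016 had on the order of 1e6 to 1e7 units, the human brain has roughly
# 1e11 neurons, and network size keeps doubling every 2.4 years.
DOUBLING_PERIOD_YEARS = 2.4
HUMAN_BRAIN_NEURONS = 1e11

def crossover_year(start_year=2016, start_units=1e6):
    """Year at which a network doubling every 2.4 years reaches brain scale."""
    doublings = math.log2(HUMAN_BRAIN_NEURONS / start_units)
    return start_year + doublings * DOUBLING_PERIOD_YEARS

print(round(crossover_year(start_units=1e6)))  # ~2056
print(round(crossover_year(start_units=1e7)))  # ~2048

Depending on the assumed starting size, the crossover lands somewhere between the late 2040s and the mid 2050s, which is the same ballpark as the "at least the 2050s" estimate above.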

Biological neurons may represent more complicated functions than current artificial neurons, so biological neural networks may be even larger than that comparison portrays.

In retrospect, it is not particularly surprising that neural networks with fewer neurons than a leech were unable to solve sophisticated artificial intelligence problems.
Even today’s networks, which we consider quite large from a computational systems point of view, are smaller than the nervous system of even relatively primitive vertebrate animals like frogs.

The increase in model size over time, due to the availability of faster CPUs, the advent of general-purpose GPUs, faster network connectivity, and better software infrastructure for distributed computing, is one of the most important trends in the history of deep learning.

This trend is generally expected to continue well into the future.

Goodfellow, Bengio, Courville - «Deep Learning» (2016)