How many training examples are needed for supervised learning?

As of 2016, a rough rule of thumb is that a supervised deep learning algorithm:

  • will generally achieve acceptable performance with around 5,000 labeled examples per category,
  • will match or exceed human performance when trained with a dataset containing at least 10 million labeled examples.
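The two thresholds above are easy to check against a real label set. The sketch below is a hypothetical helper, not something from the book. It counts labeled examples per category and flags categories that fall short of the ~5,000 guideline. It also reports whether the total reaches the ~10 million mark. Both numbers are treated only as the rough heuristics the quote describes.

```python
from collections import Counter

# Rough thresholds quoted from Goodfellow, Bengio & Courville (2016);
# heuristics only, not guarantees of performance.
ACCEPTABLE_PER_CATEGORY = 5_000
HUMAN_LEVEL_TOTAL = 10_000_000


def rule_of_thumb_report(labels):
    """Compare a list of class labels with the 2016 rule of thumb (hypothetical helper)."""
    counts = Counter(labels)  # labeled examples per category
    total = sum(counts.values())
    # Categories that have fewer examples than the ~5,000 per-category guideline.
    below = sorted(c for c, n in counts.items() if n < ACCEPTABLE_PER_CATEGORY)
    return {
        "total": total,
        "categories": len(counts),
        "below_per_category_guideline": below,
        "meets_human_level_guideline": total >= HUMAN_LEVEL_TOTAL,
    }


if __name__ == "__main__":
    labels = ["cat"] * 6000 + ["dog"] * 1200
    print(rule_of_thumb_report(labels))
    # {'total': 7200, 'categories': 2, 'below_per_category_guideline': ['dog'], 'meets_human_level_guideline': False}
```

Under the same heuristic, a 10-category problem would call for roughly 10 × 5,000 = 50,000 labeled examples for acceptable performance. That is still far below the ~10 million suggested for human-level results.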

Working successfully with datasets smaller than this is an important research area, focusing in particular on how we can take advantage of large quantities of unlabeled examples, with unsupervised or semi-supervised learning.

Goodfellow, Bengio, Courville - «Deep Learning» (2016)