Simple machine learning algorithms

A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990).

A simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.

The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given.

For example, when logistic regression is used to recommend cesarean delivery, the AI system does not examine the patient directly.
Instead, the doctor tells the system several pieces of relevant information, such as the presence or absence of a uterine scar.
Each piece of information included in the representation of the patient is known as a feature.
Logistic regression learns how each of these features of the patient correlates with various outcomes.
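
The way logistic regression assigns a weight to each feature can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the two binary features (one informative, one irrelevant) are hypothetical and not clinical, and the function name fit_logistic_regression and its learning-rate settings are arbitrary choices rather than anything taken from the cited study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.5, steps=2000):
    """Fit weights and a bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)      # predicted probability of the outcome
        error = p - y               # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Synthetic data (made up): column 0 influences the outcome, column 1 does not.
rng = np.random.default_rng(0)
n = 2000
X = rng.integers(0, 2, size=(n, 2)).astype(float)
true_logit = 2.0 * X[:, 0] - 1.0
y = (rng.random(n) < sigmoid(true_logit)).astype(float)

w, b = fit_logistic_regression(X, y)
print("learned weights:", w, "bias:", b)
```

In this sketch the weight learned for the informative feature should come out clearly positive, while the weight for the irrelevant feature should stay near zero. The model only reweights the features it is handed; it never changes what those features are.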

However, logistic regression cannot influence how the features are defined.
If logistic regression were given an MRI scan of the patient, rather than the doctor’s formalized report, it would not be able to make useful predictions.
Individual pixels in an MRI scan have negligible correlation with any complications that might occur during delivery.

Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm.

For example, a useful feature for speaker identification from sound is an estimate of the size of the speaker’s vocal tract.
This estimate gives a strong clue as to whether the speaker is a man, woman, or child.
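
The contrast between raw inputs and a designed feature can be sketched on toy data. The example below is only an illustration under stated assumptions: the one-dimensional "images", the bump size, the sliding-window feature, and the midpoint threshold classifier are all hypothetical choices, not a method described in the text. It compares how strongly individual pixels correlate with the label against how strongly a single hand-designed feature does.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, length, width, height = 2000, 64, 4, 3.0

# Synthetic 1-D "images" of pure noise; positives also contain a small
# bright bump at a random position. All of these values are made up.
labels = rng.integers(0, 2, size=n_samples)
signals = rng.normal(0.0, 1.0, size=(n_samples, length))
starts = rng.integers(0, length - width + 1, size=n_samples)
for i in np.flatnonzero(labels):
    signals[i, starts[i]:starts[i] + width] += height

def correlation(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Raw representation: correlation of each individual pixel with the label.
pixel_corr = np.array([correlation(signals[:, j], labels) for j in range(length)])

# Hand-designed feature: the largest sum over any run of `width` adjacent pixels,
# computed from differences of a zero-padded cumulative sum.
padded = np.pad(signals, ((0, 0), (1, 0)))
cumulative = np.cumsum(padded, axis=1)
window_sums = cumulative[:, width:] - cumulative[:, :-width]
feature = window_sums.max(axis=1)
feature_corr = correlation(feature, labels)

# Simple algorithm on top of the feature: threshold midway between class means
# (fitted and evaluated on the same data, for brevity).
threshold = 0.5 * (feature[labels == 1].mean() + feature[labels == 0].mean())
accuracy = np.mean((feature > threshold) == (labels == 1))

print("largest |pixel correlation|:", np.abs(pixel_corr).max())
print("feature correlation:", feature_corr)
print("threshold accuracy:", accuracy)
```

Because the bump can appear anywhere, no single pixel carries much information about the label, whereas the designed feature summarizes exactly the property that matters, so even a one-line threshold rule separates the classes well on this toy data.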

However, for many tasks, it is difficult to know what features should be extracted.

For example, suppose that we would like to write a program to detect cars in photographs.
We know that cars have wheels, so we might like to use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks like in terms of pixel values.
A wheel has a simple geometric shape, but its image may be complicated by shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of the car or an object in the foreground obscuring part of the wheel, and so on.

Goodfellow, Bengio, Courville - «Deep Learning» (2016)