What is «feature extraction» («pre-processing») in machine learning?

Feature extraction is a form of dimensionality reduction in which an initial set of raw data is reduced to a smaller, more manageable set of features for processing.
Large data sets often contain a large number of variables, and processing all of them requires a lot of computing resources.
Feature extraction is the name for methods that select and/or combine variables into features, reducing the amount of data that must be processed while still describing the original data set as accurately and completely as possible.
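
To make "select and/or combine" concrete, here is a minimal sketch in plain Python. The toy data, the helper names `select_by_variance` and `combine_mean`, and the choice of variance as the selection criterion are all assumptions made for illustration; real projects usually rely on library methods such as PCA.

```python
from statistics import pvariance

def select_by_variance(rows, k):
    """Selection: keep the k columns with the largest variance."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: pvariance(columns[j]),
                    reverse=True)
    keep = sorted(ranked[:k])  # restore the original column order
    return keep, [[row[j] for j in keep] for row in rows]

def combine_mean(rows, group):
    """Combination: replace a group of columns with their mean."""
    return [sum(row[j] for j in group) / len(group) for row in rows]

# Hypothetical data: column 1 is almost constant,
# columns 2 and 3 carry nearly the same information.
rows = [
    [1.0, 10.0, 5.0, 5.1],
    [2.0, 10.0, 3.0, 2.9],
    [3.0, 10.1, 8.0, 8.2],
    [4.0, 10.0, 1.0, 1.1],
]

keep, selected = select_by_variance(rows, k=3)   # drops the near-constant column
combined = combine_mean(rows, group=(2, 3))      # merges the two redundant columns

# Final representation: 2 features per sample instead of 4 variables.
features = [[row[0], m] for row, m in zip(rows, combined)]
```

Here four raw variables are reduced to two features: one kept as is, one built by combining two redundant columns.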

For most practical applications, the original input variables are typically preprocessed to transform them into some new space of variables where, it is hoped, the pattern recognition problem will be easier to solve.
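
As an illustration of a "new space of variables" (this example is not from the quoted text), consider two hypothetical classes of points lying on concentric circles. No single straight line in the (x, y) plane separates them, but after a change to polar coordinates a single threshold on the radius does. The data and the threshold value are assumptions chosen for this sketch.

```python
import math

def to_polar(x, y):
    """Map Cartesian (x, y) to polar (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)

# Hypothetical classes: inner circle (label 0) and outer circle (label 1).
angles = [k * math.pi / 4 for k in range(8)]
inner = [(1.0 * math.cos(a), 1.0 * math.sin(a)) for a in angles]
outer = [(3.0 * math.cos(a), 3.0 * math.sin(a)) for a in angles]

def classify(point, threshold=2.0):
    """In the new space, one comparison on the radius is enough."""
    radius, _angle = to_polar(*point)
    return 1 if radius > threshold else 0

predictions_inner = [classify(p) for p in inner]
predictions_outer = [classify(p) for p in outer]
```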

For instance, in the digit recognition problem, the images of the digits are typically translated and scaled so that each digit is contained within a box of a fixed size.
This greatly reduces the variability within each digit class, because the location and scale of all the digits are now the same, which makes it much easier for a subsequent pattern recognition algorithm to distinguish between the different classes.
This pre-processing stage is sometimes also called feature extraction.
Note that new test data must be pre-processed using the same steps as the training data.

Christopher M. Bishop - «Pattern Recognition and Machine Learning» (2006)
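
The digit example can be sketched roughly as follows. This is a simplified illustration, not the procedure from the book: images are assumed to be lists of rows with 0 for background and non-zero for ink, the function names and the 4x4 target size are made up for the example, and the nearest-neighbour rescaling stretches the digit to fill the box without preserving its aspect ratio.

```python
def bounding_box(image):
    """Return (top, left, bottom, right) of the inked pixels, bounds inclusive."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:
        raise ValueError("image contains no ink")
    return rows[0], cols[0], rows[-1], cols[-1]

def normalize(image, size=4):
    """Translate and scale: crop to the bounding box, resample to size x size."""
    top, left, bottom, right = bounding_box(image)
    height = bottom - top + 1
    width = right - left + 1
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]

# The same shape drawn small near the top-left (training image) ...
train_digit = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# ... and twice as large near the bottom-right (new test image).
test_digit = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]

# The same function, with the same settings, is applied to both.
train_features = normalize(train_digit)
test_features = normalize(test_digit)
```

After this step the two images have identical representations, so differences in location and scale no longer matter to the classifier. Passing test data through the same `normalize` function used for training data is what "the same steps" amounts to in this sketch.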