In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.
CNNs are regularized versions of multilayer perceptrons: instead of fully connecting every neuron in one layer to every neuron in the next, they exploit the spatial structure of images through local connectivity and shared weights.
A convolutional neural network (ConvNet/CNN) is a deep learning algorithm that takes in an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from another.
The pre-processing required by a ConvNet is much lower than for other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets can learn these filters and characteristics on their own.
A ConvNet captures the spatial and temporal dependencies in an image through the application of relevant filters.
The role of the ConvNet is to reduce the images into a form that is easier to process, without losing the features that are critical for a good prediction.
Convolutional neural networks are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes.
The efficacy of convolutional nets (ConvNets or CNNs) in image recognition is one of the main reasons the world has woken up to the power of deep learning.
Convolutional neural networks ingest and process images as tensors; a tensor is a matrix of numbers generalized to additional dimensions.
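To make that concrete, here is a minimal sketch of an image as a tensor, assuming NumPy as the array library (an illustrative choice, not one prescribed here):

```python
import numpy as np

# A tiny 4x4 RGB image as a rank-3 tensor: (height, width, channels).
# Values are 8-bit intensities, as in a typical digital photo.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)   # (4, 4, 3)
print(image[0, 0])   # the red, green, blue values of the top-left pixel
```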
From the Latin convolvere, “to convolve” means to roll together.
For mathematical purposes, a convolution is the integral measuring how much two functions overlap as one passes over the other.
Think of a convolution as a way of mixing two functions by multiplying them.
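In symbols, the convolution of two functions f and g is (f ∗ g)(t) = ∫ f(τ) g(t − τ) dτ. The discrete analogue, sliding one sequence over the other and summing the products where they overlap, can be sketched in a few lines (NumPy here is an illustrative assumption):

```python
import numpy as np

# Discrete convolution: slide g over f, multiply where they overlap,
# and sum the products at each offset.
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 0.5])   # a simple two-tap smoothing function

print(np.convolve(f, g))   # [0.5 1.5 2.5 1.5]
```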
The next thing to understand about convolutional nets is that they are passing many filters over a single image, each one picking up a different signal.
At a fairly early layer, you could imagine them as passing a horizontal line filter, a vertical line filter, and a diagonal line filter to create a map of the edges in the image.

Convolutional networks take those filters, slices of the image’s feature space, and map them one by one; that is, they create a map of each place that feature occurs.
By learning different portions of a feature space, convolutional nets allow for easily scalable and robust feature engineering.
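As a sketch of that idea, one could hand-write a few edge filters and convolve each over a toy grayscale image, producing one feature map per filter. The filter values below are illustrative stand-ins, not weights a trained network would actually learn; NumPy and SciPy are assumed:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(8, 8)   # a toy grayscale image

# Three hand-engineered 3x3 filters, one per edge orientation.
filters = {
    "horizontal": np.array([[-1, -1, -1],
                            [ 0,  0,  0],
                            [ 1,  1,  1]]),
    "vertical":   np.array([[-1, 0, 1],
                            [-1, 0, 1],
                            [-1, 0, 1]]),
    "diagonal":   np.array([[ 0,  1, 1],
                            [-1,  0, 1],
                            [-1, -1, 0]]),
}

# Each filter yields its own feature map: a map of where that
# edge orientation occurs in the image.
for name, k in filters.items():
    fmap = convolve2d(image, k, mode="valid")
    print(name, fmap.shape)    # each map is 6x6 for an 8x8 input
```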
Note that convolutional nets analyze images differently than restricted Boltzmann machines (RBMs). While RBMs learn to reconstruct and identify the features of each image as a whole, convolutional nets learn images in pieces that we call feature maps.

So convolutional networks perform a sort of search.
Picture a small magnifying glass sliding left to right across a larger image, and recommencing at the left once it reaches the end of one pass (like typewriters do).
That moving window is capable of recognizing only one thing, say, a short vertical line: three dark pixels stacked atop one another.
It moves that vertical-line-recognizing filter over the actual pixels of the image, looking for matches.

Each time a match is found, it is mapped onto a feature space particular to that visual element.
In that space, the location of each vertical line match is recorded, a bit like birdwatchers leave pins in a map to mark where they last saw a great blue heron.
A convolutional net runs many, many searches over a single image – horizontal lines, diagonal ones, as many as there are visual elements to be sought.
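A minimal sketch of that search, written as an explicit loop so the movement of the window is visible (the toy image and the 3x1 vertical-line detector are assumptions for illustration):

```python
import numpy as np

# A 5x5 image containing one short vertical line: three bright
# pixels stacked atop one another in column 2.
image = np.zeros((5, 5))
image[1:4, 2] = 1.0

detector = np.ones((3, 1))   # responds only to short vertical lines

h, w = image.shape
fh, fw = detector.shape
feature_map = np.zeros((h - fh + 1, w - fw + 1))

for i in range(h - fh + 1):       # top to bottom,
    for j in range(w - fw + 1):   # recommencing at the left each row
        patch = image[i:i + fh, j:j + fw]
        feature_map[i, j] = np.sum(patch * detector)

print(feature_map)   # the largest value marks where the line was found
```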
The first thing to know about convolutional networks is that they don’t perceive images like humans do.
Therefore, you are going to have to think in a different way about what an image means as it is fed to and processed by a convolutional network.

Convolutional networks perceive images as volumes, i.e. three-dimensional objects, rather than flat canvases to be measured only by width and height. That’s because digital color images have a red-green-blue (RGB) encoding, mixing those three colors to produce the color spectrum humans perceive. A convolutional network ingests such images as three separate strata of color stacked one on top of the other.

So a convolutional network receives a normal color image as a rectangular box whose width and height are measured by the number of pixels along those dimensions, and whose depth is three layers deep, one for each letter in RGB. Those depth layers are referred to as channels.

As images move through a convolutional network, we will describe them in terms of input and output volumes, expressing them mathematically as matrices of multiple dimensions in this form: 30x30x3.
From layer to layer, their dimensions change for reasons that will be explained below.
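One of those reasons is plain arithmetic: with filter size f, stride s, and padding p, a spatial extent n becomes (n − f + 2p)/s + 1, and the output depth becomes the number of filters applied. A small helper makes this concrete (the 5x5, ten-filter configuration below is an illustrative assumption):

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution: (n - f + 2p) / s + 1."""
    return (n - f + 2 * padding) // stride + 1

# A 30x30x3 input volume passed through ten 5x5 filters (stride 1,
# no padding) yields a 26x26x10 output volume: the spatial extent
# shrinks, and the depth becomes the number of filters.
print(conv_output_size(30, 5))   # 26
```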
Rather than focus on one pixel at a time, a convolutional net takes in square patches of pixels and passes them through a filter.
That filter is also a square matrix smaller than the image itself, and equal in size to the patch.
It is also called a kernel, which will ring a bell for those familiar with support-vector machines, and the job of the filter is to find patterns in the pixels.
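At a single position, that pattern finding reduces to an elementwise multiply-and-sum between the kernel and the patch it covers; a minimal sketch, with an illustrative vertical-line kernel:

```python
import numpy as np

# A 3x3 patch of pixels containing a vertical line of bright values.
patch = np.array([[0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])

# A kernel that encodes the same vertical-line pattern.
kernel = np.array([[-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0]])

# The response is large when patch and kernel agree, small otherwise.
activation = np.sum(patch * kernel)
print(activation)   # 6.0
```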