At first, we used the traditional computer vision approaches that I’d used my whole career, writing a big ball of custom logic to laboriously recognize one object at a time.
For example, to spot sky I’d first run a color-detection filter over the whole image looking for shades of blue, and then check whether the upper third was mostly blue. If it was, and the lower portion of the image wasn’t, I’d classify that as probably a photo of the outdoors.
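A heuristic like that can be sketched in a few lines of NumPy. Everything here is illustrative: the function name, the `blue_margin` threshold, and the 50% cutoffs are invented for the example, not the author's actual numbers.

```python
import numpy as np

def looks_like_outdoors(image, blue_margin=20):
    """Hypothetical sketch of the hand-tuned sky heuristic described above.

    image: H x W x 3 uint8 RGB array. A pixel counts as "blue" when its
    blue channel exceeds both red and green by blue_margin (an invented
    threshold chosen for illustration).
    """
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    blue = (b - r > blue_margin) & (b - g > blue_margin)

    h = image.shape[0]
    top = blue[: h // 3].mean()     # fraction of "blue" pixels in the upper third
    bottom = blue[h // 3 :].mean()  # and in the rest of the frame
    # Mostly-blue sky over a not-blue ground: call it outdoors.
    return bool(top > 0.5 and bottom < 0.5)
```

The fragility is easy to see: a blue wall, an overcast day, or a sunset all break the assumptions baked into those thresholds.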
I’d been an engineer working on vision problems since the late ’90s, and the sad truth was that unless you had a research team and plenty of time behind you, this sort of hand-tailored hack was the only way to get usable results.
As you can imagine, the results were far from perfect, and each detector I wrote was a one-off that didn’t help me with the next thing I needed to recognize.
This probably seems laughable to anybody who wasn’t working in computer vision until recently! It’s such a primitive way of solving the problem that it sounds like it should have been superseded long ago.
That’s why I was so excited when I started to play around with deep learning.
As I tried them out, it became clear that the latest approaches using convolutional neural networks were producing far better results than my hand-tuned code on similar problems.
Not only that, the process of training a detector for a new class of object was much easier.
I didn’t have to think about what features to detect; I’d just supply the network with new training examples and it would take it from there.
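That workflow can be sketched with a modern framework like Keras. This is a minimal illustration, not the author's code: the two-class setup, the tiny 96x96 placeholder images, and the random labels are all invented, and in practice you'd load pretrained `weights="imagenet"` rather than `None`.

```python
import numpy as np
from tensorflow import keras

# A pretrained-style convolutional backbone; weights=None here only to keep
# the sketch self-contained (real transfer learning would use "imagenet").
base = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, pooling="avg", weights=None
)

# The only task-specific part: a classification head for the new object class.
model = keras.Sequential([
    base,
    keras.layers.Dense(2, activation="softmax"),  # e.g. "sky" vs. "not sky"
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# No hand-built color filters or region heuristics: just images and labels.
x = np.random.rand(4, 96, 96, 3).astype("float32")  # placeholder training images
y = np.array([0, 1, 0, 1])                          # placeholder labels
model.fit(x, y, epochs=1, verbose=0)
```

Swapping in a new object class means swapping in new labeled examples; the feature detection is learned rather than hand-coded.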