Have you ever wondered how computers are able to recognize images? For example, when you see an image like this, you immediately know that it is a cat. But how does the computer know?
From your life experience, you know that a cat typically has pointy ears, round eyes, a triangular nose, and whiskers. The machine wants to figure out important information like that too! At a very high level, image recognition works by having the computer analyze the image in multiple steps. First, it tries to identify very simple aspects of the image: lines, edges, corners, blobs, etc. Using that information, it builds up to slightly, just slightly, more complex shapes: squares, circles, triangles. After a few more iterations, it starts to recognize high-level features such as eyes, a nose, or a mouth. Finally, by putting all the pieces together, it computes a probability score for each class of objects the image could belong to (e.g., cat, dog, or bird). As we’ll see later, a layer of connected neurons is responsible for each of those steps, and all those layers combine to form a convolutional neural network. A visualization looks something like: