The core idea of AI is to replicate human capabilities in machines. Researchers and tech enthusiasts are constantly working to bridge the gap between what humans and machines can do.

Computer vision is one such domain where we are witnessing many breakthroughs. It is considered critical to making human-machine interaction smoother. The facial recognition features on our mobiles, tablets, and laptops are powered by computer vision. Self-driving cars rely on it too, and it plays a critical role in augmented and mixed reality. Google's photo sorting and classification work on a similar idea. Computer vision algorithms also help automate tasks relating to X-ray and MRI scans.

Computer vision is the field of computer science that focuses on replicating the human vision system and enabling computers to identify and process objects in images and videos the way we humans do. 

A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an image as input, picks out various aspects or objects in the image, and uses them to differentiate one image from another.

Most of the recent advancements in computer vision with deep learning build on one particular algorithm: the Convolutional Neural Network.

Let's go through the basics to understand how this algorithm works.

For a computer or any other device to understand handwritten text, such as a handwritten digit, it looks at the image as a grid of numbers. These numbers are pixel values from 0 to 255 (one value per pixel for a grayscale image, three per pixel for RGB). However, if the digit shifts slightly, the grid pattern for that digit changes, and a computer that simply compares grids will fail to recognize it. Such slight changes in the 2D representation are bound to happen, given how much handwriting styles differ from person to person, and this is where an artificial neural network (ANN) comes to the rescue.
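To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python. The 5x5 size, the pixel values, and the helper names are all made up for illustration. It draws the same vertical stroke in two neighboring columns and counts how many pixel values differ.

```python
# A tiny 5x5 grayscale "image": 0 is background, 255 is ink.
# The size and values are hypothetical, chosen only for illustration.
def make_vertical_stroke(column, size=5):
    """Return a size x size grid with a vertical stroke in the given column."""
    return [[255 if c == column else 0 for c in range(size)] for _ in range(size)]


def flatten(grid):
    """Turn the 2D grid into the flat list of numbers a computer compares."""
    return [value for row in grid for value in row]


def count_differences(a, b):
    """Count the positions at which two flat pixel lists disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


original = make_vertical_stroke(column=1)
shifted = make_vertical_stroke(column=2)  # same stroke, one pixel to the right

differing = count_differences(flatten(original), flatten(shifted))
print(differing, "of", len(flatten(original)), "pixel values changed")
```

Even though a human sees the same stroke in both images, a position-by-position comparison of the two grids disagrees at many positions, which is why exact grid matching is fragile.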

However, there are disadvantages to using an ANN for image classification:

  • It requires too much computation.
  • It treats neighboring pixels the same as pixels that are far apart.
  • It is sensitive to the location of the object in the image.
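To put the computation point in rough numbers, the sketch below counts the parameters of a single fully connected layer on an image and compares that with a small set of shared 3x3 filters of the kind described later in this article. The image size, the number of hidden units, and the number of filters are hypothetical values picked only to show the arithmetic.

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic.
height, width, channels = 224, 224, 3  # an RGB input image
hidden_units = 1000                    # units in the first dense layer

# Every input value connects to every hidden unit, plus one bias per unit.
inputs = height * width * channels
dense_parameters = inputs * hidden_units + hidden_units

# A filter is shared across the whole image, so its size does not
# depend on the image size. One bias per filter.
filter_size, num_filters = 3, 64
conv_parameters = filter_size * filter_size * channels * num_filters + num_filters

print("dense layer parameters:", dense_parameters)
print("filter parameters:     ", conv_parameters)
```

Under these assumptions the dense layer needs roughly 150 million parameters, while the 64 shared filters need fewer than two thousand.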

The human brain, on the other hand, does image recognition pretty easily. So, let's take lessons from neuroscience!

We humans look at a picture, notice its features, details, colors, and patterns, and retain those aspects in order to remember it.

In technical terms, to replicate this, we create a feature map using the convolution operation.

In simple mathematical terms, a convolution is a function derived from two given functions by integration, expressing how the shape of one is modified by the other.
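Written out, the definition above looks like this. The symbols are my own choice for illustration: f and g are the two given functions, I is the image grid, and K is the filter. The second line is the discrete, two-dimensional form usually used for images; strictly speaking it is a cross-correlation (the filter is not flipped), but deep learning literature commonly calls it convolution.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

S(i, j) = \sum_{m} \sum_{n} I(i + m,\, j + n)\, K(m, n)
```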

In a CNN, the convolution operation has three elements:

  • Input image
  • Feature detector
  • Feature map

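We take a 3×3 grid from the original image grid and multiply each number in it by the corresponding number in the feature detector. The products are added up, and in this example the sum is divided by 9 (the number of cells in the grid); the result is written to the corresponding cell of the feature map. The detector then moves to the next position and the process repeats. The filter size can vary: it need not be 3×3, and it can also be 3-dimensional.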
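The step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `convolve`, the sample image, and the detector values are assumptions of mine, and the detector is assumed to be square. The division by the number of cells follows the example in the text; many implementations keep the raw sum instead, which is what `normalize=False` does.

```python
def convolve(image, detector, normalize=True):
    """Slide a square detector over the image and build the feature map.

    For each position, multiply the overlapping numbers element by element,
    add up the products, and (optionally) divide by the number of cells.
    """
    size = len(detector)
    out_rows = len(image) - size + 1
    out_cols = len(image[0]) - size + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for m in range(size):
                for n in range(size):
                    total += image[i + m][j + n] * detector[m][n]
            row.append(total / (size * size) if normalize else total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# Hypothetical detector that responds to dark-to-bright vertical edges.
detector = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve(image, detector)
for row in feature_map:
    print(row)
```

The feature map is smaller than the image (3x3 here) because the detector only visits positions where it fits entirely inside the grid. The values are largest where the window straddles the dark-to-bright boundary and zero where the window covers a uniform region.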

When all the cells in the feature map are filled, we have detected one particular feature. So filters are nothing but feature detectors. In the same way, we can produce several feature maps with different filters, perform further convolution operations on them, and aggregate the results to create a detector for a particular picture or a sub-part of it.
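As a small illustration of "one filter, one feature", the sketch below applies two different detectors to the same image and gets one feature map per detector. The image, the two detectors, and the helper name `feature_map` are hypothetical, and the raw sum of products is kept without any normalization to keep the numbers easy to read.

```python
def feature_map(image, detector):
    """Sum of element-wise products at every position where the detector fits."""
    k = len(detector)
    return [
        [
            sum(image[i + m][j + n] * detector[m][n]
                for m in range(k) for n in range(k))
            for j in range(len(image[0]) - k + 1)
        ]
        for i in range(len(image) - k + 1)
    ]


# Hypothetical 4x4 image: dark on top, bright at the bottom.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

# Two hypothetical detectors, each looking for a different feature.
detectors = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}

feature_maps = {name: feature_map(image, d) for name, d in detectors.items()}
for name, fmap in feature_maps.items():
    print(name, fmap)
```

Only the horizontal-edge detector responds to this image, which has a horizontal boundary and no vertical one. Each feature map therefore records where, and how strongly, its own feature appears.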

These 2D and 3D arrays can then be flattened and joined together into a single list of numbers, which is fed into a fully connected (dense) neural network. This network helps us recognize the same object even when its form or location in the image changes.
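Here is a rough sketch of the flatten-and-classify step, again in plain Python. Everything in it is an assumption made for illustration: the two small feature maps are made up, the weights are hand-picked rather than learned, and softmax is used as one common way to turn scores into probabilities.

```python
import math


def flatten(feature_maps):
    """Join several 2D feature maps into one flat list of numbers."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of all inputs per output unit."""
    return [
        sum(x * w for x, w in zip(inputs, unit_weights)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two made-up 2x2 feature maps, e.g. from two different detectors.
feature_maps = [
    [[0, 0], [0, 0]],
    [[3, 3], [3, 3]],
]
features = flatten(feature_maps)

# Hand-picked weights for two classes. In a real network these are learned.
weights = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # class 0 listens to the first feature map
    [0, 0, 0, 0, 1, 1, 1, 1],  # class 1 listens to the second feature map
]
biases = [0.0, 0.0]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
predicted = probabilities.index(max(probabilities))
print("scores:", scores)
print("predicted class:", predicted)
```

The convolution part produces the features, and the dense part only has to decide which class those features point to.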

This is essentially how we see Google classifying pictures of the same person even across different poses, locations, or ages. There are two stages: the first is feature detection, and the second is classification.

These were just the basics of convolutional neural networks. There is more to cover, and the next article will discuss how the algorithm works on complex images.
