Akash Manna
5 min read · May 8, 2021

CONVOLUTION LAYER: CNN

INTRODUCTION: One of the prominent disadvantages of traditional ML is that it works poorly on unstructured data, that is, data whose features are not provided explicitly (as in the case of an image) or, in the case of tabular data, whose interrelationships among the features are not provided explicitly. To overcome that shortcoming in ML, the CNN comes to the rescue.

The name CNN (convolutional neural network) comes from the fact that this type of neural network incorporates a mathematical operation known as the convolution operation. A CNN is specialised to deal with unstructured data. Broadly, a CNN works in two steps:

  1. Extract features from the unstructured data (making it structured data).
  2. Feed the structured data, with its extracted features, to the fully connected (FC) network.

Our main focus in this blog will be on step 1: how to extract features. Feature extraction alone uses several layers, as listed below:

  1. Convolution Layer
  2. Activation Layer
  3. Pooling Layer
  4. Flatten Layer
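To see how these four layers fit together, here is a minimal NumPy sketch, not a definitive implementation. The 6*6 input, the 3*3 kernel values, the ReLU activation and the 2*2 max pooling are all assumptions chosen for illustration; the helper names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation layer: keep positive values, set negatives to zero."""
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """Pooling layer: keep the maximum of each non-overlapping 2*2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical single-channel image whose values grow from left to right
image = np.arange(36, dtype=float).reshape(6, 6)
# Hypothetical kernel that responds to left-to-right increases
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

features = conv2d_valid(image, kernel)  # convolution layer, shape (4, 4)
activated = relu(features)              # activation layer, shape (4, 4)
pooled = max_pool_2x2(activated)        # pooling layer, shape (2, 2)
flat = pooled.flatten()                 # flatten layer, shape (4,)
print(flat.shape)                       # (4,)
```

The flattened vector is what would be handed to the FC network in step 2.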

I will discuss only the convolution layer in this blog, working through the mathematics with a real-life example. So, let's dive in.

Convolution Layer: Convolution is a mathematical operation that takes a function x(a) and produces a weighted average of it, with the weights given by a second function w.

In mathematical literature, the weighted average is given as:

s(t) = ∫ x(a) w(t − a) da

This operation is called convolution. The convolution operation is typically denoted with an asterisk:

s(t) = (x ∗ w)(t) = Σ_a x(a) w(t − a), with w(t − a) = 0 whenever a > t (that is, w is 0 for negative arguments)

In the example below, w needs to be a valid probability density function (non-negative weights that sum to 1), or the output is not a weighted average. Also, w needs to be 0 for all negative arguments, or it will look into the future, which is presumably beyond our capabilities.
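The discrete sum can be written out directly. The sketch below is only an illustration: the signal `x`, the weights `w` and the function name `causal_weighted_average` are hypothetical, and `w` is treated as 0 outside its defined range so that no future sample is used.

```python
import numpy as np

def causal_weighted_average(x, w, t):
    """Return s(t) = sum over a of x[a] * w[t - a], with w taken as 0 for negative arguments."""
    total = 0.0
    for a in range(len(x)):
        lag = t - a                # the argument passed to w
        if 0 <= lag < len(w):      # w is zero outside its defined, non-negative range
            total += x[a] * w[lag]
    return total

x = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical signal
w = np.array([0.5, 0.3, 0.2])       # hypothetical weights, non-negative and summing to 1

s = [causal_weighted_average(x, w, t) for t in range(len(x))]

# np.convolve evaluates the same sum; its first len(x) outputs match s
reference = np.convolve(x, w)[:len(x)]
print(np.allclose(s, reference))    # True
```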

Example 1

For the 1-D case:

Suppose there is a flight moving from Kolkata to Delhi, and I am measuring the position of the flight from t0 to t6 at a regular interval, say every 1 sec, as given below:

Now, I want to know my flight's position at t5 (i.e. 1.8 unit). But suppose, unfortunately, the sensor measuring the flight's position was picking up too much noise from the atmosphere and was not giving me the correct position.

In that situation, what would you do?

One reliable way may be to take a weighted average of the current and previous measured positions of the flight, such that the most weight is given to the most recent position and less and less weight is given to positions further in the past.

Okay!

Let’s implement whatever I just said:

The weight function is now (most recent position first): w = [0.5, 0.4, 0.04, 0.02, 0.02, 0.01, 0.01]

Look at the weight function and observe that, as we go further away from the t0 position, the values diminish.

Once we have the weight function values, we compute the weighted sum.

So, the weighted-sum position at t0 is:

S0=[(1.90*0.5)+(1.80*0.4)+(1.7*0.04)+(1.4*0.02)+(1.2*0.02)+(1.10*0.01)+(1*0.01)]

=1.811 unit

So, instead of being at the 1.90 unit position at time t0, the flight is at the 1.811 unit position. That is more reliable in this scenario.
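The weighted sum for S0 is just a dot product between the readings and the weights. A small sketch to check the arithmetic, with both arrays read off the terms of the S0 sum (newest reading first); the variable names are only for illustration:

```python
import numpy as np

# Weights from the S0 sum, newest reading first; they add up to 1
weights = np.array([0.5, 0.4, 0.04, 0.02, 0.02, 0.01, 0.01])
# Positions from the S0 sum, newest reading first
readings_t0 = np.array([1.90, 1.80, 1.7, 1.4, 1.2, 1.10, 1.0])

s0 = float(np.dot(readings_t0, weights))
print(round(s0, 3))  # 1.811
```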

The same thing is repeated when we want to calculate the position at t1.

The weight function moves to the right by one box and all the steps are repeated.

So, the weighted sum of the position at t1:

S1=[(2.10*0.5)+(1.9*0.4)+(1.8*0.04)+(1.7*0.02)+(1.4*0.02)+(1.20*0.01)+(1.10*0.01)]

=1.967

So, instead of being at the 2.10 position, the flight is actually at the 1.967 position.

In the same way, I can calculate the updated position of the flight up to t6.
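Sliding the weight function one box at a time is what `np.convolve` does. The sketch below assumes the eight readings that appear across the S0 and S1 sums, stored oldest first; that ordering and the variable names are assumptions for illustration.

```python
import numpy as np

# Readings that appear in the S0 and S1 sums, oldest first
positions = np.array([1.0, 1.1, 1.2, 1.4, 1.7, 1.8, 1.9, 2.1])
# Weights, index 0 applies to the newest reading in each window
weights = np.array([0.5, 0.4, 0.04, 0.02, 0.02, 0.01, 0.01])

# mode="valid" keeps only the positions where all seven weights overlap readings
smoothed = np.convolve(positions, weights, mode="valid")
print(np.round(smoothed, 3))  # [1.811 1.967]
```

With more readings appended to `positions`, the same call would return one smoothed value per additional time step.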

(Do it, as a fun exercise.)

Example 2

For the 2-D case:

The best 2-D scenario for this would be an image. Although an image has three channels, let's consider only one channel, say the red channel, and try to understand what happens to it; the same thing is then repeated for the two remaining channels.

Any colour image is nothing but an (n*m*3) tensor, where n = number of rows and m = number of columns in one channel.

Now, suppose the convolution operation is applied to the image; then the output will look like the following:

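Here the weight matrix (in a CNN, it is known as the kernel or window) has size (3*3). It moves over the whole image, taking one (3*3) portion of the input image at a time and reducing it to a single element of a smaller output matrix. With the kernel moving one pixel at a time and no padding, an (n*n) input gives an (n−2)*(n−2) output, so a (5*5) input yields a (3*3) output; a (4*4) output corresponds to a (6*6) input.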
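The size of the output matrix follows from a standard formula. The helper below is a sketch; its name is hypothetical, and the defaults of stride 1 and no padding are assumptions, since the text does not state them.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Number of kernel positions along one axis of length n for a kernel of length k."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(5, 3))  # 3
print(conv_output_size(6, 3))  # 4
```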

Mathematically,

As an example:

First element of the output matrix is created as:

S_11=[(21*-1)+(19*-1)+(17*-1)+(71*-1)+(76*8)+(73*-1)+(153*-1)+(164*-1)+(164*-1)]=-74.

Similarly, the other elements of the output matrix are created.
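The element-by-element procedure can be sketched in NumPy. The patch and kernel values are taken from the S_11 sum, with the nine terms assumed to be listed row by row; the function name `conv2d_valid` is hypothetical. As is usual in CNNs, the kernel is applied without flipping, which makes no difference here because this kernel is symmetric.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]   # the piece of the image under the kernel
            out[i, j] = np.sum(window * kernel)  # multiply element-wise, then add up
    return out

# Kernel weights from the S_11 sum: 8 at the centre, -1 everywhere else
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])
# Pixel values from the S_11 sum, assumed to be listed row by row
patch = np.array([[21, 19, 17],
                  [71, 76, 73],
                  [153, 164, 164]])

s_11 = conv2d_valid(patch, kernel)[0, 0]
print(s_11)  # -74.0
```

Passing a full single-channel image instead of `patch` would return the whole output matrix.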

Now, what about the other two channels?

Now, to produce one element of the output feature, the filter will have the dimension (3*3)*3. The red filter will produce one number, say s1, the green filter will produce a number, say s2, and the blue filter will produce a number, say s3. So, the resultant value of the pixel in the output feature is s = s1 + s2 + s3. The same holds for all the pixels in the output feature.
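A short sketch of the three-channel sum for one output pixel. The random (3*3)*3 patch and the choice to reuse the same 3*3 weights in every channel of the filter are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (3*3) window of an RGB image, stored as (rows, columns, channels)
patch = rng.integers(0, 256, size=(3, 3, 3)).astype(float)

# Hypothetical filter: the same 3*3 weights repeated for red, green and blue
edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)
filt = np.stack([edge, edge, edge], axis=-1)  # shape (3, 3, 3)

# One number per channel: s1 (red), s2 (green), s3 (blue)
s1, s2, s3 = (np.sum(patch[:, :, c] * filt[:, :, c]) for c in range(3))
s = s1 + s2 + s3

# Summing over the whole (3*3)*3 block in one go gives the same value
print(np.isclose(s, np.sum(patch * filt)))  # True
```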

Conclusion: The convolution layer is the affine layer in a CNN. It is usually used for extracting the patterns present in the image and producing a new image (the output feature) from them. That's why the kernel is also called a filter.
