UNIT 2: CONVOLUTIONAL NEURAL NETWORKS (CNNs)

 

CONVOLUTIONAL NEURAL NETWORK                                                                      12

Introduction to Image Processing- Convolution Operation and Feature Extraction- Pooling Layers and Fully Connected Layers- Transfer Learning and Pre-trained Models (VGG, ResNet, Inception)- Applications of CNNs: Object Detection, Image Classification, Face Recognition- CNN Optimization Techniques: Data Augmentation, Dropout, Batch Normalization.

1. Introduction to Image Processing

Definition:

Image processing is the practice of performing operations on an image, either to enhance it or to extract useful information from it.

Basic Steps involved:

·         Image Acquisition – Capturing an image from a source

·         Preprocessing – Resizing, normalization, noise removal

·         Feature Extraction – Identifying edges, textures, and colors

·         Classification and Segmentation – Deciding object categories or regions

Key Operations:

  • Filtering: Removing noise (e.g., Gaussian, Median filters)
  • Edge Detection: Sobel, Prewitt, Canny
  • Resizing & Scaling: Preparing for CNN inputs (e.g., 224x224)
  • Grayscale Conversion: Reduces computational cost
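
As a rough illustration of two of these preprocessing steps, the sketch below converts an RGB image to grayscale and scales pixel values to the range [0, 1]. The random 224x224 image and the function names are placeholders, and the luma weights (0.299, 0.587, 0.114) are one common choice rather than the only one.

```python
import numpy as np

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return image.astype(np.float32) / 255.0

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB image to H x W using luma weights (an assumed choice)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

# Placeholder image: random pixels standing in for a real 224x224 photo
rgb = np.random.default_rng(0).integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
gray = to_grayscale(normalize(rgb))
print(rgb.shape, "->", gray.shape)
```

The grayscale result has one channel instead of three, which is where the reduction in computational cost comes from.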

Why CNN in Image Processing?

          CNNs automatically learn hierarchical features from raw pixel values, eliminating the need for manual feature engineering.

2. Convolution Operation and Feature Extraction

Convolution:

  • In general, a mathematical operation on two functions that produces a third.
  • In CNNs, a filter (kernel) slides over the input image and computes a dot product with each local patch.
  • This detects local patterns such as edges, corners, and textures.
  • Purpose – Extract spatial features such as edges, shapes, and patterns.

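Formula:

$$(I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n)$$

I: input image, K: kernel/filter, (i, j): position in the output, (m, n): position within the kernel. In this form the kernel is not flipped, which is how most CNN libraries implement it (strictly speaking, a cross-correlation).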

·         Kernel: A small matrix (e.g., 3×3 or 5×5) that slides over the image. Each position on the kernel has a weight K(m, n).

·         Sliding Window: The kernel is applied to each overlapping region of the image. For each position, we multiply the corresponding pixel values of the image and the kernel weights and then sum these products to get a single number. This process creates a new matrix, called the feature map, which highlights certain features of the image.

Example Calculation
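
A minimal sketch of the sliding-window computation, assuming stride 1 and no padding. The 5×5 image (dark left half, bright right half) is a made-up example, and the kernel is the Sobel filter for vertical edges. As in most CNN libraries, the kernel is applied without flipping.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]        # region under the kernel
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Hypothetical 5x5 image: columns 0-2 are dark (0), columns 3-4 are bright (10)
image = np.array([[0, 0, 0, 10, 10]] * 5, dtype=float)

# Sobel kernel that responds to vertical edges
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

feature_map = conv2d(image, sobel_x)
print(feature_map)
# Each row of the 3x3 feature map is [0, 40, 40]
```

The first output column is 0 because the kernel sees only dark pixels there. The other two columns are 40 because the kernel window straddles the dark-to-bright edge: the right-hand weights (1, 2, 1) each multiply a pixel value of 10.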

Key Concepts:

  • Kernel/Filter: Small matrix (e.g., 3x3) sliding over input.
  • Stride: Step size of the filter.
  • Padding: Adding a border (usually zeros) around the input to preserve spatial size.

Padding and Stride in Convolutional Neural Network

Padding in CNNs

To preserve the spatial dimensions of the input, padding is applied. Padding adds extra rows and columns around the border of the input image.

·       Zero Padding: The simplest form of padding, where additional pixels with a value of zero are added around the image.

If we apply zero padding of 1 pixel to our 5×5 image, it becomes 7×7:
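
The 5×5 to 7×7 case can be checked with NumPy's `np.pad`. The pixel values 1 to 25 are placeholders, since any 5×5 image behaves the same way.

```python
import numpy as np

# Placeholder 5x5 image with values 1..25
image = np.arange(1, 26).reshape(5, 5)

# Add a 1-pixel border of zeros on every side
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)
print(padded)
```

The original pixels sit unchanged in the centre of the padded array, surrounded by one row or column of zeros on each side.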

·         Stride controls how far the kernel shifts as it slides over the input image. For a stride of 1, the kernel moves one pixel at a time. For a stride of 2, it moves two pixels at a time, effectively downsampling the image.

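The output size after applying a convolution operation with a given stride S and padding P can be calculated as:

$$O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1$$

where I is the input size and K is the kernel size.
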

Feature Maps:

  • Result after convolution
  • Capture spatial hierarchies from low to high-level features
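
The size of a feature map follows from the output-size relation, floor((I − K + 2P) / S) + 1, listed under Key Formulas. The helper below is a small sketch of that relation for a 5×5 input and a 3×3 kernel. The function name is illustrative.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial size of the feature map along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# 5x5 input, 3x3 kernel, under different padding/stride settings
for padding, stride in [(0, 1), (1, 1), (0, 2), (1, 2)]:
    size = conv_output_size(5, 3, padding, stride)
    print(f"padding={padding}, stride={stride} -> {size}x{size}")
```

With padding 1 and stride 1 the output stays 5×5, which is the sense in which padding preserves spatial size. A stride of 2 roughly halves it.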

3. Pooling Layers

Pooling is a down-sampling operation used to reduce the dimensions of the feature maps while retaining the most important information. Max pooling is the most common pooling operation, where the maximum value within a window is selected.

Purpose:

  • Reduce spatial dimensions and computation
  • Control overfitting
  • Make features more robust to small translations and distortions of the input

Types:

  • Max Pooling: Takes the max value in a region
  • Average Pooling: Takes the average value

 

Max Pooling:

For a 2×2 max pooling operation, consider the following example:
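
A minimal sketch of 2×2 max pooling with stride 2. The 4×4 feature map is a made-up example, chosen so that each window has a clear maximum.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window, moving `stride` pixels at a time."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = window.max()
    return pooled

# Hypothetical 4x4 feature map
feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[6 4]
#  [8 9]]
```

The 4×4 input shrinks to 2×2, and each output value is the largest entry of the corresponding 2×2 block.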

 

 

4. Fully Connected Layers (FC Layers)

Description:

  • After feature extraction (via conv and pooling), FC layers act as a classifier.
  • Each neuron is connected to every neuron in the previous layer.

Output:

  • Typically ends in a Softmax layer for classification.
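
The sketch below shows what "every neuron connected to every neuron" means in terms of arrays: the pooled feature maps are flattened into one vector, and each output is a weighted sum of all its entries. The shapes (4×4×8 feature maps, 10 outputs) and the random weights are assumptions for illustration. In a trained network the weights would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of the last pooling layer: 4x4 spatial size, 8 channels
feature_maps = rng.standard_normal((4, 4, 8))

# Flatten into a single vector of 4 * 4 * 8 = 128 values
x = feature_maps.reshape(-1)

# Fully connected layer: one row of weights per output neuron (random placeholders)
W = rng.standard_normal((10, x.size)) * 0.01
b = np.zeros(10)
logits = W @ x + b

print(x.shape, "->", logits.shape)
```

The 10 raw scores (logits) are what a Softmax layer would then turn into class probabilities.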

5. Transfer Learning and Pre-Trained Models

Transfer Learning:

  • Reusing a CNN model trained on a large dataset (like ImageNet) for a different but related task.

Benefits:

  • Less data required
  • Faster training
  • Higher accuracy with limited resources

Popular Pre-Trained Models:

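| Model       | Key Features                                                   |
|-------------|----------------------------------------------------------------|
| VGG16/VGG19 | Deep but simple architecture using 3x3 convolutions            |
| ResNet      | Uses residual blocks to solve the vanishing gradient problem   |
| Inception   | Multi-scale convolutions (1x1, 3x3, 5x5) in one layer          |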
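
To make the idea of a residual block concrete, here is a simplified sketch that adds the input back onto the output of two layers. It uses plain matrix multiplications instead of convolutions and omits batch normalization, so it illustrates the skip connection only. It is not the actual ResNet block.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    """Compute ReLU(F(x) + x), where F is two linear layers with a ReLU in between."""
    out = relu(W1 @ x)
    out = W2 @ out
    return relu(out + x)   # skip connection: the input is added back

x = np.array([1.0, -2.0, 3.0])

# With all-zero weights F(x) = 0, so the block just passes ReLU(x) through
zero = np.zeros((3, 3))
y = residual_block(x, zero, zero)
print(y)
```

Because the input bypasses the weighted layers, the signal (and its gradient) still has a direct path through the block even when the layers contribute nothing, which is the intuition behind how residual connections help very deep networks train.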

 

6. Applications of CNNs

Object Detection:

  • Locates and classifies multiple objects in an image, typically by predicting a bounding box for each.
  • Algorithms: YOLO, SSD, Faster R-CNN

Image Classification:

  • Predicts the class label of an image (e.g., cat vs dog)
  • CNN learns spatial hierarchies of patterns

Face Recognition:

  • CNNs extract facial features and match them
  • Popular models: FaceNet, DeepFace

 

7. CNN Optimization Techniques

Data Augmentation:

  • Increases training data by transforming existing data
  • Types: Rotation, flipping, scaling, brightness changes
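
A minimal NumPy sketch of a few of these transformations. For simplicity it restricts rotation to multiples of 90 degrees and leaves out scaling. The random 32×32 image and the brightness range of 0.8 to 1.2 are arbitrary choices for illustration.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated, and brightness-adjusted copy of `image`."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = np.fliplr(out)                     # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))    # rotate by 0, 90, 180, or 270 degrees
    out = out * rng.uniform(0.8, 1.2)            # brightness change
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)

# Placeholder image: random pixels standing in for a real training sample
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Four different augmented versions of the same image
batch = [augment(image, rng) for _ in range(4)]
print([a.shape for a in batch])
```

Each call produces a different variant of the same image while its label stays the same, which is how augmentation enlarges the effective training set.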

Dropout:

  • Regularization technique
  • Randomly disables neurons during training to prevent overfitting
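
The sketch below shows one common variant, often called inverted dropout: surviving activations are scaled up during training so that nothing needs to change at inference time. The function name, the rate of 0.5, and the all-ones input are illustrative assumptions.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the rest (training only)."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True where the neuron is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones(1000)

y_train = dropout(x, rate=0.5, rng=rng)                  # about half are zeroed, the rest become 2.0
y_eval = dropout(x, rate=0.5, rng=rng, training=False)   # unchanged at inference

print(np.unique(y_train), y_train.mean())
```

The rescaling keeps the expected value of each activation the same in training and inference.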

Batch Normalization:

  • Normalizes activations within a mini-batch
  • Helps speed up training and improves stability
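
A minimal sketch of the training-time computation, assuming a mini-batch of shape (batch size, features). Each feature is normalized using the batch mean and variance, then scaled and shifted by parameters `gamma` and `beta`, which would be learned in a real network. The running statistics used at inference are omitted here.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # eps avoids division by zero
    return gamma * x_hat + beta

rng = np.random.default_rng(0)

# Placeholder activations: 64 samples, 4 features, far from zero mean / unit variance
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```

With `gamma` set to 1 and `beta` set to 0, each feature of the output has approximately zero mean and unit standard deviation.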

CNN Architecture Summary:

| Layer Type        | Purpose                            |
|-------------------|------------------------------------|
| Input Layer       | Accepts raw pixel data             |
| Convolution Layer | Extracts features via kernels      |
| ReLU Layer        | Applies non-linearity              |
| Pooling Layer     | Downsamples feature maps           |
| Fully Connected   | Learns high-level representations  |
| Softmax Layer     | Outputs class probabilities        |

 

Key Formulas:

  1. Output Size of Convolution:

$$O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1$$

Where:

  • I: Input size
  • K: Kernel size
  • P: Padding
  • S: Stride
  2. Softmax Function:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
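
The softmax function can be sketched in a few lines of NumPy. Subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result. The example logits are arbitrary.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # avoids overflow in exp for large scores
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])   # arbitrary example scores
probs = softmax(logits)

print(probs, probs.sum())
```

The largest score receives the largest probability, and adding the same constant to every score leaves the probabilities unchanged.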

 

 Example: Simple CNN Flow for MNIST Digit Classification

  1. Input: 28x28 grayscale image
  2. Conv1: 32 filters, 3x3 → ReLU
  3. Pool1: MaxPooling 2x2
  4. Conv2: 64 filters, 3x3 → ReLU
  5. Pool2: MaxPooling 2x2
  6. Flatten
  7. FC1: 128 neurons → ReLU
  8. FC2: 10 outputs (digits 0–9) → Softmax
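
The list above does not state the stride or padding, so the sketch below assumes stride 1 and no padding for the convolutions and stride 2 for the 2x2 pooling. Under those assumptions it traces how the spatial size changes from layer to layer. The helper names are illustrative.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output size of a convolution along one dimension."""
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of a pooling layer along one dimension."""
    return (size - window) // stride + 1

size, channels = 28, 1                      # 28x28 grayscale input
shapes = [("Input", size, channels)]

layers = [("Conv1", "conv", 32), ("Pool1", "pool", None),
          ("Conv2", "conv", 64), ("Pool2", "pool", None)]

for name, kind, filters in layers:
    if kind == "conv":
        size, channels = conv_out(size, kernel=3), filters
    else:
        size = pool_out(size)
    shapes.append((name, size, channels))

flattened = size * size * channels          # input length for FC1

for name, s, c in shapes:
    print(f"{name}: {s}x{s}x{c}")
print("Flatten:", flattened)
```

Under these assumptions the sizes run 28 → 26 → 13 → 11 → 5, so the Flatten step hands 5 × 5 × 64 = 1600 values to FC1. With padding that preserves the spatial size, the numbers would differ.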

 
