UNIT 2: CONVOLUTIONAL NEURAL NETWORKS (CNNs)

CONVOLUTIONAL NEURAL NETWORK 12

Introduction to Image Processing - Convolution Operation and Feature Extraction - Pooling Layers and Fully Connected Layers - Transfer Learning and Pre-trained Models (VGG, ResNet, Inception) - Applications of CNNs: Object Detection, Image Classification, Face Recognition - CNN Optimization Techniques: Data Augmentation, Dropout, Batch Normalization.

1. Introduction to Image Processing

Definition:

Image processing is the practice of performing operations on an image, either to enhance it or to extract useful information from it.

Basic Steps involved:

· Image Acquisition – Capturing an image from a source

· Preprocessing – Resizing, normalization, noise removal

· Feature Extraction – Identifying edges, textures and colors

· Classification and Segmentation – Assigning object categories or delineating regions

Key Operations:

Filtering: Removing noise (e.g., Gaussian, Median filters)
Edge Detection: Sobel, Prewitt, Canny
Resizing & Scaling: Preparing for CNN inputs (e.g., 224x224)
Grayscale Conversion: Reduces computational cost
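
Two of the preprocessing operations above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full pipeline: the helper names `to_grayscale` and `normalize` are hypothetical, the grayscale weights are the common ITU-R BT.601 luma coefficients, and the 4x4 test image is made up.

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB array to H x W using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return image.astype(np.float64) / 255.0

# Hypothetical input: a 4x4 image that is pure red.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255

gray = to_grayscale(normalize(rgb))
print(gray.shape)   # one channel instead of three
print(gray[0, 0])   # red contributes roughly 0.299 of full brightness
```

Dropping from three channels to one cuts the input size by a factor of three, which is where the reduced computational cost comes from.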

Why CNN in Image Processing?

CNNs automatically learn hierarchical features from raw pixel values, which largely eliminates the need for manual feature engineering.

2. Convolution Operation and Feature Extraction

Convolution:

In general, convolution is a mathematical operation on two functions that produces a third.
In CNNs, a filter (kernel) slides over the input image and computes a dot product with each local region.
This detects local patterns such as edges, corners and textures.
Purpose – Extract spatial features such as edges, shapes and patterns

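Formula:

S(i,j) = \sum_{m} \sum_{n} I(i+m, j+n) \, K(m,n)

I: input image, K: kernel/filter, S: output feature map. As is usual in CNNs, the kernel is applied without flipping (strictly speaking, a cross-correlation).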

· Kernel: A small matrix (e.g., 3×3 or 5×5) that slides over the image. Each position on the kernel has a weight K(m,n).

· Sliding Window: The kernel is applied to each overlapping region of the image. For each position, we multiply the corresponding pixel values of the image and the kernel weights and then sum these products to get a single number. This process creates a new matrix, called the feature map, which highlights certain features of the image.

Example Calculation
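
The sliding-window computation can be worked through with a small NumPy sketch. The 5×5 binary image and the 3×3 kernel below are illustrative values chosen for this example, and `conv2d` is a hypothetical helper (no padding, stride 1 by default), not a library function.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Illustrative 5x5 input and 3x3 kernel (assumed values).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature_map = conv2d(image, kernel)
print(feature_map)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

For the top-left position, the kernel covers rows 0 to 2 and columns 0 to 2 of the image; the five kernel weights equal to 1 line up with pixel values 1, 1, 1, 0 and 1, giving 4. The 5×5 input shrinks to a 3×3 feature map because no padding is used.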

Key Concepts:

Kernel/Filter: Small matrix (e.g., 3x3) sliding over input.
Stride: Step size of the filter.
Padding: Adding a border (usually zeros) around the input to preserve spatial size.

Padding and Stride in Convolutional Neural Networks

Padding in CNNs

To preserve the spatial dimensions of the input, padding is applied. Padding adds extra rows and columns around the border of the input image.

· Zero Padding: The simplest form of padding, where additional pixels with a value of zero are added around the image.

If we apply zero padding of 1 pixel on every side of a 5×5 image, it becomes 7×7.
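
A quick way to see the 5×5 to 7×7 change is NumPy's `np.pad`. The pixel values 1 to 25 are placeholders used only for illustration.

```python
import numpy as np

# Placeholder 5x5 image with values 1..25.
image = np.arange(1, 26).reshape(5, 5)

# Add one row/column of zeros on every side.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(image.shape)   # (5, 5)
print(padded.shape)  # (7, 7)
print(padded)
```

The original pixels sit unchanged in the centre of the padded array, surrounded by a one-pixel border of zeros.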

· Stride: Controls how far the kernel shifts at each step as it slides over the input image. With a stride of 1, the kernel moves one pixel at a time. With a stride of 2, it moves two pixels at a time, which effectively downsamples the output.

The output size O after applying a convolution to an input of size I with kernel size K, stride S and padding P can be calculated as:

O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1
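
The output-size rule (see Key Formulas) is easy to check numerically. The helper name `conv_output_size` is hypothetical; it assumes a square input and kernel, with the same padding on every side.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Floor of (I - K + 2P) / S, plus 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# 5x5 input with a 3x3 kernel under different settings.
print(conv_output_size(5, 3))                       # 3: no padding shrinks the output
print(conv_output_size(5, 3, padding=1))            # 5: padding of 1 preserves the size
print(conv_output_size(5, 3, padding=1, stride=2))  # 3: stride 2 downsamples
```

With a 3×3 kernel and stride 1, a padding of 1 is exactly what keeps the output the same size as the input.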

Feature Maps:

The output produced by sliding one filter over the input
Stacked across layers, they capture a spatial hierarchy from low-level features (edges) to high-level features (object parts)

3. Pooling Layers

Pooling is a down-sampling operation used to reduce the dimensions of the feature maps while retaining the most important information. Max pooling is the most common pooling operation, where the maximum value within a window is selected.

Purpose:

Reduce spatial dimensions and computation
Control overfitting
Make features more robust to small translations and distortions

Types:

Max Pooling: Takes the max value in a region
Average Pooling: Takes the average value

Max Pooling:

For a max pooling operation, consider the following example:
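
The sketch below applies 2×2 max pooling with stride 2 to a 4×4 feature map. The input values are made up for illustration, and `max_pool2d` is a hypothetical helper rather than a library call.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum value inside each pooling window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = feature_map[r:r + size, c:c + size].max()
    return out

# Illustrative 4x4 feature map (assumed values).
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 1, 8],
])

pooled = max_pool2d(feature_map)
print(pooled)
# [[6 4]
#  [7 9]]
```

Each 2×2 block is replaced by its largest value, so the 4×4 map becomes 2×2. Replacing `.max()` with `.mean()` (and using a float output array) would give average pooling instead.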

4. Fully Connected Layers (FC Layers)

Description:

After feature extraction (via conv and pooling), FC layers act as a classifier.
Each neuron is connected to every neuron in the previous layer.

Output:

The final FC layer typically feeds a Softmax layer, which outputs class probabilities.

5. Transfer Learning and Pre-Trained Models

Transfer Learning:

Reusing a CNN model trained on a large dataset (such as ImageNet) as the starting point for a different but related task.

Benefits:

Less data required
Faster training
Higher accuracy with limited resources

Popular Pre-Trained Models:

Model	Key Features
VGG16/VGG19	Deep but simple architecture using 3x3 convolutions
ResNet	Uses residual blocks (skip connections) to mitigate the vanishing gradient problem
Inception	Multi-scale convolutions (1x1, 3x3, 5x5) applied in parallel within one module

6. Applications of CNNs

Object Detection:

Locates and classifies multiple objects in an image, typically by predicting bounding boxes.
Algorithms: YOLO, SSD, Faster R-CNN

Image Classification:

Predicts the class label of an image (e.g., cat vs dog)
The CNN learns spatial hierarchies of patterns

Face Recognition:

CNNs extract facial features and match them against known faces
Popular models: FaceNet, DeepFace

7. CNN Optimization Techniques

Data Augmentation:

Increases the effective size of the training set by applying label-preserving transformations to existing data
Types: Rotation, flipping, scaling, brightness changes
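
A few of these transformations can be sketched with NumPy alone. This is a simplified illustration: `augment` is a hypothetical helper, rotation is restricted to multiples of 90 degrees so that no interpolation is needed, scaling is omitted, and the image is assumed to be a square array with values in [0, 1].

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated and brightness-shifted copy."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                 # horizontal flip
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k)                   # rotate by k * 90 degrees
    factor = rng.uniform(0.8, 1.2)
    out = np.clip(out * factor, 0.0, 1.0)    # brightness change, kept in range
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # stand-in for a real grayscale image

augmented = augment(image, rng)
print(augmented.shape)
```

Calling `augment` once per image per epoch means the network rarely sees exactly the same input twice, even though no new images were collected.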

Dropout:

Regularization technique
Randomly disables neurons during training to prevent overfitting
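
A minimal sketch of dropout, assuming the common "inverted dropout" formulation in which the surviving activations are rescaled during training so that nothing needs to change at inference time. The function name `dropout` and the all-ones input are illustrative.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return activations                   # dropout is disabled at inference
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones(1000)

dropped = dropout(x, rate=0.5, rng=rng)
print(np.unique(dropped))   # only 0.0 (dropped) and 2.0 (kept and rescaled)
print(dropped.mean())       # close to 1.0, so the expected activation is preserved
```

Because a different random mask is drawn at every training step, no neuron can rely on the presence of any particular other neuron, which is what discourages overfitting.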

Batch Normalization:

Normalizes activations within a mini-batch
Helps speed up training and improves stability
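
The training-time computation can be sketched as follows. This assumes a 2-D input of shape (batch, features) and omits the running averages that frameworks keep for use at inference; `batch_norm`, `gamma` (scale) and `beta` (shift) follow common naming but are defined here only for illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # mini-batch of 64 samples

out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))   # approximately 0 for every feature
print(out.std(axis=0))    # approximately 1 for every feature
```

The learnable `gamma` and `beta` let the network undo the normalization where that helps, so the layer does not limit what the network can represent.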

CNN Architecture Summary:

Layer Type	Purpose
Input Layer	Accepts raw pixel data
Convolution Layer	Extracts features via kernels
ReLU Layer	Applies non-linearity
Pooling Layer	Downsamples feature maps
Fully Connected Layer	Combines the extracted features to perform classification
Softmax Layer	Outputs class probabilities

Key Formulas:

Output Size of Convolution:

O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1

Where:

I: Input size
K: Kernel size
P: Padding
S: Stride

Softmax Function:

\sigma(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
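
A small sketch of the softmax function in NumPy. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result; the example scores are arbitrary.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=np.float64)
    shifted = z - np.max(z)        # avoids overflow in exp for large scores
    exp = np.exp(shifted)
    return exp / exp.sum()

probs = softmax([2.0, 1.0, 0.1])
print(probs)         # roughly [0.659, 0.242, 0.099]
print(probs.sum())   # 1.0 up to floating-point rounding
```

The largest score receives the largest probability, and the predicted class is the index of that maximum.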

Example: Simple CNN Flow for MNIST Digit Classification

Input: 28x28 grayscale image
Conv1: 32 filters, 3x3 → ReLU
Pool1: MaxPooling 2x2
Conv2: 64 filters, 3x3 → ReLU
Pool2: MaxPooling 2x2
Flatten
FC1: 128 neurons → ReLU
FC2: 10 outputs (digits 0–9) → Softmax
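
The spatial sizes in this flow can be traced with the output-size formula. The sketch assumes settings the outline does not state: no padding and stride 1 for the convolutions, and stride 2 for the 2x2 pooling. The helper names are hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output size of a convolution: floor((I - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of a pooling layer with the given window and stride."""
    return (size - window) // stride + 1

shapes = {"Input": (28, 28, 1)}

size = conv_out(28, 3)                 # Conv1: 3x3, 32 filters
shapes["Conv1"] = (size, size, 32)
size = pool_out(size)                  # Pool1: 2x2
shapes["Pool1"] = (size, size, 32)
size = conv_out(size, 3)               # Conv2: 3x3, 64 filters
shapes["Conv2"] = (size, size, 64)
size = pool_out(size)                  # Pool2: 2x2
shapes["Pool2"] = (size, size, 64)

flattened = size * size * 64           # length of the vector fed to FC1

for name, shape in shapes.items():
    print(f"{name}: {shape}")
print("Flatten:", flattened)
```

Under these assumptions the image shrinks from 28 to 26, 13, 11 and finally 5 pixels per side, so Flatten produces a vector of 5 x 5 x 64 = 1600 values for FC1.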
