UNIT 2: CONVOLUTIONAL NEURAL NETWORKS (CNNs)
Syllabus: Introduction to Image Processing - Convolution Operation and Feature Extraction - Pooling Layers and Fully Connected Layers - Transfer Learning and Pre-trained Models (VGG, ResNet, Inception) - Applications of CNNs: Object Detection, Image Classification, Face Recognition - CNN Optimization Techniques: Data Augmentation, Dropout, Batch Normalization.
1. Introduction to Image Processing
Definition:
Image processing is the technique of performing operations on an image to enhance it or to extract meaningful information from it.
Basic steps involved:
- Image Acquisition: capturing the image from a source
- Preprocessing: resizing, normalization, noise removal
- Feature Extraction: identifying edges, textures, and colors
- Classification and Segmentation: deciding object categories or regions
Key Operations:
- Filtering: removing noise (e.g., Gaussian, median filters)
- Edge Detection: Sobel, Prewitt, Canny
- Resizing & Scaling: preparing images for CNN input (e.g., 224x224)
- Grayscale Conversion: reduces computational cost
Why CNNs in Image Processing?
CNNs automatically learn hierarchical features from raw pixel values, eliminating the need for manual feature engineering.
2. Convolution Operation and Feature Extraction
Convolution:
- A mathematical operation on two functions that produces a third.
- In CNNs, a filter (kernel) slides over the input image and computes a dot product at each position, detecting local patterns.
- Purpose: extract spatial features such as edges, corners, shapes, and textures.
Formula:
(I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n)
where I is the input image and K is the kernel/filter.
- Kernel: a small matrix (e.g., 3×3 or 5×5) that slides over the image. Each position (m, n) on the kernel has a weight K(m, n).
- Sliding Window: the kernel is applied to each overlapping region of the image. For each position, the corresponding pixel values and kernel weights are multiplied elementwise and the products are summed to give a single number. This process produces a new matrix, called the feature map, which highlights certain features of the image.
Example Calculation:
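A minimal NumPy sketch of this sliding-window computation (NumPy and the specific input/kernel values are assumptions for illustration; the 3×3 kernel here happens to be a simple vertical edge detector):

    import numpy as np

    # Illustrative 5x5 input and 3x3 kernel (values made up for this example).
    image = np.array([[1, 2, 3, 0, 1],
                      [0, 1, 2, 3, 1],
                      [1, 0, 1, 2, 0],
                      [2, 1, 0, 1, 2],
                      [0, 2, 1, 0, 1]])
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])   # vertical edge detector

    k = kernel.shape[0]
    out = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1), dtype=int)

    # Slide the kernel over every overlapping 3x3 region: multiply
    # elementwise, sum the products, and store one number per position.
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)

    print(out)   # the resulting 3x3 feature map

With no padding and stride 1, the 5×5 input and 3×3 kernel yield a 3×3 feature map, matching the output-size formula given later.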
Key Concepts:
- Kernel/Filter: small matrix (e.g., 3x3) sliding over the input.
- Stride: step size of the filter.
- Padding: adding a border of zeros to preserve spatial size.
Padding and Stride in Convolutional Neural Networks
To preserve the spatial dimensions of the input, padding is applied. Padding adds extra rows and columns around the border of the input image.
- Zero Padding: the simplest form of padding, where additional pixels with a value of zero are added around the image. Applying zero padding of 1 pixel to a 5×5 image makes it 7×7, as sketched below.
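In NumPy terms (a sketch; the array contents are placeholders):

    import numpy as np

    image = np.ones((5, 5), dtype=int)    # placeholder 5x5 input
    padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
    print(padded.shape)                   # (7, 7): one ring of zeros added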
- Stride: controls how far the kernel shifts as it slides over the input image. With a stride of 1, the kernel moves one pixel at a time; with a stride of 2, it moves two pixels at a time, effectively downsampling the image.
The output size O of a convolution on an input of size I with kernel size K, stride S, and padding P can be calculated as:
O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1
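For example, a 5×5 input (I = 5) with a 3×3 kernel (K = 3), padding P = 1, and stride S = 1 gives O = ⌊(5 − 3 + 2)/1⌋ + 1 = 5, so the spatial size is preserved; with stride S = 2 the output shrinks to O = ⌊4/2⌋ + 1 = 3.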
Feature Maps:
- The result of applying a convolution.
- Capture spatial hierarchies, from low-level to high-level features.
3. Pooling Layers
Pooling is a downsampling operation used to reduce the dimensions of the feature maps while retaining the most important information. Max pooling is the most common pooling operation, where the maximum value within a window is selected.
Purpose:
- Reduce spatial dimensions and computation
- Control overfitting
- Make features more invariant to scale and translation
Types:
- Max Pooling: takes the maximum value in each region
- Average Pooling: takes the average value in each region
Max Pooling:
For a 2×2 max pooling operation, consider the following example.
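A minimal NumPy sketch of 2×2 max pooling with stride 2 (the 4×4 feature-map values are made up for illustration):

    import numpy as np

    fmap = np.array([[1, 3, 2, 4],
                     [5, 6, 1, 2],
                     [7, 2, 8, 1],
                     [3, 4, 0, 9]])

    # Take the maximum of each non-overlapping 2x2 window (stride 2).
    pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)   # [[6 4]
                    #  [7 9]]

Each 2×2 window collapses to its largest value, halving both spatial dimensions.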
4. Fully Connected Layers (FC Layers)
Description:
- After feature extraction (via convolution and pooling), FC layers act as a classifier.
- Each neuron is connected to every neuron in the previous layer.
Output:
- Typically ends in a Softmax layer for classification.
5. Transfer Learning and Pre-Trained Models
Transfer Learning:
- Reusing a CNN trained on a large dataset (such as ImageNet) for a different but related task.
Benefits:
- Less data required
- Faster training
- Higher accuracy with limited resources
Popular Pre-Trained Models:

Model | Key Features
VGG16/VGG19 | Deep but simple architecture using 3x3 convolutions
ResNet | Uses residual blocks to solve the vanishing gradient problem
Inception | Multi-scale convolutions (1x1, 3x3, 5x5) in one layer
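As an illustration of such reuse, here is a minimal transfer-learning sketch in TensorFlow/Keras (the framework, the frozen VGG16 backbone, and the 10-class head are all assumptions for illustration; the notes do not prescribe them):

    import tensorflow as tf

    # Load VGG16 pre-trained on ImageNet, without its original classifier head.
    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3)
    )
    base.trainable = False   # freeze the pre-trained feature extractor

    # Attach a new classifier for the target task (10 classes assumed here).
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])

Freezing the backbone means only the small new head is trained, which is why transfer learning needs less data and trains faster.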
6. Applications of CNNs
Object Detection:
- Locates multiple objects in an image
- Algorithms: YOLO, SSD, Faster R-CNN
Image Classification:
- Predicts the class label of an image (e.g., cat vs. dog)
- The CNN learns spatial hierarchies of patterns
Face Recognition:
- CNNs extract facial features and match them
- Popular models: FaceNet, DeepFace
7. CNN Optimization Techniques
Data Augmentation:
- Increases training data by transforming existing samples
- Types: rotation, flipping, scaling, brightness changes (see the sketch below)
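A sketch of such transforms as Keras preprocessing layers (Keras is an assumption; any augmentation library would serve, and the parameter values are illustrative):

    import tensorflow as tf

    # Random transforms applied on the fly during training only.
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),   # flipping
        tf.keras.layers.RandomRotation(0.1),        # rotation
        tf.keras.layers.RandomZoom(0.1),            # scaling
    ])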
Dropout:
- Regularization technique
- Randomly disables neurons during training to prevent overfitting
Batch Normalization:
- Normalizes activations within a mini-batch
- Speeds up training and improves stability
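Both techniques slot directly into a layer stack; a hedged Keras sketch of one convolutional block (the layer sizes and rates are illustrative assumptions):

    import tensorflow as tf

    block = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding="same", use_bias=False),
        tf.keras.layers.BatchNormalization(),  # normalize activations per mini-batch
        tf.keras.layers.ReLU(),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Dropout(0.25),         # randomly disable 25% of activations
    ])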
CNN Architecture Summary:

Layer Type | Purpose
Input Layer | Accepts raw pixel data
Convolution Layer | Extracts features via kernels
ReLU Layer | Applies non-linearity
Pooling Layer | Downsamples feature maps
Fully Connected | Learns high-level representations
Softmax Layer | Outputs class probabilities
Key Formulas:
- Output Size of a Convolution:
O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1
Where:
- I: input size
- K: kernel size
- P: padding
- S: stride
- Softmax Function:
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
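A quick numeric check of the softmax (the input values are chosen arbitrarily):

    import numpy as np

    z = np.array([2.0, 1.0, 0.1])
    softmax = np.exp(z) / np.sum(np.exp(z))
    print(softmax)   # ~ [0.659 0.242 0.099], which sums to 1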
Example: Simple CNN Flow for MNIST Digit Classification
- Input: 28x28 grayscale image
- Conv1: 32 filters, 3x3 → ReLU
- Pool1: MaxPooling 2x2
- Conv2: 64 filters, 3x3 → ReLU
- Pool2: MaxPooling 2x2
- Flatten
- FC1: 128 neurons → ReLU
- FC2: 10 outputs (digits 0–9) → Softmax
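The same flow as a compact Keras model (a minimal sketch assuming TensorFlow/Keras; the layer settings follow the list above):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),           # 28x28 grayscale
        tf.keras.layers.Conv2D(32, 3, activation="relu"),   # Conv1
        tf.keras.layers.MaxPooling2D(2),                    # Pool1
        tf.keras.layers.Conv2D(64, 3, activation="relu"),   # Conv2
        tf.keras.layers.MaxPooling2D(2),                    # Pool2
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),      # FC1
        tf.keras.layers.Dense(10, activation="softmax"),    # FC2: digits 0-9
    ])
    model.summary()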