XBC603A MACHINE LEARNING UNIT 3 UNSUPERVISED LEARNING
Unit 3 – Unsupervised Learning
Unsupervised learning is a type of machine learning in which the model works
with unlabeled data. It learns patterns on its own, grouping similar data points
or finding hidden structures without any human intervention.
· It is used for tasks like clustering, dimensionality reduction and association rule learning.
· Helps identify hidden patterns in data.
· Useful for grouping, compression and anomaly detection.
The image shows a set of animals, such as elephants, camels and cows, that
represents the raw data the unsupervised learning algorithm will process.
· The "Interpretation" stage signifies that the algorithm does not have predefined labels or categories for the data. It must figure out how to group or organize the data based on inherent patterns.
· The algorithm represents the unsupervised learning process that identifies patterns in the data.
· The processing stage shows the algorithm working on the data.
The output shows the results of the unsupervised learning process. In this
case, the algorithm might have grouped the animals into clusters based on their
species (elephants, camels, cows).
Working of Unsupervised Learning
Unsupervised learning typically works in the following steps:
1. Collect Unlabeled Data
· Gather a dataset without predefined labels or categories.
· Example: Images of various animals without any tags.
2. Select an Algorithm
· Based on the goal, choose a suitable unsupervised algorithm: clustering (e.g. K-Means), association rule learning (e.g. Apriori) or dimensionality reduction (e.g. PCA).
3. Train the Model on Raw Data
· Feed the entire unlabeled dataset to the algorithm.
· The algorithm looks for similarities, relationships or hidden structures within the data.
4. Group or Transform Data
· The algorithm organizes the data into groups (clusters), rules or lower-dimensional forms without human input.
· Example: It may group similar animals together or extract key patterns from large datasets.
5. Interpret and Use Results
· Analyze the discovered groups, rules or features to gain insights, or use them for further tasks such as visualization, anomaly detection or as input to other models.
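The five steps above can be sketched end to end in plain Python with K-Means. This is a minimal illustration, not a reference implementation. The (height, weight) measurements are hypothetical stand-ins for the animal example, K = 3 is chosen by hand, and the farthest-point initialization is a simple deterministic substitute for the usual random or k-means++ start.

```python
# Step 1: collect unlabeled data. Hypothetical (height in m, weight in tonnes)
# measurements; the names are held back and used only to interpret results.
names = ["elephant", "elephant", "elephant",
         "camel", "camel", "camel",
         "cow", "cow", "cow"]
points = [(3.0, 5.0), (3.2, 5.5), (2.9, 4.8),
          (2.0, 0.5), (2.1, 0.6), (1.9, 0.45),
          (1.4, 0.7), (1.5, 0.75), (1.35, 0.65)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(data, k):
    """Deterministic farthest-point initialization (a simple stand-in for k-means++)."""
    centroids = [data[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(data, k, max_iter=100):
    """Steps 3 and 4: Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = init_centroids(data, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in data]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Step 2: select an algorithm (K-Means with K = 3, chosen by hand).
labels, centroids = kmeans(points, k=3)

# Step 5: interpret each cluster using the names that were held back.
for j, c in enumerate(centroids):
    members = [n for n, lab in zip(names, labels) if lab == j]
    print(f"cluster {j}: centroid=({c[0]:.2f}, {c[1]:.2f}) members={members}")
```

On this toy data each cluster should contain a single species. The names are never shown to the algorithm and appear only in step 5. Both features are kept on similar scales because K-Means relies on Euclidean distance, so a feature with a much larger range would dominate the grouping.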
Unsupervised Learning Algorithms
There are three main types of unsupervised learning algorithms:
1. Clustering Algorithms
Clustering is an unsupervised
machine learning technique that groups unlabeled data into clusters based on
similarity. Its goal is to discover patterns or relationships within the data
without any prior knowledge of categories or labels.
· Groups data points that share similar features or characteristics.
· Helps find natural groupings in raw, unclassified data.
· Commonly used for customer segmentation, anomaly detection and data organization.
· Works purely from the input data without any output labels.
· Enables understanding of data structure for further analysis or decision-making.
Some common clustering algorithms:
· K-means Clustering: Groups data into K clusters by assigning each point to the nearest cluster centroid and recomputing the centroids until they stabilize.
· Hierarchical Clustering: Builds a tree of clusters (a dendrogram) step by step, either by merging smaller groups (agglomerative) or by splitting larger ones (divisive).
· Density-Based Clustering (DBSCAN): Finds clusters in dense regions and treats points in sparse regions as noise.
· Mean-Shift Clustering: Discovers clusters by iteratively shifting points toward the densest nearby region (a mode of the data density).
· Spectral Clustering: Builds a similarity graph over the points and groups them using the eigenvectors of that graph's Laplacian matrix.
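Here is a minimal DBSCAN sketch in plain Python that makes "dense regions" and "noise" concrete. The parameters eps (neighborhood radius) and min_pts (minimum number of neighbors, counting the point itself, needed for a core point) are DBSCAN's standard inputs. The data points and the chosen values are hypothetical.

```python
def region_query(data, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(data)
            if sum((a - b) ** 2 for a, b in zip(data[i], q)) <= eps ** 2]


def dbscan(data, eps, min_pts):
    """Return one label per point: cluster ids 0, 1, ... or -1 for noise."""
    NOISE = -1
    labels = [None] * len(data)  # None means "not visited yet"
    cluster = -1
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep growing
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two tight groups and one isolated point.
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (3.0, 9.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

The two tight groups become clusters 0 and 1, and the isolated point is labeled -1 (noise). Unlike K-Means, DBSCAN does not need the number of clusters in advance. Its result does, however, depend strongly on eps and min_pts.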
2. Association Rule Learning
Association rule learning is a
rule-based unsupervised learning technique used to discover interesting
relationships between variables in large datasets. It identifies patterns in
the form of “if-then” rules, showing how the presence of some items in the data
is associated with the presence of others.
· Finds frequent item combinations and the rules connecting them.
· Commonly used in market basket analysis to understand product purchase relationships.
· Helps retailers design promotions and cross-selling strategies.
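The strength of an "if-then" rule X → Y is usually measured with three quantities. Support(X) is the fraction of transactions that contain X. Confidence(X → Y) = support(X ∪ Y) / support(X). Lift(X → Y) = confidence(X → Y) / support(Y). The sketch below computes all three on a hypothetical five-basket dataset.

```python
# Hypothetical market-basket transactions (each basket is a set of items).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support(X union Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """confidence(X -> Y) / support(Y); above 1 means X and Y co-occur more than if independent."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"bread"}, {"milk"}
print("support(bread, milk):", round(support(x | y, transactions), 3))
print("confidence(bread -> milk):", round(confidence(x, y, transactions), 3))
print("lift(bread -> milk):", round(lift(x, y, transactions), 3))
```

For bread → milk, 3 of the 5 baskets contain both items (support 0.6), and 3 of the 4 bread baskets contain milk (confidence 0.75). Milk already appears in 4 of the 5 baskets, so the lift is 0.75 / 0.8 ≈ 0.94. Because this is slightly below 1, buying bread does not actually make milk more likely in this toy data.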
Some common association rule learning algorithms:
· Apriori Algorithm: Finds frequent itemsets level by level, extending frequent combinations one item at a time and pruning any candidate that has an infrequent subset.
· FP-Growth Algorithm: An efficient alternative to Apriori. It compresses the transactions into an FP-tree and mines frequent patterns without generating candidate sets.
· Eclat Algorithm: Stores, for each itemset, the set of IDs of the transactions that contain it, and finds frequent patterns efficiently by intersecting these sets.
· Efficient Tree-based Algorithms: Scale to large datasets by organizing the data in tree structures.
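Apriori's level-wise search can be sketched compactly in plain Python. This version is unoptimized and runs on a small hypothetical basket dataset. It relies on the Apriori property: every subset of a frequent itemset must also be frequent, so any candidate with an infrequent subset is pruned before its support is counted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    # Level-1 candidates: every individual item.
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        # Count step: keep the candidate k-itemsets that meet min_support.
        level = {}
        for cand in current:
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= min_support:
                level[cand] = count / n
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets into (k+1)-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-item subset must be frequent.
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent


# Hypothetical market-basket transactions (each basket is a set of items).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

freq = apriori(transactions, min_support=0.4)
for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

With min_support = 0.4 (at least 2 of 5 baskets), all four single items are frequent, and so are the pairs {bread, milk}, {bread, butter} and {eggs, milk}. Both 3-item candidates are pruned because each contains an infrequent pair. Rules would then be generated from the frequent itemsets and kept only if their confidence passes a chosen threshold.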
3. Dimensionality Reduction
Dimensionality reduction is the
process of decreasing the number of features or variables in a dataset while
retaining as much of the original information as possible. This technique helps
simplify complex data, making it easier to analyze and visualize. It can also
improve the efficiency and performance of machine learning algorithms by
reducing noise and computational cost.
· It reduces the dataset’s feature space from many dimensions to fewer, more meaningful ones.
· Helps focus on the most important traits or patterns in the data.
· Commonly used to improve model speed and reduce overfitting.
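For PCA, "retaining as much of the original information as possible" is usually measured as retained variance. Take n samples x_i in d dimensions with mean x̄. Their covariance matrix Σ has eigenvectors w_j (the principal components) and eigenvalues λ_j, sorted in decreasing order. Keeping the first k components gives the projection z_i and retains the fraction of variance shown below.

```latex
\Sigma = \frac{1}{n-1}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{\top},
\qquad \Sigma\,\mathbf{w}_j = \lambda_j \mathbf{w}_j,
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d

\mathbf{z}_i = W_k^{\top}(\mathbf{x}_i - \bar{\mathbf{x}}),
\qquad W_k = [\mathbf{w}_1, \dots, \mathbf{w}_k]

\text{retained variance}(k) = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{d} \lambda_j}
```

For example, suppose d = 3 with hypothetical eigenvalues 4, 1 and 0.5. Keeping k = 2 components then retains (4 + 1) / 5.5 ≈ 0.91, about 91% of the variance.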
Here are some popular dimensionality reduction algorithms:
· Principal Component Analysis (PCA): Reduces dimensions by projecting the data onto uncorrelated principal components, ordered by how much variance they capture.
· Linear Discriminant Analysis (LDA): Reduces dimensions while maximizing class separability for classification tasks. Unlike the other methods listed here, LDA requires class labels, so it is a supervised technique.
· Non-negative Matrix Factorization (NMF): Factorizes non-negative data into non-negative parts to simplify its representation.
· Locally Linear Embedding (LLE): Reduces dimensions while preserving the relationships between nearby points.
· Isomap: Captures the global structure of the data by preserving geodesic distances (distances measured along the manifold).
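Here is a plain-Python sketch of PCA reducing 2-D points to 1-D. The data is hypothetical and lies roughly along a line. Only the first principal component is found, using power iteration on the covariance matrix. Power iteration is a simple method chosen for illustration; libraries typically use an eigendecomposition or SVD instead.

```python
import math


def mean_center(data):
    """Subtract each column's mean. Returns (centered rows, means)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data], means


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_component(cov, iters=200):
    """Power iteration: unit eigenvector and eigenvalue for the largest eigenvalue.
    Assumes the all-ones start vector is not orthogonal to that eigenvector."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue, since v has unit length.
    eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, eigval


# Hypothetical 2-D data lying roughly along the line y = 2x.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered, means = mean_center(data)
cov = covariance(centered)
pc1, lam1 = first_component(cov)
total_var = sum(cov[j][j] for j in range(len(cov)))  # trace = sum of all eigenvalues
scores = [sum(p * x for p, x in zip(pc1, row)) for row in centered]  # 2-D -> 1-D

print("first principal component:", [round(c, 3) for c in pc1])
print("share of variance retained:", round(lam1 / total_var, 4))
print("1-D scores:", [round(s, 2) for s in scores])
```

The points lie almost on a straight line, so the first component should point roughly along the direction (1, 2) and retain nearly all of the variance. The 1-D scores therefore lose very little information.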
Applications of Unsupervised Learning
Unsupervised learning has diverse applications across industries and
domains. Key applications include:
· Customer Segmentation: Clustering algorithms group customers by purchasing behavior or demographics, enabling targeted marketing strategies.
· Anomaly Detection: Identifies unusual patterns in data, aiding fraud detection, cybersecurity and the prevention of equipment failure.
· Recommendation Systems: Suggest products, movies or music by finding users or items with similar behavior and preferences.
· Image and Text Clustering: Groups similar images or documents for tasks such as organization, categorization or content recommendation.
· Social Network Analysis: Detects communities or trends in user interactions on social media platforms.
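Anomaly detection can work without any labeled examples of anomalies. The sketch below is one simple unsupervised approach, chosen here for illustration rather than taken from the text. It flags values far from the median using the modified z-score based on the median absolute deviation (MAD). The constant 0.6745 and the 3.5 cutoff are commonly used defaults, and the transaction amounts are hypothetical.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds threshold."""
    med = median(values)
    mad = median([abs(x - med) for x in values])  # median absolute deviation
    if mad == 0:
        return []  # many values equal the median, so the score is undefined
    return [i for i, x in enumerate(values)
            if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical daily transaction amounts for one customer; no fraud labels are used.
amounts = [52, 48, 50, 47, 53, 49, 51, 500, 46, 50]
print(robust_outliers(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can mask itself. Here only the 500 at index 7 is flagged.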
Advantages
· No need for labeled data: Works with raw, unlabeled data, saving the time and effort of data annotation.
· Discovers hidden patterns: Finds natural groupings and structures that humans might miss.
· Handles complex and large datasets: Can be applied to high-dimensional or very large datasets.
· Useful for anomaly detection: Can identify outliers and unusual data points without labeled examples of anomalies.
Challenges
Here are the key challenges of unsupervised learning:
· Noisy Data: Outliers and noise can distort patterns and reduce the effectiveness of algorithms.
· Overfitting Risk: Overfitting can occur when models capture noise instead of meaningful patterns in the data.
· Limited Guidance: The absence of labels restricts the ability to guide the algorithm toward specific outcomes.
· Cluster Interpretability: Results such as clusters may lack clear meaning or alignment with real-world categories.
📘 MCQ (1–15)
1. Unsupervised learning uses
a) Labeled data b) Unlabeled data c) Semi data d) Random data
2. Clustering groups
a) Labels b) Similar data c) Outputs d) Noise
3. K-Means requires
a) Labels b) K value c) Output d) Rules
4. PCA is used for
a) Classification b) Reduction c) Clustering d) RL
5. Apriori is used for
a) Clustering b) Classification c) Association d) Regression
6. Support measures
a) Probability b) Frequency c) Distance d) Error
7. Confidence measures
a) Accuracy b) Probability c) Dependency d) Weight
8. DBSCAN is
a) Partition b) Density c) Tree d) Graph
9. Clustering is
a) Supervised b) Unsupervised c) RL d) Semi
10. PCA reduces
a) Noise b) Features c) Labels d) Classes
11. Hierarchical clustering uses
a) Tree b) Graph c) Rule d) Matrix
12. Lift measures
a) Accuracy b) Dependency c) Distance d) Loss
13. Unsupervised learning is
a) Guided b) Self-learning c) Labeled d) Fixed
14. Mean Shift is
a) Density b) Partition c) Tree d) Rule
15. Apriori uses
a) Frequent sets b) Labels c) Clusters d) Loss
📘 2 MARK QUESTIONS (1–10)
1. Define unsupervised learning.
2. What is clustering?
3. Define support.
4. Define confidence.
5. What is PCA?
6. List the types of clustering.
7. What is the Apriori algorithm?
8. Define dimensionality reduction.
9. What is DBSCAN?
10. List applications of unsupervised learning.
📘 15 MARK QUESTIONS (1–4)
1. Explain unsupervised learning and its types with examples.
2. Describe the K-Means clustering algorithm with steps and a diagram.
3. Explain the Apriori algorithm with support and confidence.
4. Discuss dimensionality reduction techniques and their applications.