XBC603A UNIT II SUPERVISED LEARNING

  

1.  What is Supervised Learning? (Deep Conceptual View)

Formal Definition

Supervised Learning is the task of learning a function:

f : X → Y

from a labeled dataset:

D = {(x1, y1), (x2, y2), ..., (xn, yn)}

where each xi is an input and yi is its label.

Learning Objective

We do not directly learn the true function f.
Instead, we estimate a hypothesis h from a hypothesis space H:

h(x) ≈ f(x)

such that the expected prediction error is minimized.

Risk Minimization Framework

True Risk (Expected Loss):

R(h) = E(x,y)~P [ L(h(x), y) ]

Since we don't know the true distribution P, we minimize instead the

Empirical Risk:

R_emp(h) = (1/n) Σi L(h(xi), yi)

All supervised algorithms minimize some loss function.
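With made-up numbers, the empirical risk under squared loss is simply the average of the per-example losses; a minimal NumPy sketch:

```python
import numpy as np

# Hypothetical predictions and labels to illustrate empirical risk
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

# Empirical risk under squared loss: the average per-example loss
empirical_risk = np.mean((y_true - y_pred) ** 2)
print(empirical_risk)  # → 0.5
```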

2. Linear Regression (Deep View)

2.1 Problem Setup

Goal: Predict continuous output.

Model:

y = w0 + w1x

Vector form:

ŷ = wᵀx   (with x augmented by a constant 1 for the bias term)
2.2 Loss Function (Mean Squared Error)

J(w) = (1/n) Σi (yi − ŷi)²
Why squared?

Penalizes large errors more

Differentiable

Convex function

2.3 Closed Form Solution (Normal Equation)

w = (XᵀX)⁻¹ Xᵀ y

Teaching Insight:

This is derived by setting the gradient to zero:

∂J/∂w = (2/n) Xᵀ(Xw − y) = 0
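The normal equation can be checked numerically; a small sketch with toy data (the values below are illustrative, not from the notes):

```python
import numpy as np

# Toy data generated from y = 2 + 3x (illustrative values only)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # first column of ones is the bias term
y = np.array([5.0, 8.0, 11.0])

# Normal equation: w = (X^T X)^{-1} X^T y, solved without an explicit inverse
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # → approximately [2. 3.]
```

Using `np.linalg.solve` instead of computing the inverse directly is the numerically safer choice.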

2.4 Geometric Interpretation

Linear regression finds a hyperplane that minimizes the sum of squared vertical distances (residuals) from the data points. Minimizing perpendicular distances is a different method, total least squares.

2.5 Gradient Descent View

Update rule:

w = w − α ∂J/∂w

Where:

α = learning rate

Gradient descent is preferred when the dataset is large, since inverting XᵀX becomes expensive.
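The update rule can be sketched in a few lines of NumPy (learning rate and iteration count are arbitrary choices for this toy example):

```python
import numpy as np

# Gradient descent for linear regression on toy data generated from y = 2 + 3x
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # bias column + one feature
y = np.array([5.0, 8.0, 11.0])
w = np.zeros(2)
alpha = 0.05                  # learning rate

for _ in range(5000):
    grad = (2 / len(y)) * X.T @ (X @ w - y)  # gradient of MSE w.r.t. w
    w = w - alpha * grad                     # update rule: w = w - α ∂J/∂w

print(w)  # converges near [2, 3]
```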

2.6 Assumptions

Linearity

Independence

Homoscedasticity

No multicollinearity

Violating these assumptions makes the estimates unreliable: non-linearity biases the coefficients, while correlated errors and multicollinearity inflate their variance.

2.7 Implementation

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)            # X: feature matrix, y: continuous targets
y_pred = model.predict(X)

2.8 Teaching-Level Discussion

Ask students:

What happens if features are correlated?

What if relationship is non-linear?

What if outliers exist?

This leads to:

Regularization

Polynomial regression

Robust regression

3. Logistic Regression (Deep View)

3.1 Why Not Linear Regression for Classification?

Because:

Predicted probabilities must lie between 0 and 1

A linear model produces unbounded output

3.2 Logistic Model

P(Y=1|X) = 1 / (1 + e^(−z))

Where:

z = w0 + w1x
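The sigmoid that produces this probability is a one-liner; a quick sketch showing its squashing behavior:

```python
import numpy as np

def sigmoid(z):
    # Maps any real z to (0, 1); this is the logistic function from 3.2
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # → 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```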

 

3.3 Log-Odds Interpretation

log( P(Y=1|X) / (1 − P(Y=1|X)) ) = w0 + w1x

Logistic regression models the log-odds linearly.

Teaching point: This is why it is called a linear classifier.

3.4 Loss Function (Cross Entropy)

J(w) = −(1/n) Σi [ yi log ŷi + (1 − yi) log(1 − ŷi) ],   where ŷi = P(Y=1|xi)
Why not MSE?

Paired with the sigmoid, MSE makes the loss non-convex

Cross entropy yields a convex optimization problem
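A minimal sketch of binary cross-entropy (the example probabilities are made up), showing that confident wrong predictions are penalized far more than confident correct ones:

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    # Binary cross-entropy; eps guards against log(0)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Confident correct predictions give low loss; confident wrong ones, high loss
good = cross_entropy(np.array([1, 0]), np.array([0.9, 0.1]))
bad = cross_entropy(np.array([1, 0]), np.array([0.1, 0.9]))
print(good)  # about 0.105
print(bad)   # about 2.303
```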

3.5 Optimization

No closed form solution.
Use:

Gradient descent

Newton’s method

3.6 Decision Boundary

The boundary is where P(Y=1|X) = 0.5, i.e. z = w0 + w1x = 0.

This hyperplane separates the classes.

3.7 Implementation

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)   # X: feature matrix, y: binary class labels

Teaching-Level Depth

Discuss:

Why the sigmoid?

Why maximum likelihood?

How does class imbalance affect results?

How does threshold tuning change predictions?

4. Naïve Bayes (Deep View)

4.1 Bayesian Foundation

Bayes Theorem:

P(Y|X) = P(X|Y) P(Y) / P(X)
4.2 Naïve Assumption

Features are conditionally independent given the class:

P(x1, x2, ..., xn | Y) = Πj P(xj | Y)
This simplifies computation dramatically.
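The factorization can be seen directly in a toy Bernoulli example (all the probabilities and the "spam"/"ham" setup below are made up for illustration):

```python
# Toy spam filter with two binary features; all probabilities are invented
p_y = {"spam": 0.4, "ham": 0.6}                 # class priors P(Y)
p_x = {"spam": [0.8, 0.7], "ham": [0.1, 0.3]}   # P(x_j = 1 | Y) per feature

x = [1, 1]  # both features present in the message

scores = {}
for c in p_y:
    likelihood = 1.0
    for pj, xj in zip(p_x[c], x):
        # Naive assumption: multiply the per-feature probabilities
        likelihood *= pj if xj == 1 else 1 - pj
    scores[c] = p_y[c] * likelihood  # unnormalized posterior P(Y|X)

print(max(scores, key=scores.get))  # → spam
```

Only the relative scores matter for classification, which previews the point made in 4.4.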

4.3 Types

Gaussian NB (continuous data)

Multinomial NB (text classification)

Bernoulli NB (binary features)

4.4 Why Does It Work Despite the Wrong Assumption?

Because classification depends only on which class has the highest relative probability, not on the exact probability values.

Even if the independence assumption is false,
the decision boundary may still be correct.

4.5 Implementation

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X, y)   # X: continuous features, y: class labels

 

5. Bias–Variance Tradeoff

 

5.1 Total Error

Expected Error = Bias² + Variance + Irreducible Noise
5.2 Bias

Model too simple → underfitting

Example:

Using linear model for polynomial data

5.3 Variance

Model too complex → overfitting

Example:

High-degree polynomial

5.4 Teaching Strategy

Plot:

Underfitting curve

Optimal curve

Overfitting curve

Ask students to identify behavior.

 

6. Model Evaluation (Deep View)

6.1 Regression Metrics

MSE

RMSE

R² Score

R²:

R² = 1 − SS_res / SS_tot = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²
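A minimal sketch of R² from its definition (the sample values are made up); a perfect fit scores 1, and predicting the mean everywhere scores 0:

```python
import numpy as np

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y_true, y_true))                       # → 1.0 (perfect fit)
print(r2_score(y_true, np.full(4, np.mean(y_true))))  # → 0.0 (predicting the mean)
```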

6.2 Classification Metrics

From confusion matrix:

Accuracy

Precision

Recall

F1 Score

6.3 When Accuracy Fails

In imbalanced datasets, accuracy can be misleading.

Example:
Fraud detection, where 99% of transactions are non-fraud.

A model that always predicts non-fraud achieves 99% accuracy but is useless.

Teaching focus:
Use precision-recall curve.
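The fraud example above can be reproduced in a few lines (the 99/1 split is the hypothetical one from the text):

```python
import numpy as np

# Hypothetical imbalanced data: 99 non-fraud (0) cases, 1 fraud (1) case
y_true = np.array([0] * 99 + [1])
y_pred = np.zeros(100, dtype=int)  # a model that always predicts "non-fraud"

accuracy = np.mean(y_true == y_pred)
tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
recall = tp / (tp + fn)

print(accuracy)  # → 0.99, looks great
print(recall)    # → 0.0, catches no fraud at all
```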

 

7. Complete Supervised Learning Pipeline (Teaching Level)

Data collection

Cleaning

Feature engineering

Train-test split

Model selection

Hyperparameter tuning

Evaluation

Deployment

 

8.  Student Implementation Roadmap

Step 1: Start with Linear Regression

Understand the loss function

Implement it from scratch

Step 2: Implement Gradient Descent manually

Step 3: Move to Logistic Regression

Step 4: Build Naïve Bayes text classifier

 

9. From Student Level → Teaching Level Progression

Student Focus → Teaching Focus

Formula application → Derivation

Coding with a library → Mathematical intuition

Accuracy → Bias-variance analysis

Solving problems → Designing models

 

Conceptual Questions for Deep Understanding

Why is MSE convex?

Why does logistic regression use log-likelihood?

Why does Naïve Bayes perform well on text?

What happens when the learning rate is too high?

Why does regularization reduce variance?

Summary

Supervised learning is fundamentally about:

Defining hypothesis space

Choosing loss function

Optimizing parameters

Generalizing well
