XBC603A UNIT II SUPERVISED LEARNING

  

1.  What is Supervised Learning? (Deep Conceptual View)

Formal Definition

Supervised Learning is the task of learning a function:

f : X → Y

from a labeled dataset:

D = {(x1, y1), (x2, y2), ..., (xn, yn)}

where each xi is an input and yi is its label.

Learning Objective

We do not directly learn the true function f.
Instead, we estimate a hypothesis h from a hypothesis space H:

h(x) ≈ f(x)

such that the expected prediction error is minimized.

Risk Minimization Framework

True Risk (Expected Loss):

R(h) = E(x,y)~P [ L(h(x), y) ]

Since we don't know the true distribution P, we minimize instead the

Empirical Risk:

R_emp(h) = (1/n) Σi L(h(xi), yi)

All supervised algorithms minimize some loss function.
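With made-up numbers, the empirical risk under squared loss is simply the average of the per-example losses; a minimal NumPy sketch:

```python
import numpy as np

# Hypothetical predictions and labels to illustrate empirical risk
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

# Empirical risk under squared loss: the average per-example loss
empirical_risk = np.mean((y_true - y_pred) ** 2)
print(empirical_risk)  # → 0.5
```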

2. Linear Regression (Deep View)

2.1 Problem Setup

Goal: Predict continuous output.

Model:

y = w0 + w1x

Vector form:

ŷ = wᵀx   (with x augmented by a constant 1 for the bias term)
2.2 Loss Function (Mean Squared Error)

J(w) = (1/n) Σi (yi − ŷi)²
Why squared?

Penalizes large errors more

Differentiable

Convex function

2.3 Closed Form Solution (Normal Equation)

w = (XᵀX)⁻¹ Xᵀ y

Teaching Insight:

This is derived by setting the gradient to zero:

∂J/∂w = (2/n) Xᵀ(Xw − y) = 0
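The normal equation can be checked numerically; a small sketch with toy data (the values below are illustrative, not from the notes):

```python
import numpy as np

# Toy data generated from y = 2 + 3x (illustrative values only)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # first column of ones is the bias term
y = np.array([5.0, 8.0, 11.0])

# Normal equation: w = (X^T X)^{-1} X^T y, solved without an explicit inverse
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # → approximately [2. 3.]
```

Using `np.linalg.solve` instead of computing the inverse directly is the numerically safer choice.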

2.4 Geometric Interpretation

Linear regression finds a hyperplane that minimizes the sum of squared vertical distances (residuals) from the data points. Minimizing perpendicular distances is a different method, total least squares.

2.5 Gradient Descent View

Update rule:

w = w − α ∂J/∂w

Where:

α = learning rate

Gradient descent is preferred when the dataset is large, since inverting XᵀX becomes expensive.
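The update rule can be sketched in a few lines of NumPy (learning rate and iteration count are arbitrary choices for this toy example):

```python
import numpy as np

# Gradient descent for linear regression on toy data generated from y = 2 + 3x
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # bias column + one feature
y = np.array([5.0, 8.0, 11.0])
w = np.zeros(2)
alpha = 0.05                  # learning rate

for _ in range(5000):
    grad = (2 / len(y)) * X.T @ (X @ w - y)  # gradient of MSE w.r.t. w
    w = w - alpha * grad                     # update rule: w = w - α ∂J/∂w

print(w)  # converges near [2, 3]
```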

2.6 Assumptions

Linearity

Independence

Homoscedasticity

No multicollinearity

Violating these assumptions makes the estimates unreliable: non-linearity biases the coefficients, while correlated errors and multicollinearity inflate their variance.

2.7 Implementation

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)            # X: feature matrix, y: continuous targets
y_pred = model.predict(X)

2.8 Teaching-Level Discussion

Ask students:

What happens if features are correlated?

What if relationship is non-linear?

What if outliers exist?

This leads to:

Regularization

Polynomial regression

Robust regression

3. Logistic Regression (Deep View)

3.1 Why Not Linear Regression for Classification?

Because:

Predicted probabilities must lie between 0 and 1

A linear model produces unbounded output

3.2 Logistic Model

P(Y=1|X) = 1 / (1 + e^(−z))

Where:

z = w0 + w1x
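The sigmoid that produces this probability is a one-liner; a quick sketch showing its squashing behavior:

```python
import numpy as np

def sigmoid(z):
    # Maps any real z to (0, 1); this is the logistic function from 3.2
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # → 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```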

 

3.3 Log-Odds Interpretation

log( P(Y=1|X) / (1 − P(Y=1|X)) ) = w0 + w1x

Logistic regression models the log-odds linearly.

Teaching point: This is why it is called a linear classifier.

3.4 Loss Function (Cross Entropy)

J(w) = −(1/n) Σi [ yi log ŷi + (1 − yi) log(1 − ŷi) ],   where ŷi = P(Y=1|xi)
Why not MSE?

Paired with the sigmoid, MSE makes the loss non-convex

Cross entropy yields a convex optimization problem
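A minimal sketch of binary cross-entropy (the example probabilities are made up), showing that confident wrong predictions are penalized far more than confident correct ones:

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    # Binary cross-entropy; eps guards against log(0)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Confident correct predictions give low loss; confident wrong ones, high loss
good = cross_entropy(np.array([1, 0]), np.array([0.9, 0.1]))
bad = cross_entropy(np.array([1, 0]), np.array([0.1, 0.9]))
print(good)  # about 0.105
print(bad)   # about 2.303
```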

3.5 Optimization

No closed form solution.
Use:

Gradient descent

Newton’s method

3.6 Decision Boundary

The boundary is where P(Y=1|X) = 0.5, i.e. z = w0 + w1x = 0.

This hyperplane separates the classes.

3.7 Implementation

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)   # X: feature matrix, y: binary class labels

Teaching-Level Depth

Discuss:

Why the sigmoid?

Why maximum likelihood?

How does class imbalance affect results?

How does threshold tuning change predictions?

4. Naïve Bayes (Deep View)

4.1 Bayesian Foundation

Bayes Theorem:

P(Y|X) = P(X|Y) P(Y) / P(X)
4.2 Naïve Assumption

Features are conditionally independent given the class:

P(x1, x2, ..., xn | Y) = Πj P(xj | Y)
This simplifies computation dramatically.
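The factorization can be seen directly in a toy Bernoulli example (all the probabilities and the "spam"/"ham" setup below are made up for illustration):

```python
# Toy spam filter with two binary features; all probabilities are invented
p_y = {"spam": 0.4, "ham": 0.6}                 # class priors P(Y)
p_x = {"spam": [0.8, 0.7], "ham": [0.1, 0.3]}   # P(x_j = 1 | Y) per feature

x = [1, 1]  # both features present in the message

scores = {}
for c in p_y:
    likelihood = 1.0
    for pj, xj in zip(p_x[c], x):
        # Naive assumption: multiply the per-feature probabilities
        likelihood *= pj if xj == 1 else 1 - pj
    scores[c] = p_y[c] * likelihood  # unnormalized posterior P(Y|X)

print(max(scores, key=scores.get))  # → spam
```

Only the relative scores matter for classification, which previews the point made in 4.4.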

4.3 Types

Gaussian NB (continuous data)

Multinomial NB (text classification)

Bernoulli NB (binary features)

4.4 Why Does It Work Despite the Wrong Assumption?

Because classification depends only on which class has the highest relative probability, not on the exact probability values.

Even if the independence assumption is false,
the decision boundary may still be correct.

4.5 Implementation

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X, y)   # X: continuous features, y: class labels

 

5. Bias–Variance Tradeoff

 

5.1 Total Error

Expected Error = Bias² + Variance + Irreducible Noise
5.2 Bias

Model too simple → underfitting

Example:

Using linear model for polynomial data

5.3 Variance

Model too complex → overfitting

Example:

High-degree polynomial

5.4 Teaching Strategy

Plot:

Underfitting curve

Optimal curve

Overfitting curve

Ask students to identify behavior.

 

6. Model Evaluation (Deep View)

6.1 Regression Metrics

MSE

RMSE

R² Score

R²:

R² = 1 − SS_res / SS_tot = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²
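A minimal sketch of R² from its definition (the sample values are made up); a perfect fit scores 1, and predicting the mean everywhere scores 0:

```python
import numpy as np

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y_true, y_true))                       # → 1.0 (perfect fit)
print(r2_score(y_true, np.full(4, np.mean(y_true))))  # → 0.0 (predicting the mean)
```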

6.2 Classification Metrics

From confusion matrix:

Accuracy

Precision

Recall

F1 Score

6.3 When Accuracy Fails

In imbalanced datasets, accuracy can be misleading.

Example:
Fraud detection, where 99% of transactions are non-fraud.

A model that always predicts non-fraud achieves 99% accuracy but is useless.

Teaching focus:
Use precision-recall curve.
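The fraud example above can be reproduced in a few lines (the 99/1 split is the hypothetical one from the text):

```python
import numpy as np

# Hypothetical imbalanced data: 99 non-fraud (0) cases, 1 fraud (1) case
y_true = np.array([0] * 99 + [1])
y_pred = np.zeros(100, dtype=int)  # a model that always predicts "non-fraud"

accuracy = np.mean(y_true == y_pred)
tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
recall = tp / (tp + fn)

print(accuracy)  # → 0.99, looks great
print(recall)    # → 0.0, catches no fraud at all
```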

 

7. Complete Supervised Learning Pipeline (Teaching Level)

Data collection

Cleaning

Feature engineering

Train-test split

Model selection

Hyperparameter tuning

Evaluation

Deployment

 

8.  Student Implementation Roadmap

Step 1: Start with Linear Regression

Understand the loss function

Implement it from scratch

Step 2: Implement Gradient Descent manually

Step 3: Move to Logistic Regression

Step 4: Build Naïve Bayes text classifier

 

9. From Student Level → Teaching Level Progression

Student Focus → Teaching Focus

Formula application → Derivation

Coding with a library → Mathematical intuition

Accuracy → Bias-variance analysis

Solving problems → Designing models

 

Conceptual Questions for Deep Understanding

Why is MSE convex?

Why does logistic regression use log-likelihood?

Why does Naïve Bayes perform well on text?

What happens when the learning rate is too high?

Why does regularization reduce variance?

Summary

Supervised learning is fundamentally about:

Defining hypothesis space

Choosing loss function

Optimizing parameters

Generalizing well
