Unit 1 – Introduction to Machine Learning.
What is machine learning?
ML is a subset of AI: the study and development of efficient statistical algorithms that enable a machine to learn in a way analogous to human learning. The machine learns from past data, analyses it, identifies patterns, and makes predictions.
Machine Learning is making the computer learn from studying data and statistics.
Machine Learning is a step in the direction of artificial intelligence (AI).
Machine Learning is a program that analyses data and learns to predict the outcome.
Key elements of Machine Learning
Association Rule in Machine Learning -
If A then B
Association Rule Learning is an important concept of Machine Learning and a type of unsupervised learning. It is used in marketing systems, web usage mining, continuous production, etc. For example, a tea shop that also stocks bakery items, or a man buying chicken together with a packet of chicken masala.
In the rule "If A then B":
- A → the antecedent
- B → the consequent
- the relationship between them → measured by cardinality
As the number of items increases, the cardinality also increases accordingly. So, to measure the associations between thousands of data items, several metrics are used. The metrics that measure cardinality are Support, Confidence, and Lift.
Support
Support is the frequency with which an itemset X appears in the transaction set T: the ratio of the number of transactions that contain X to the total number of transactions, Support(X) = freq(X) / |T|.
Confidence
Confidence indicates how often the rule has been found to be true: how often the items X and Y occur together in the dataset, given that X already occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X: Confidence(X→Y) = Support(X∪Y) / Support(X).
Lift
It is the strength of any rule: the ratio of the observed support to the expected support if X and Y were independent of each other, Lift(X→Y) = Support(X∪Y) / (Support(X) × Support(Y)). It has three possible values:
- If Lift = 1: the occurrence of the antecedent and the consequent are independent of each other.
- If Lift > 1: it measures the degree to which the two itemsets are dependent on each other.
- If Lift < 1: one item is a substitute for the other, i.e., one item has a negative effect on the other.
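To make these three metrics concrete, here is a minimal plain-Python sketch; the toy transactions are invented for illustration:

```python
# Hypothetical transaction data, purely for illustrating the metrics.
transactions = [
    {"tea", "biscuits"},
    {"tea", "biscuits", "cake"},
    {"tea"},
    {"coffee", "biscuits"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    """Support of X and Y together, divided by support of X alone."""
    return support(X | Y) / support(X)

def lift(X, Y):
    """Observed joint support relative to what independence would predict."""
    return support(X | Y) / (support(X) * support(Y))

X, Y = {"tea"}, {"biscuits"}
print(support(X | Y))    # 0.5
print(confidence(X, Y))  # 0.666...
print(lift(X, Y))        # 0.888... (< 1: slight negative association here)
```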
Algorithms
1. Apriori
This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets efficiently. It is mainly used for market basket analysis and helps to identify products that are often bought together. It can also be used in the healthcare field to find drug reactions for patients.
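As a sketch of how this looks in practice, assuming the third-party mlxtend library (its TransactionEncoder, apriori, and association_rules helpers) is installed, and using made-up basket data:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical market-basket transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "butter"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Breadth-first generation of frequent itemsets, then rule extraction.
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```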
2. Eclat
The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a data mining algorithm for association rule mining, designed to solve market basket analysis problems. The goal is to understand which products from the basket are commonly bought together.
There are two ways to organize data in relational databases:
Row-oriented – the traditional way of storing data, which stores records in rows and splits them by one or several columns
Column-oriented (also known as columnar or C-store) – stores data by field, keeping all of the data associated with a field next to each other
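ECLAT exploits this column-oriented (vertical) layout: each item maps to the set of IDs of the transactions containing it (its "tidset"), and the support of an itemset is found by intersecting tidsets. A tiny sketch with invented data:

```python
# Vertical (column-oriented) layout: item -> set of transaction IDs.
tidsets = {
    "bread":  {1, 2, 3},
    "milk":   {1, 2, 4},
    "butter": {2, 3, 4},
}
n_transactions = 4

# Support of {bread, milk} is the size of the tidset intersection.
joint = tidsets["bread"] & tidsets["milk"]
print(len(joint) / n_transactions)  # 0.5
```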
Frequent-Pattern growth
The FP-growth algorithm is an improved version of the Apriori algorithm for association rule mining from a database. The Apriori algorithm has two significant drawbacks: low speed and high computational cost. To overcome these drawbacks, the much faster FP-growth algorithm can be used; it removes redundant (repeated) steps to increase speed and reduce cost.
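mlxtend also ships an FP-growth implementation with the same interface as its apriori helper, so, assuming the one-hot `onehot` DataFrame from the Apriori sketch above, the swap is a single line:

```python
from mlxtend.frequent_patterns import fpgrowth

# Reusing the `onehot` DataFrame built in the Apriori sketch above;
# FP-growth avoids Apriori's repeated candidate-generation scans.
frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
print(frequent)
```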
Classification,
Classification is a supervised learning technique in machine learning: the model observes a labeled training dataset and assigns new observations to a group or class. Examples include separating a large volume of mail into spam and legitimate messages, or deciding whether a given image shows a goat or a cow.
y=f(x), where y is a category
- Binary Classifier: If the classification has only two possible outcomes, it is called a binary classifier (analogous to an if...else statement). Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
- Multi-class Classifier: If the classification has more than two outcomes, it is called a multi-class classifier (analogous to a select case statement). Examples: classification of types of crops, types of music, insect bites, diseases spread by water/mosquitoes/animal bites, etc.
Types of Learners (algorithms)
Lazy Learners - K-NN algorithm, Case-based reasoning
A lazy learner first stores the training dataset and waits until a test instance arrives; classification is then done on the basis of the most closely related data in the stored training dataset. It takes less time in training but more time for prediction.
Eager Learners - Decision Trees, Naïve Bayes, ANN.
Eager Learners develop a “classification model” based on a training dataset before receiving a test dataset. Opposite to Lazy learners, Eager Learner takes more time in learning, and less time in prediction.
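A small scikit-learn sketch contrasting the two kinds of learner on synthetic data (K-NN standing in for the lazy learner, a decision tree for the eager one):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Lazy learner: "training" is essentially just storing the dataset.
lazy = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Eager learner: builds an explicit model (a tree) before seeing test data.
eager = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(lazy.score(X_test, y_test), eager.score(X_test, y_test))
```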
Types of Classification Algorithm Models:
- Linear Models
  - Logistic Regression
  - Support Vector Machines
- Non-linear Models
  - K-Nearest Neighbours
  - Kernel SVM
  - Naïve Bayes
  - Decision Tree Classification
  - Random Forest Classification
Evaluation Techniques of Classification Models
1. Log Loss
2. Confusion Matrix
3. AUC – ROC curve
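All three evaluation measures are available in scikit-learn's metrics module; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted P(class = 1)
pred = model.predict(X_test)

print(log_loss(y_test, proba))         # 1. Log Loss (lower is better)
print(confusion_matrix(y_test, pred))  # 2. Confusion Matrix
print(roc_auc_score(y_test, proba))    # 3. AUC of the ROC curve
```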
Uses of Classification Algorithms:
- Email Spam Detection
- Speech Recognition
- Identification of cancer tumor cells
- Drugs Classification
- Biometric Identification, etc.
Regression
Regression is a statistical method for finding the relationship between dependent (target) and independent (predictor) variables and building a model of it: how the dependent variable changes as the independent variables change. Examples: temperature, age, salary, price, dress size, lab reports, etc.
Terms Related to the Regression:
- Dependent Variable (target variable): the main factor in regression, the value we want to predict.
- Independent Variable (predictor): the factors which affect the dependent variable.
- Outliers: an outlier is an observation that lies abnormally far from the other values, i.e., an extremely low or extremely high value relative to the rest of the data; outliers can distort the fitted model.
- Multicollinearity: if the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
- Under-fitting and Over-fitting: if our algorithm works well with the training dataset but not with the test dataset, the problem is called over-fitting. If the algorithm does not perform well even on the training dataset, the problem is called under-fitting.
Types of Regression
- Linear Regression
- Logistic Regression
- Polynomial Regression
- Support Vector Regression
- Decision Tree Regression
- Random Forest Regression
- Ridge Regression
- Lasso Regression
Linear Regression:
- Linear regression is a statistical regression method which is used for predictive analysis.
- It shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis).
- If there is only one input variable (x), then such linear regression is called simple linear regression. And if there is more than one input variable, then such linear regression is called multiple linear regression.
- The relationship between the variables in a linear regression model can be illustrated by an example: predicting the salary of an employee on the basis of years of experience.
- Below is the mathematical equation for linear regression:
Y = aX + b
Here, Y = dependent variable (target variable), X = independent variable (predictor variable), and a and b are the linear coefficients.
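A minimal scikit-learn sketch of fitting Y = aX + b; the experience/salary numbers are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical years-of-experience vs. salary data.
X = np.array([[1], [2], [3], [4], [5]])            # years of experience
y = np.array([30000, 35000, 41000, 46000, 52000])  # salary

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # a and b in Y = aX + b
print(model.predict([[6]]))              # predicted salary for 6 years
```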
Some popular applications of linear regression are:
- Analyzing trends and sales estimates
- Salary forecasting
- Real estate prediction
- Arriving at ETAs in traffic.
Logistic Regression:
- Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
- Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
- It is a predictive analysis algorithm which works on the concept of probability.
- Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
- Logistic regression uses the sigmoid function, or logistic function, together with a more complex cost function. The sigmoid function is used to model the data in logistic regression. It can be represented as:
f(x) = 1 / (1 + e^(−x))
where
- f(x) = output between the 0 and 1 value
- x = input to the function
- e = base of the natural logarithm
When we provide input values (data) to the function, it produces an S-shaped curve.
- It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below it are rounded down to 0.
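A short NumPy sketch of the sigmoid and the thresholding step (0.5 is used here as an assumed threshold):

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(scores)
labels = (probs >= 0.5).astype(int)  # values above the threshold become 1

print(probs)   # approximately [0.047 0.378 0.5 0.622 0.953]
print(labels)  # [0 0 1 1 1]
```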
There are three types of logistic regression:
- Binary(0/1, pass/fail)
- Multinomial (cats, dogs, lions)
- Ordinal(low, medium, high)
Polynomial Regression:
- Polynomial Regression is a type of regression which models the non-linear dataset using a linear model.
- It is similar to multiple linear regression, but it fits a non-linear curve between the value of x and corresponding conditional values of y.
- Suppose there is a dataset which consists of datapoints which are present in a non-linear fashion, so for such case, linear regression will not best fit to those datapoints. To cover such datapoints, we need Polynomial regression.
- In Polynomial regression, the original features are transformed into polynomial features of given degree and then modeled using a linear model. Which means the datapoints are best fitted using a polynomial line.
- The equation for polynomial regression is also derived from the linear regression equation: the linear equation Y = b0 + b1x is transformed into the polynomial equation Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
- Here Y is the predicted/target output, b0, b1,... bn are the regression coefficients. x is our independent/input variable.
- The model is still considered linear, because the coefficients b0, b1, ... bn are linear; only the features are raised to higher (e.g. quadratic) powers.
Note: This is different from Multiple Linear regression in such a way that in Polynomial regression, a single element has different degrees instead of multiple variables with the same degree.
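A scikit-learn sketch of this transform-then-fit-linear idea, with made-up, roughly quadratic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linear data: y roughly follows a quadratic in x.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 4.9, 10.2, 16.8, 26.0])

# Expand x into [1, x, x^2], then fit an ordinary linear model on it.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[6]]))
```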
Support Vector Regression:
Support Vector Machine is a supervised learning algorithm which can be used for regression as well as classification problems. So if we use it for regression problems, then it is termed as Support Vector Regression.
Support Vector Regression is a regression algorithm which works for continuous variables. Below are some keywords which are used in Support Vector Regression:
- Kernel: It is a function used to map a lower-dimensional data into higher dimensional data.
- Hyperplane: In general SVM, it is a separation line between two classes, but in SVR, it is a line which helps to predict the continuous variables and cover most of the datapoints.
- Boundary line: Boundary lines are the two lines apart from hyperplane, which creates a margin for datapoints.
- Support vectors: the datapoints which lie nearest to the hyperplane on either side; they determine the position of the margin.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of datapoints is covered within that margin. The main goal of SVR is to include as many datapoints as possible between the boundary lines, with the hyperplane (best-fit line) passing through the maximum number of datapoints. In a typical SVR plot, the central line is the hyperplane and the two parallel lines on either side of it are the boundary lines.
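A minimal scikit-learn SVR sketch on invented data; epsilon controls how far the boundary lines sit from the hyperplane:

```python
import numpy as np
from sklearn.svm import SVR

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 1.9, 3.2, 3.9, 5.1])

# epsilon sets the width of the tube around the hyperplane;
# points inside that tube incur no loss.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict([[2.5]]))
```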
Decision Tree Regression:
- Decision Tree is a supervised learning algorithm which can be used for solving both classification and regression problems.
- It can solve problems for both categorical and numerical data
- Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.
- A decision tree is constructed starting from the root node/parent node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children, thereby becoming the parent nodes of those nodes. For example, a decision tree regression model might predict a person's choice between a sports car and a luxury car.
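A short DecisionTreeRegressor sketch on toy data; max_depth limits how many times the nodes are split:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([10.0, 12.0, 30.0, 32.0, 50.0, 52.0])

# Each internal node tests an attribute threshold; leaves hold predictions.
model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(model.predict([[3.5]]))
```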
Random Forest Regression:
- Random forest is one of the most powerful supervised learning algorithms, capable of performing both regression and classification tasks.
- Random Forest regression is an ensemble learning method which combines multiple decision trees and predicts the final output as the average of each tree's output. The combined decision trees are called base models, and the ensemble can be represented more formally as:
g(x) = f0(x) + f1(x) + f2(x) + ...
- Random forest uses the Bagging (Bootstrap Aggregation) technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other.
- With the help of Random Forest regression, we can prevent Overfitting in the model by creating random subsets of the dataset.
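The same toy problem with a bagged ensemble of trees; n_estimators is the number of base models whose outputs are averaged:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([10.0, 12.0, 30.0, 32.0, 50.0, 52.0])

# 100 trees trained on bootstrap samples; the prediction is their average.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([[3.5]]))
```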
Ridge Regression:
- Ridge regression is one of the most robust versions of linear regression in which a small amount of bias is introduced so that we can get better long term predictions.
- The amount of bias added to the model is known as the Ridge Regression penalty. The penalty term is computed by multiplying lambda by the squared weight of each individual feature.
- The cost function for ridge regression is therefore: Cost = Σ(y − ŷ)² + λ Σ w²
- A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, ridge regression can be used.
- Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
- It helps to solve the problems if we have more parameters than samples.
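A scikit-learn sketch on invented, nearly collinear data; sklearn's alpha parameter plays the role of lambda in the penalty above:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two nearly collinear feature columns (made up for illustration).
X = np.array([[1, 2], [2, 4.1], [3, 6.2], [4, 8.1]])
y = np.array([3.0, 6.0, 9.0, 12.0])

# alpha is the lambda multiplier on the sum of squared weights.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_, model.intercept_)
```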
Lasso Regression:
- Lasso regression is another regularization technique to reduce the complexity of the model.
- It is similar to the Ridge Regression except that penalty term contains only the absolute weights instead of a square of weights.
- Since it takes absolute values, hence, it can shrink the slope to 0, whereas Ridge Regression can only shrink it near to 0.
- It is also called L1 regularization. The cost function for Lasso regression is: Cost = Σ(y − ŷ)² + λ Σ |w|
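And the Lasso analogue on the same kind of invented data; note how the L1 penalty can drive a coefficient exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[1, 2], [2, 4.1], [3, 6.2], [4, 8.1]])
y = np.array([3.0, 6.0, 9.0, 12.0])

# alpha is the lambda multiplier on the sum of absolute weights.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # one weight may be shrunk all the way to 0
```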
Applications of Machine Learning
- Data mining
- Fraud detection
- Spam filtering
- Astronomy
- etc.
Supervised Learning
It is defined by the use of labeled datasets to train algorithms that classify data or predict outcomes accurately.
Unsupervised Learning
Algorithms are given unlabeled data and allowed to discover patterns and insights without any explicit guidance or instruction.
Statistical Learning
Statistical learning is a powerful tool for data scientists to analyze data and make predictions from it, by identifying patterns and relationships in the data.
Bayesian Method - is used to calculate probability of the classes.
Bayes' Theorem states that the conditional probability of an event, based on the occurrence of another event, is equal to the likelihood of the second event given the first event multiplied by the probability of the first event.
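In symbols: P(A|B) = P(B|A) × P(A) / P(B), where P(A|B) is the probability of A given that B has occurred, P(B|A) is the likelihood of B given A, and P(A) and P(B) are the individual probabilities of the events.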
The Naive Bayes Classifier -
Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for solving classification problems.
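A minimal sketch using scikit-learn's Gaussian Naive Bayes variant (one of several Naive Bayes flavours) on invented data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy two-feature dataset with binary labels.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])

# Applies Bayes' theorem per class, assuming feature independence.
model = GaussianNB().fit(X, y)
print(model.predict([[1.2, 1.9]]))        # -> [0]
print(model.predict_proba([[1.2, 1.9]]))  # class probabilities
```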
Important Questions
One marks
1. ML algorithm
2. Classification algorithm
3. Reinforcement Learning
4. Apriori algorithm
5. Regression
2 marks
1. Machine Learning
2. Key elements of ML
3. Regression in ML
4. Keywords of SVM
5. Bayes Theorem
15 marks
1. Machine Learning and its Key elements
2. Supervised Learning System
3. Unsupervised Learning System
4. Reinforcement Learning System