VM-LEARNING /class.xi ·track.ai ·ch-b6 session: 2026_27
PART B ▪ UNIT 6 · CHAPTER 11
Machine Learning Algorithms
Supervised · Unsupervised · Regression · Classification · Clustering
Machine Learning (ML) is the branch of AI that teaches computers to learn from data and make decisions without being explicitly programmed. Instead of hard-coded rules, ML algorithms identify patterns and relationships in data and use them to generalise to new, unseen data. You already use ML every day — online shopping recommendations, Netflix movie suggestions, facial recognition, chatbots, voice assistants, fraud detection, and self-driving cars are all powered by ML.
Learning Outcome: Differentiate types of ML · Understand the concept behind each method · Apply methods to day-to-day problems · Build knowledge for Capstone Project

1.1 Machine Learning in a Nutshell

1.2 Three Types of Machine Learning

📘
Supervised Learning
Learns from labelled data — each input has the correct output.
Goal: learn input → output mapping for predictions.
Examples: linear regression, logistic regression, decision trees, SVM, neural networks.
🔍
Unsupervised Learning
Learns from unlabelled data — no correct output given.
Goal: discover hidden patterns, clusters, or associations.
Examples: k-means clustering, hierarchical clustering, PCA, autoencoders.
🎮
Reinforcement Learning
An agent learns by interacting with an environment via rewards and penalties.
Goal: learn a policy that maximises cumulative reward over time.
Examples: Q-learning, deep Q-networks, policy gradients, actor-critic methods.

1.3 Supervised Learning — Two Types

Supervised learning splits into two families depending on what kind of output you are predicting:

📈 Regression
Works with continuous data — numbers that can take any value in a range.
Example: predict a person's salary based on years of experience.
📦 Classification
Works with discrete data — distinct categories or labels.
Example: classify email as spam or not-spam.
PART A — REGRESSION

2.1 Correlation — Foundation of Regression

Correlation is a measure of the strength of a linear relationship between two quantitative variables (e.g., price, sales). If two variables change together, they are said to be correlated.

Three Types of Correlation

Positive Correlation
Both variables move in the same direction — as one increases, the other also increases.
Negative Correlation
Variables move in opposite directions — as one increases, the other decreases.
Zero Correlation
No apparent relationship — changes in one variable do not predict changes in the other.

Pearson's Correlation Coefficient (r)

r = [n · Σxy − (Σx)(Σy)] / √{[n·Σx² − (Σx)²] · [n·Σy² − (Σy)²]}
Range of r: −1 to +1
  • +1 = perfect positive correlation
  •   0 = no correlation
  • −1 = perfect negative correlation

Requirements for using Pearson's r

  1. Scale of measurement must be interval or ratio.
  2. Variables should be approximately normally distributed.
  3. The association must be linear.
  4. There should be no outliers in the data.
Example: 6 people with different ages (x) and weights (y). After computing sums: r = (6 × 13937 − 202 × 409) / √{[6 × 7280 − 202²] × [6 × 28365 − 409²]} = 1004 / 2892.45 = 0.35 — a weak positive correlation.
Practical (Syllabus): Calculate Pearson's correlation coefficient in MS Excel using the formula =CORREL(range1, range2) — or the raw formula above via helper cells.
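The arithmetic in the worked example can be reproduced in Python directly from the given sums (n = 6, Σx = 202, Σy = 409, Σxy = 13937, Σx² = 7280, Σy² = 28365); a minimal sketch, with illustrative variable names:

```python
import math

# Sums taken from the age/weight example above
n = 6
sum_x, sum_y = 202, 409
sum_xy = 13937
sum_x2, sum_y2 = 7280, 28365

# Pearson's r via the raw-score formula
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

print(f"r = {r:.2f}")  # → r = 0.35, a weak positive correlation
```

In practice, Excel's =CORREL() or NumPy's corrcoef() computes the same value from the raw columns without the helper sums.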

2.2 When NOT to Use Regression

❌ No Correlation: If the variables change independently, regression gives no meaningful insight.
❌ Non-linear Relationships: Simple regression captures only linear trends; curved data needs polynomial or other non-linear regression.
❌ Outliers: Extreme data points disproportionately pull the regression line and hurt predictions.
❌ Violated Assumptions: If assumptions such as linearity or no multicollinearity are violated, results are unreliable.

2.3 Regression — Predicting a Continuous Variable

Regression is a statistical technique that models the relationship between a dependent variable and one or more independent variables, and uses it to predict the dependent variable.

Variables in Regression

y — Dependent / Explained: The variable we want to predict or understand.
x — Independent / Predictor: Used to predict or explain changes in y.

Linear Regression Equation

y = a + bx + e, where a is the intercept, b is the slope, and e is the error term.

2.4 Finding the Best-Fit Line

The least-squares method minimises the squared differences between observed values of y and values predicted by the regression equation. This gives the best-fit line that captures the overall trend of the data.
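The least-squares idea can be expressed in closed form: the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is a = ȳ − b·x̄. A minimal sketch (the small dataset is illustrative, and the result is cross-checked against NumPy's own least-squares fit):

```python
import numpy as np

# Illustrative data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least-squares solution:
#   b = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
#   a = y_mean - b * x_mean
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Cross-check against NumPy's degree-1 polynomial least-squares fit
b_np, a_np = np.polyfit(x, y, deg=1)
assert np.allclose([b, a], [b_np, a_np])

print(f"y = {a:.2f} + {b:.2f}x")  # → y = 2.20 + 0.60x
```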

Properties of the Regression Line

  • The least-squares line always passes through the point of means (x̄, ȳ).
  • The residuals (observed − predicted values) sum to zero.

Practical (Syllabus): Demonstrate Linear Regression in MS Excel.
  1. Select Age and Weight columns.
  2. Insert a Scatter chart; add Trendline → Linear; check Display Equation on Chart.
  3. Use =SLOPE(Weight, Age) and =INTERCEPT(Weight, Age) to verify.

2.5 Linear Regression — Two Types

➡️ Simple Linear Regression
One independent variable predicts the dependent variable.
Example: predict weight from age.
🔢 Multiple Linear Regression
Multiple independent variables predict the dependent variable.
Example: predict salary from experience + education + job role.

Applications of Linear Regression

📊 Market Analysis: Understand how pricing, sales volume, advertising and social-media engagement relate.
💰 Sales Forecasting: Predict future sales from past sales, marketing spend and seasonal trends.
👤 Salary Prediction: Estimate a person's salary from experience, education and job role.
⚽ Sports Analysis: Analyse player and team performance using statistics, conditions and opponent strength.
🏥 Medical Research: Link age, weight and health outcomes; identify risk factors and evaluate interventions.
✅ Advantages
  • Simple and easy to implement.
  • Efficient to train.
❌ Disadvantages
  • Sensitive to outliers.
  • Limited to linear relationships.

Python — Linear Regression (Advanced)

import numpy as np
import matplotlib.pyplot as plt

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Calculate means
x_mean = np.mean(x)
y_mean = np.mean(y)

# Least-squares slope and intercept
# (keep numerator and denominator consistent: both use raw sums of deviations)
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Predict and plot
y_pred = slope * x + intercept
plt.scatter(x, y)
plt.plot(x, y_pred, color="red")
plt.title("Simple Linear Regression")
plt.show()

print(f"Slope: {slope:.2f}")
print(f"Intercept: {intercept:.2f}")
PART B — CLASSIFICATION

3.1 What is Classification?

Classification assigns labels / categories to data based on their features. The model learns from labelled training data and predicts class labels for new, unseen data — a key form of supervised learning.
Real-world example: A housing society has separate dustbins for paper, plastic, food, metal waste. By sorting waste into these categories, we are classifying. Labels like "paper", "metal", "plastic" are assigned to each piece of waste based on its properties.

3.2 How Classification Works — 5 Steps

1Classes
Divide data into distinct categories (e.g., spam / not-spam).
2Features
Describe each instance by its attributes that help distinguish classes.
3Training Data
Labelled examples used to teach the model — features + correct class.
4Model
Algorithm that learns the pattern between features and class labels.
5Prediction
Use the trained model to predict class labels on new data.

3.3 Four Types of Classification

1. Binary: Exactly two class labels. Examples: email spam detection, medical test (cancer yes/no), pass/fail.
2. Multi-Class: More than two class labels. Examples: face classification, plant species classification, Optical Character Recognition.
3. Multi-Label: Each example can belong to multiple class labels at once. Example: photo classification — objects in the photo: bicycle, apple, person, etc.
4. Imbalanced: Unequal class distribution — majority and minority classes. Examples: fraud detection, outlier detection, medical diagnostics.

3.4 K-Nearest Neighbours (k-NN)

k-NN is a non-parametric supervised learning algorithm used for classification and regression. It makes predictions based on the proximity (similarity) of a new data point to existing training points.

6 Steps in k-NN

1Choose K
Select the number of neighbours to consider (e.g., K=3).
2Compute Distance
Calculate Euclidean distance from the new point to each training point.
3Find K Nearest
Select the K points with the smallest distance.
4Count Categories
Among the K neighbours, count how many belong to each class.
5Assign Class
Assign the new point to the class with the highest count (majority vote).
6Model Ready
Classifier is ready for use on new data.
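The six steps above can be sketched from scratch in a few lines of Python; this is a minimal illustration (the toy dataset and the function name knn_predict are our own, not from the syllabus):

```python
import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x_new, k=3):
    """Classify x_new by majority vote of its k nearest neighbours."""
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(x_train - x_new, axis=1)
    # Step 3: indices of the K smallest distances
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count classes among the neighbours, take the majority
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two clusters, classes 0 and 1
x_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(x_train, y_train, np.array([1.5, 1.5])))  # → 0
print(knn_predict(x_train, y_train, np.array([8.5, 8.5])))  # → 1
```

Note that "training" here is just storing the data — the distance computation happens at prediction time, which is why k-NN is called a lazy learner.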

Applications, Pros & Cons of k-NN

Applications
  • Image recognition
  • Recommendation systems
  • Healthcare diagnostics
  • Text mining, sentiment analysis
  • Anomaly detection
Advantages
  • Easy to implement and understand.
  • No explicit training phase.
  • Works for both classification and regression.
  • Robust to noisy data when K is sufficiently large.
Limitations
  • Computationally expensive for large data.
  • Sensitive to the distance metric and the choice of K.
  • Needs careful preprocessing and feature scaling.
  • Struggles with high-dimensional data.

Python — k-NN (Advanced)

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Load dataset
data_set = pd.read_csv("user_data.csv")

# Feature extraction
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Split train / test
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# Feature scaling
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test  = st_x.transform(x_test)

# Train and predict
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
Teachable Machine (teachablemachine.withgoogle.com) — Google's no-code tool to train an image classifier by showing webcam examples. Perfect for introducing classification to beginners.
PART C — UNSUPERVISED LEARNING (CLUSTERING)

4.1 What is Clustering?

Clustering (or cluster analysis) groups unlabelled data into clusters based on similarity. Points inside a cluster are more similar to each other than to those in other clusters. It is a form of unsupervised learning — no predefined labels, no supervision.
Shopping centre analogy: Items are grouped by similarity — apples, bananas, grapes in the fruits section; shirts, trousers, jackets in the men's wear section. Clustering does the same for data points.

4.2 How Clustering Works — 4 Key Steps

1Prepare Data
Select the right features; scale or transform them as needed.
2Similarity Metric
Define how similar two data points are (e.g., Euclidean distance).
3Run Algorithm
Apply a clustering algorithm suited to dataset size and shape.
4Interpret
Analyse the clusters — unsupervised, so interpretation is essential.

4.3 Four Types of Clustering Methods

1. Partitioning (Centroid-based)
Divides data into k non-hierarchical groups. Cluster centres minimise distance within each cluster.
Example: K-Means Clustering.
2. Density-Based
Connects highly dense areas into clusters. Sparse areas separate clusters. Works for arbitrary shapes.
Example: DBSCAN.
3. Distribution Model-Based
Assumes data within each cluster follows a probability distribution (often Gaussian).
Example: Gaussian Mixture Models (GMM).
4. Hierarchical
Builds a tree-like structure (dendrogram). No need to pre-specify the number of clusters — cut the tree at any level.
Example: Agglomerative Hierarchical.

4.4 K-Means Clustering Algorithm

K-Means is the most popular clustering algorithm. It divides the dataset into K clusters of equal variance. You must specify K beforehand.

7 Steps in K-Means

  1. Select K — decide the number of clusters.
  2. Initial centroids — pick K random points (can be from the dataset).
  3. Assign each data point to its closest centroid → forms K clusters.
  4. Re-compute centroids — calculate the new mean (centroid) of each cluster.
  5. Re-assign every data point to the new closest centroid.
  6. Repeat steps 4-5 until no point changes cluster (convergence).
  7. Model ready — the clusters are final.
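The seven steps above can be sketched from scratch; this is a minimal illustration, not a production implementation (the toy data and function name kmeans are our own, and for reproducibility it uses the first K points as initial centroids, whereas Step 2 normally picks them at random):

```python
import numpy as np

def kmeans(X, k, n_iters=100):
    """Minimal K-Means following the steps above."""
    # Step 2: initial centroids — here the first K points, for reproducibility
    centroids = X[:k].copy()
    for _ in range(n_iters):
        # Steps 3 & 5: assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 6: stop when the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs (illustrative data)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 8.2], [7.8, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)  # → [0 0 0 1 1 1]: each blob gets its own label
```

With random initialisation the labels can come out permuted (or the run can converge to a poor local optimum), which is why libraries such as scikit-learn restart K-Means several times and keep the best result.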

K-Means — Applications

🛒 Market Segmentation: Group customers by purchasing behaviour for targeted marketing.
🖼️ Image Segmentation: Partition images into regions of similar colour for object detection or compression.
📄 Document Clustering: Categorise documents by content similarity for easier organisation.
⚠️ Anomaly Detection: Identify outliers by clustering the normal data points.
👥 Customer Segmentation: Segment customers for personalised experiences.
✅ Advantages
  • Easy to implement, suitable for all levels.
  • Handles large datasets with low compute cost.
  • Works with many features and data points.
  • Easy to understand — aids decision-making.
  • Works across various domains and data types.
❌ Limitations
  • Results vary based on initial centroid placement.
  • Assumes clusters are spherical.
  • Number of clusters (K) must be known beforehand.
  • Outliers distort centroids.
  • May converge to sub-optimal solutions.

Python — K-Means (Advanced)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster  import KMeans

# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4,
                  cluster_std=0.60, random_state=0)

# Apply K-means
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Plot points coloured by cluster
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap="viridis")

# Mark centroids in red
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c="red", s=200, alpha=0.75)

plt.title("K-means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
Visual AI (k-means visualisation tool) — lets you upload data, change parameters and watch clusters update in real-time. Great for grasping how K-Means converges.

5.1 Summary Comparison — Regression vs Classification vs Clustering

Aspect | Regression | Classification | Clustering
Learning Type | Supervised | Supervised | Unsupervised
Output | Continuous number | Discrete class / category | Group / cluster
Labelled data? | Yes | Yes | No
Goal | Predict a value | Assign a category | Find natural groups
Example | Predict house price | Spam / not-spam | Customer segmentation
Typical Algorithm | Linear Regression | k-NN, Logistic Regression | K-Means

5.2 Practical Programs & Certification (Syllabus)

From syllabus:
  • Calculation of Pearson's correlation coefficient in MS Excel.
  • Demonstration of Linear Regression in MS Excel (trendline + SLOPE / INTERCEPT functions).
  • Demonstration of Linear Regression using a Python program (advanced).
  • Demonstration of k-NN using a Python program (advanced).
  • Demonstration of k-means clustering using a Python program (advanced).
  • IBM SkillsBuild — Machine Learning with Python certification.

Quick Revision — Key Points to Remember

  • Machine Learning = subset of AI where computers learn from data without being explicitly programmed.
  • Challenges: Overfitting · Bias · Lack of interpretability ("black boxes").
  • 3 ML types: Supervised (labelled data) · Unsupervised (no labels) · Reinforcement (rewards / penalties).
  • Supervised = 2 families: Regression (continuous output) · Classification (discrete output).
  • Correlation measures linear relationship between 2 quantitative variables. Range: +1 (perfect positive) → 0 (none) → −1 (perfect negative).
  • Pearson's r = [n·Σxy − (Σx)(Σy)] / √{[n·Σx²−(Σx)²]·[n·Σy²−(Σy)²]}. Needs interval/ratio data, normal distribution, linear, no outliers.
  • Linear Regression: y = a + bx + e (a=intercept, b=slope, e=error).
  • Finding the line: Least-squares method — minimises squared residuals; line passes through the point of means (x̄, ȳ).
  • 2 types of Linear Regression: Simple (1 independent var) · Multiple (2+ independent vars).
  • Classification = assign labels to data. 5 steps: Classes → Features → Training Data → Model → Prediction.
  • 4 types of classification: Binary · Multi-Class · Multi-Label · Imbalanced.
  • k-NN algorithm = supervised; predicts class via majority vote of K nearest neighbours (Euclidean distance).
  • Clustering = unsupervised grouping of similar unlabelled data.
  • 4 clustering types: Partitioning (K-Means) · Density-based (DBSCAN) · Distribution (GMM) · Hierarchical (dendrogram).
  • K-Means 7 steps: Choose K → Random K centroids → Assign points to closest → Recompute centroids → Re-assign → Repeat → Model ready.
  • K-Means apps: Market & Customer Segmentation · Image Segmentation · Document Clustering · Anomaly Detection.
  • Certification: IBM SkillsBuild — Machine Learning with Python.
🧠Practice Quiz — test yourself on this chapter