
Unsupervised Learning Algorithms

1. Clustering Basics

Clustering is the process of grouping data points such that points in the same group (cluster) are more similar to each other than to those in other groups.

K-Means Clustering

The most popular clustering algorithm. It partitions data into $K$ clusters by minimizing the distance between each data point and the centroid of its assigned cluster.

  • Process: Randomly pick $K$ centroids $\rightarrow$ Assign points to nearest centroid $\rightarrow$ Recalculate centroids based on the mean of assigned points $\rightarrow$ Repeat until convergence.
  • Example: Segmenting customers into 3 groups (Budget, Mid-range, Premium) based on spending habits.

Hierarchical Clustering

Builds a hierarchy of clusters. It is typically visualized using a Dendrogram (a tree-like diagram).

  • Agglomerative (Bottom-up): Starts with each point as its own cluster and merges the closest pairs until one cluster remains.
  • Example: Creating a biological “Tree of Life” to show how different species evolved from common ancestors.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Unlike K-Means, DBSCAN groups points that are closely packed together and marks points in low-density regions as outliers.

  • Key Advantage: It can find clusters of arbitrary shapes (like a “U” shape) and identifies noise automatically.
  • Example: Identifying clusters of stars in a galaxy where there is significant “empty space” or noise between them.

2. Dimensionality Reduction

These algorithms reduce the number of variables (features) in a dataset while keeping as much important information as possible. This is used to speed up computation and visualize high-dimensional data.

PCA (Principal Component Analysis)

A linear transformation that finds the “Principal Components”—the directions where the variance in the data is maximal.

  • Use Case: Reducing a dataset with 100 features down to 3 main components that capture 95% of the information.
  • Example: Compressing an image by keeping only its top principal components while retaining the recognizable features.

LDA (Linear Discriminant Analysis)

While often used for classification, LDA is also a dimensionality reduction technique. Unlike PCA (which is unsupervised), LDA is supervised because it reduces dimensions while maximizing the separation between known classes.

  • Example: Reducing facial recognition features so that images of “Person A” are as far away as possible from “Person B” in the feature space.

t-SNE (t-Distributed Stochastic Neighbor Embedding)

A non-linear technique mainly used for visualization. It maps high-dimensional data into 2D or 3D space, ensuring that similar points stay close together and dissimilar points stay far apart.

  • Example: Visualizing thousands of high-dimensional gene expressions on a 2D map to see which cells are similar.

Comparison Table: PCA vs. t-SNE

Feature      | PCA                                    | t-SNE
Type         | Linear                                 | Non-linear
Primary Goal | Preserving global structure/variance   | Preserving local structure (clusters)
Speed        | Very fast                              | Slower (computationally intensive)
Output       | Deterministic (same result every time) | Stochastic (results may vary slightly)

Unsupervised Learning Algorithms

Unsupervised learning deals with unlabeled data.
The goal is not prediction, but discovering hidden structures, patterns, and relationships inside the data.

Unlike supervised learning, there is no correct answer provided.
The algorithm must figure out the structure on its own.


1. Clustering Basics

What Is Clustering?

Clustering is the task of grouping similar data points together such that:

  • Points in the same group (cluster) are more similar to each other
  • Points in different groups are less similar

Key Idea

Similarity is usually measured using distance metrics:

  • Euclidean distance
  • Manhattan distance
  • Cosine similarity
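
A minimal NumPy sketch of the three metrics listed above; the two vectors are made-up values, just for illustration:

```python
import numpy as np
from numpy.linalg import norm

# Two illustrative feature vectors (hypothetical values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

# Euclidean distance: straight-line distance between the points
euclidean = norm(a - b)

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.abs(a - b).sum()

# Cosine similarity: angle-based similarity, independent of vector length
cosine_similarity = np.dot(a, b) / (norm(a) * norm(b))

print(euclidean, manhattan, cosine_similarity)
```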

Why Clustering Is Important

  • Customer segmentation
  • Image segmentation
  • Document grouping
  • Anomaly detection
  • Market research

Real-World Example

An e-commerce company clusters customers based on:

  • Purchase history
  • Frequency
  • Spending behavior

No labels like “Premium” or “Regular” are given.
The algorithm discovers these segments automatically.


2. K-Means Clustering

What Is K-Means?

K-Means is a partition-based clustering algorithm that divides data into K clusters, where K is predefined.


How K-Means Works (Step-by-Step)

  1. Choose number of clusters K
  2. Randomly initialize K centroids
  3. Assign each data point to the nearest centroid
  4. Recalculate centroids as mean of assigned points
  5. Repeat until centroids stop changing

Why It’s Called “K-Means”

  • K → number of clusters
  • Means → average of points in a cluster

Example

Clustering students based on:

  • Study hours
  • Exam scores

K = 3 clusters:

  • High performers
  • Average performers
  • Low performers
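
A minimal scikit-learn sketch of the student example above; the study-hours/score values are made up for illustration, and K = 3 mirrors the high/average/low split:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical student data: [study hours, exam score]
X = np.array([[1, 40], [2, 45], [3, 55],
              [5, 65], [6, 70], [7, 72],
              [9, 88], [10, 92], [11, 95]])

# K = 3 clusters; centroids are re-estimated as the mean of assigned points
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index for each student
print(kmeans.cluster_centers_)  # learned centroids (cluster means)
```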

Advantages

  • Simple and fast
  • Works well on large datasets
  • Easy to interpret

Limitations

  • Must choose K beforehand
  • Sensitive to outliers
  • Works poorly for non-spherical clusters

Use Cases

  • Customer segmentation
  • Image compression
  • Market analysis

3. Hierarchical Clustering

What Is Hierarchical Clustering?

Hierarchical clustering builds a tree-like structure (dendrogram) showing how clusters are formed at different levels.


Types of Hierarchical Clustering

1. Agglomerative (Bottom-Up)

  • Start with each point as its own cluster
  • Merge closest clusters step by step

2. Divisive (Top-Down)

  • Start with one cluster
  • Split recursively

Agglomerative is more common.


Linkage Methods

These define how the distance between two clusters is calculated:

  • Single linkage (min distance)
  • Complete linkage (max distance)
  • Average linkage
  • Ward’s method
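
A small SciPy sketch of agglomerative clustering with Ward's method on made-up 2-D points; swapping the method argument selects any of the other linkage methods listed above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Illustrative 2-D points forming three obvious pairs (hypothetical values)
X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5.2], [9, 9], [9.2, 8.8]])

# Bottom-up merging with Ward's linkage; try "single", "complete", or "average"
Z = linkage(X, method="ward")

# Cut the tree into 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# dendrogram(Z) would draw the tree-like diagram (requires matplotlib)
```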

Example

Grouping documents by topic:

  • Sports
  • Politics
  • Technology

The dendrogram shows topic similarity at different levels.


Advantages

  • No need to specify number of clusters initially
  • Dendrogram gives insight into structure

Limitations

  • Computationally expensive
  • Not suitable for very large datasets

Use Cases

  • Biology (gene clustering)
  • Document classification
  • Social network analysis

4. DBSCAN (Density-Based Spatial Clustering)

What Is DBSCAN?

DBSCAN clusters points based on density rather than distance to a centroid.

It groups together points that are closely packed and marks sparse points as noise.


Key Concepts

  • Epsilon (ε) → neighborhood radius
  • MinPts → minimum number of points needed to form a dense region

Types of Points

  • Core points
  • Border points
  • Noise points (outliers)

How DBSCAN Works

  1. Identify core points
  2. Expand clusters from core points
  3. Assign border points
  4. Mark outliers as noise
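
A short scikit-learn sketch of these steps on made-up 2-D points; eps corresponds to ε and min_samples plays the role of MinPts:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points: two dense groups plus one isolated outlier
X = np.array([[1.0, 1.0], [1.1, 1.2], [0.9, 1.1],
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.1],
              [4.0, 15.0]])

# eps is the neighborhood radius; min_samples is the density threshold
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Points labeled -1 are noise (outliers); others get a cluster index
print(db.labels_)
```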

Example

Geographic data clustering:

  • High-density city areas
  • Sparse rural regions

Advantages

  • Detects arbitrary-shaped clusters
  • Handles outliers well
  • No need to specify number of clusters

Limitations

  • Sensitive to ε parameter
  • Struggles with varying densities

Use Cases

  • Fraud detection
  • GPS location clustering
  • Anomaly detection

5. Dimensionality Reduction

What Is Dimensionality Reduction?

Dimensionality reduction reduces the number of features while preserving important information.

High-dimensional data causes:

  • Curse of dimensionality
  • Increased computation
  • Overfitting

Why Dimensionality Reduction Is Needed

  • Faster training
  • Better visualization
  • Noise reduction
  • Improved model performance

6. Principal Component Analysis (PCA)

What Is PCA?

PCA is a linear dimensionality reduction technique that transforms data into orthogonal components capturing maximum variance.


How PCA Works (Intuition)

  1. Find directions with maximum variance
  2. Project data onto those directions
  3. Keep top components

Example

Dataset with 100 features reduced to 10 components while retaining 95% variance.
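
A short scikit-learn sketch of this idea on synthetic data; passing a fraction to n_components asks PCA to keep enough components to explain that share of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 500 samples with 100 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 100))

# Keep as many orthogonal components as needed to retain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (500, number of kept components)
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```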


Properties

  • Components are orthogonal
  • Unsupervised
  • Based on variance

Advantages

  • Reduces noise
  • Improves speed
  • Helps visualization

Limitations

  • Linear method
  • Loses interpretability

Use Cases

  • Preprocessing for ML models
  • Image compression
  • Visualization

7. Linear Discriminant Analysis (LDA)

What Is LDA?

LDA is a supervised dimensionality reduction technique that maximizes class separability.

Note: Often taught alongside unsupervised methods for comparison.


How LDA Works

  • Maximizes between-class variance
  • Minimizes within-class variance
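
A short scikit-learn sketch using the built-in iris dataset (chosen only for illustration); note that LDA needs the class labels y, unlike PCA:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Labeled data is required: LDA is supervised
X, y = load_iris(return_X_y=True)

# Project the 4 iris features onto 2 discriminant axes that maximize
# between-class variance relative to within-class variance
lda = LinearDiscriminantAnalysis(n_components=2)
X_projected = lda.fit_transform(X, y)

print(X_projected.shape)  # (150, 2)
```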

Example

Face recognition:

  • Project faces into space where classes are separable

Difference Between PCA and LDA

PCA                | LDA
Unsupervised       | Supervised
Maximizes variance | Maximizes class separation
Ignores labels     | Uses labels

8. t-SNE (t-Distributed Stochastic Neighbor Embedding)

What Is t-SNE?

t-SNE is a non-linear dimensionality reduction technique mainly used for visualization.


Key Idea

  • Preserves local structure
  • Nearby points stay nearby

Example

Visualizing high-dimensional word embeddings in 2D.
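
A short scikit-learn sketch with random vectors standing in for high-dimensional embeddings (purely illustrative data):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for word embeddings: 200 hypothetical 50-dimensional vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# Map to 2-D for plotting; perplexity affects the layout, and results
# vary between runs because t-SNE is stochastic
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # (200, 2)
```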


Advantages

  • Excellent visualization
  • Reveals hidden clusters

Limitations

  • Computationally expensive
  • Not suitable for training models (it cannot transform new, unseen points)
  • Results vary with parameters

Use Cases

  • Embedding visualization
  • Data exploration
  • Cluster analysis

Summary Table

Algorithm    | Type                     | Strength
K-Means      | Clustering               | Fast, simple
Hierarchical | Clustering               | Interpretability
DBSCAN       | Clustering               | Noise handling
PCA          | Dimensionality reduction | Speed, compression
LDA          | Dimensionality reduction | Class separation
t-SNE        | Dimensionality reduction | Visualization
