
Unsupervised Learning Algorithms

1. Clustering Basics

Clustering is the process of grouping data points such that points in the same group (cluster) are more similar to each other than to those in other groups.

K-Means Clustering

The most popular clustering algorithm. It partitions data into $K$ clusters by minimizing the distance between each data point and the centroid of its assigned cluster.

  • Process: Randomly pick $K$ centroids $\rightarrow$ Assign points to nearest centroid $\rightarrow$ Recalculate centroids based on the mean of assigned points $\rightarrow$ Repeat until convergence.
  • Example: Segmenting customers into 3 groups (Budget, Mid-range, Premium) based on spending habits.

Hierarchical Clustering

Builds a hierarchy of clusters. It is typically visualized using a Dendrogram (a tree-like diagram).

  • Agglomerative (Bottom-up): Starts with each point as its own cluster and merges the closest pairs until one cluster remains.
  • Example: Creating a biological “Tree of Life” to show how different species evolved from common ancestors.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Unlike K-Means, DBSCAN groups points that are closely packed together and marks points in low-density regions as outliers.

  • Key Advantage: It can find clusters of arbitrary shapes (like a “U” shape) and identifies noise automatically.
  • Example: Identifying clusters of stars in a galaxy where there is significant “empty space” or noise between them.

2. Dimensionality Reduction

These algorithms reduce the number of variables (features) in a dataset while keeping as much important information as possible. This is used to speed up computation and visualize high-dimensional data.

PCA (Principal Component Analysis)

A linear transformation that finds the “Principal Components”—the directions where the variance in the data is maximal.

  • Use Case: Reducing a dataset with 100 features down to 3 main components that capture 95% of the information.
  • Example: Compressing an image by keeping only its top principal components while retaining the recognizable features.

LDA (Linear Discriminant Analysis)

While often used for classification, LDA is also a dimensionality reduction technique. Unlike PCA (which is unsupervised), LDA is supervised because it reduces dimensions while maximizing the separation between known classes.

  • Example: Reducing facial recognition features so that images of “Person A” are as far away as possible from “Person B” in the feature space.

t-SNE (t-Distributed Stochastic Neighbor Embedding)

A non-linear technique mainly used for visualization. It maps high-dimensional data into 2D or 3D space, ensuring that similar points stay close together and dissimilar points stay far apart.

  • Example: Visualizing thousands of high-dimensional gene expressions on a 2D map to see which cells are similar.

Comparison Table: PCA vs. t-SNE

Feature      | PCA                                    | t-SNE
Type         | Linear                                 | Non-linear
Primary Goal | Preserving global structure/variance   | Preserving local structure (clusters)
Speed        | Very fast                              | Slower (computationally intensive)
Output       | Deterministic (same result every time) | Stochastic (results may vary slightly)

Unsupervised Learning Algorithms

Unsupervised learning deals with unlabeled data.
The goal is not prediction, but discovering hidden structures, patterns, and relationships inside the data.

Unlike supervised learning, there is no correct answer provided.
The algorithm must figure out the structure on its own.


1. Clustering Basics

What Is Clustering?

Clustering is the task of grouping similar data points together such that:

  • Points in the same group (cluster) are more similar to each other
  • Points in different groups are less similar

Key Idea

Similarity is usually measured using distance metrics:

  • Euclidean distance
  • Manhattan distance
  • Cosine similarity
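
A minimal NumPy sketch of the three metrics listed above; the two vectors are made-up values, just for illustration:

```python
import numpy as np
from numpy.linalg import norm

# Two illustrative feature vectors (hypothetical values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

# Euclidean distance: straight-line distance between the points
euclidean = norm(a - b)

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.abs(a - b).sum()

# Cosine similarity: angle-based similarity, independent of vector length
cosine_similarity = np.dot(a, b) / (norm(a) * norm(b))

print(euclidean, manhattan, cosine_similarity)
```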

Why Clustering Is Important

  • Customer segmentation
  • Image segmentation
  • Document grouping
  • Anomaly detection
  • Market research

Real-World Example

An e-commerce company clusters customers based on:

  • Purchase history
  • Frequency
  • Spending behavior

No labels like “Premium” or “Regular” are given.
The algorithm discovers these segments automatically.


2. K-Means Clustering

What Is K-Means?

K-Means is a partition-based clustering algorithm that divides data into K clusters, where K is predefined.


How K-Means Works (Step-by-Step)

  1. Choose number of clusters K
  2. Randomly initialize K centroids
  3. Assign each data point to the nearest centroid
  4. Recalculate centroids as mean of assigned points
  5. Repeat until centroids stop changing

Why It’s Called “K-Means”

  • K → number of clusters
  • Means → average of points in a cluster

Example

Clustering students based on:

  • Study hours
  • Exam scores

K = 3 clusters:

  • High performers
  • Average performers
  • Low performers
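
A minimal scikit-learn sketch of the student example above; the study-hours/score values are made up for illustration, and K = 3 mirrors the high/average/low split:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical student data: [study hours, exam score]
X = np.array([[1, 40], [2, 45], [3, 55],
              [5, 65], [6, 70], [7, 72],
              [9, 88], [10, 92], [11, 95]])

# K = 3 clusters; centroids are re-estimated as the mean of assigned points
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index for each student
print(kmeans.cluster_centers_)  # learned centroids (cluster means)
```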

Advantages

  • Simple and fast
  • Works well on large datasets
  • Easy to interpret

Limitations

  • Must choose K beforehand
  • Sensitive to outliers
  • Works poorly for non-spherical clusters

Use Cases

  • Customer segmentation
  • Image compression
  • Market analysis

3. Hierarchical Clustering

What Is Hierarchical Clustering?

Hierarchical clustering builds a tree-like structure (dendrogram) showing how clusters are formed at different levels.


Types of Hierarchical Clustering

1. Agglomerative (Bottom-Up)

  • Start with each point as its own cluster
  • Merge closest clusters step by step

2. Divisive (Top-Down)

  • Start with one cluster
  • Split recursively

Agglomerative is more common.


Linkage Methods

These define how the distance between two clusters is calculated:

  • Single linkage (min distance)
  • Complete linkage (max distance)
  • Average linkage
  • Ward’s method
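
A small SciPy sketch of agglomerative clustering with Ward's method on made-up 2-D points; swapping the method argument selects any of the other linkage methods listed above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Illustrative 2-D points forming three obvious pairs (hypothetical values)
X = np.array([[1, 1], [1.5, 1], [5, 5], [5.5, 5.2], [9, 9], [9.2, 8.8]])

# Bottom-up merging with Ward's linkage; try "single", "complete", or "average"
Z = linkage(X, method="ward")

# Cut the tree into 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# dendrogram(Z) would draw the tree-like diagram (requires matplotlib)
```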

Example

Grouping documents by topic:

  • Sports
  • Politics
  • Technology

The dendrogram shows topic similarity at different levels.


Advantages

  • No need to specify number of clusters initially
  • Dendrogram gives insight into structure

Limitations

  • Computationally expensive
  • Not suitable for very large datasets

Use Cases

  • Biology (gene clustering)
  • Document classification
  • Social network analysis

4. DBSCAN (Density-Based Spatial Clustering)

What Is DBSCAN?

DBSCAN clusters points based on density rather than distance to a centroid.

It groups together points that are closely packed and marks sparse points as noise.


Key Concepts

  • Epsilon (ε) → neighborhood radius
  • MinPts → minimum number of points needed to form a dense region

Types of Points

  • Core points
  • Border points
  • Noise points (outliers)

How DBSCAN Works

  1. Identify core points
  2. Expand clusters from core points
  3. Assign border points
  4. Mark outliers as noise
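
A short scikit-learn sketch of these steps on made-up 2-D points; eps corresponds to ε and min_samples plays the role of MinPts:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points: two dense groups plus one isolated outlier
X = np.array([[1.0, 1.0], [1.1, 1.2], [0.9, 1.1],
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.1],
              [4.0, 15.0]])

# eps is the neighborhood radius; min_samples is the density threshold
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Points labeled -1 are noise (outliers); others get a cluster index
print(db.labels_)
```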

Example

Geographic data clustering:

  • High-density city areas
  • Sparse rural regions

Advantages

  • Detects arbitrary-shaped clusters
  • Handles outliers well
  • No need to specify number of clusters

Limitations

  • Sensitive to ε parameter
  • Struggles with varying densities

Use Cases

  • Fraud detection
  • GPS location clustering
  • Anomaly detection

5. Dimensionality Reduction

What Is Dimensionality Reduction?

Dimensionality reduction reduces the number of features while preserving important information.

High-dimensional data causes:

  • Curse of dimensionality
  • Increased computation
  • Overfitting

Why Dimensionality Reduction Is Needed

  • Faster training
  • Better visualization
  • Noise reduction
  • Improved model performance

6. Principal Component Analysis (PCA)

What Is PCA?

PCA is a linear dimensionality reduction technique that transforms data into orthogonal components capturing maximum variance.


How PCA Works (Intuition)

  1. Find directions with maximum variance
  2. Project data onto those directions
  3. Keep top components

Example

Dataset with 100 features reduced to 10 components while retaining 95% variance.
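
A short scikit-learn sketch of this idea on synthetic data; passing a fraction to n_components asks PCA to keep enough components to explain that share of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 500 samples with 100 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 100))

# Keep as many orthogonal components as needed to retain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (500, number of kept components)
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```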


Properties

  • Components are orthogonal
  • Unsupervised
  • Based on variance

Advantages

  • Reduces noise
  • Improves speed
  • Helps visualization

Limitations

  • Linear method
  • Loses interpretability

Use Cases

  • Preprocessing for ML models
  • Image compression
  • Visualization

7. Linear Discriminant Analysis (LDA)

What Is LDA?

LDA is a supervised dimensionality reduction technique that maximizes class separability.

Note: Often taught alongside unsupervised methods for comparison.


How LDA Works

  • Maximizes between-class variance
  • Minimizes within-class variance
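
A short scikit-learn sketch using the built-in iris dataset (chosen only for illustration); note that LDA needs the class labels y, unlike PCA:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Labeled data is required: LDA is supervised
X, y = load_iris(return_X_y=True)

# Project the 4 iris features onto 2 discriminant axes that maximize
# between-class variance relative to within-class variance
lda = LinearDiscriminantAnalysis(n_components=2)
X_projected = lda.fit_transform(X, y)

print(X_projected.shape)  # (150, 2)
```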

Example

Face recognition:

  • Project faces into space where classes are separable

Difference Between PCA and LDA

PCA                | LDA
Unsupervised       | Supervised
Maximizes variance | Maximizes class separation
Ignores labels     | Uses labels

8. t-SNE (t-Distributed Stochastic Neighbor Embedding)

What Is t-SNE?

t-SNE is a non-linear dimensionality reduction technique mainly used for visualization.


Key Idea

  • Preserves local structure
  • Nearby points stay nearby

Example

Visualizing high-dimensional word embeddings in 2D.
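
A short scikit-learn sketch with random vectors standing in for high-dimensional embeddings (purely illustrative data):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for word embeddings: 200 hypothetical 50-dimensional vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# Map to 2-D for plotting; perplexity affects the layout, and results
# vary between runs because t-SNE is stochastic
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # (200, 2)
```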


Advantages

  • Excellent visualization
  • Reveals hidden clusters

Limitations

  • Computationally expensive
  • Not suitable for training models (it cannot transform new, unseen points)
  • Results vary with parameters

Use Cases

  • Embedding visualization
  • Data exploration
  • Cluster analysis

Summary Table

Algorithm    | Type                     | Strength
K-Means      | Clustering               | Fast, simple
Hierarchical | Clustering               | Interpretability
DBSCAN       | Clustering               | Noise handling
PCA          | Dimensionality reduction | Speed, compression
LDA          | Dimensionality reduction | Class separation
t-SNE        | Dimensionality reduction | Visualization
