Clustering is the process of grouping data points such that points in the same group (cluster) are more similar to each other than to those in other groups.
K-Means is the most popular clustering algorithm. It partitions data into $K$ clusters by minimizing the distance between data points and their cluster “centroid.”
Hierarchical clustering builds a hierarchy of clusters. It is typically visualized using a dendrogram (a tree-like diagram).
Unlike K-Means, DBSCAN groups points that are closely packed together and marks points in low-density regions as outliers.
Dimensionality reduction algorithms reduce the number of variables (features) in a dataset while keeping as much important information as possible. This speeds up computation and makes high-dimensional data easier to visualize.
PCA is a linear transformation that finds the “Principal Components”: the directions along which the variance in the data is maximal.
While often used for classification, LDA is also a dimensionality reduction technique. Unlike PCA (which is unsupervised), LDA is supervised because it reduces dimensions while maximizing the separation between known classes.
t-SNE is a non-linear technique mainly used for visualization. It maps high-dimensional data into 2D or 3D space, keeping similar points close together and dissimilar points far apart.
| Feature | PCA | t-SNE |
|---|---|---|
| Type | Linear | Non-Linear |
| Primary Goal | Preserving global structure/variance. | Preserving local structure (clusters). |
| Speed | Very fast. | Slower (computationally intensive). |
| Output | Deterministic (same result every time). | Stochastic (results may vary slightly). |
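The determinism row can be checked directly. A minimal sketch, assuming scikit-learn and synthetic blob data; running PCA twice gives identical embeddings, while t-SNE with random initialization and no fixed seed varies between runs:

```python
# Minimal sketch: PCA is deterministic, t-SNE is stochastic (scikit-learn).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=300, n_features=50, centers=4, random_state=0)

# Two PCA runs give identical embeddings.
p1 = PCA(n_components=2).fit_transform(X)
p2 = PCA(n_components=2).fit_transform(X)
print(np.allclose(p1, p2))  # True

# Two t-SNE runs with random init and no fixed seed generally differ.
t1 = TSNE(n_components=2, init="random").fit_transform(X)
t2 = TSNE(n_components=2, init="random").fit_transform(X)
print(np.allclose(t1, t2))  # typically False
```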
Unsupervised learning deals with unlabeled data.
The goal is not prediction, but discovering hidden structures, patterns, and relationships inside the data.
Unlike supervised learning, no correct answers are provided; the algorithm must figure out the structure on its own.
Clustering is the task of grouping similar data points together so that points within the same cluster are more similar to each other than to points in other clusters.
Similarity is usually measured using distance metrics such as Euclidean distance, Manhattan distance, or cosine distance.
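The choice of metric changes what “similar” means. A minimal sketch using SciPy’s distance functions; the two example points are arbitrary:

```python
# Minimal sketch: common distance metrics with SciPy (example points are arbitrary).
from scipy.spatial import distance

a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, 3.0]

print(distance.euclidean(a, b))  # straight-line distance
print(distance.cityblock(a, b))  # Manhattan (L1) distance
print(distance.cosine(a, b))     # cosine distance = 1 - cosine similarity
```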
An e-commerce company clusters customers based on attributes such as purchase frequency, average order value, and browsing behavior.
No labels like “Premium” or “Regular” are given.
The algorithm discovers these segments automatically.
K-Means is a partition-based clustering algorithm that divides data into $K$ clusters, where $K$ is predefined.
Example: clustering students based on features such as study hours and exam scores.
With $K = 3$, the algorithm might separate them into three performance groups (e.g., high, average, and low performers).
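A minimal K-Means sketch with scikit-learn; the two-feature student data below is synthetic and purely illustrative:

```python
# Minimal sketch: K-Means with scikit-learn (synthetic, illustrative data).
import numpy as np
from sklearn.cluster import KMeans

# Columns: [study hours per week, exam score] -- made-up values.
X = np.array([
    [2, 45], [3, 50], [4, 48],     # low performers
    [10, 70], [12, 75], [11, 72],  # average performers
    [20, 92], [22, 95], [21, 90],  # high performers
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)           # cluster assignment for each student
print(kmeans.cluster_centers_)  # the 3 learned centroids
```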
Hierarchical clustering builds a tree-like structure (dendrogram) showing how clusters are formed at different levels.
Of the two approaches, agglomerative (bottom-up merging) and divisive (top-down splitting), agglomerative is more common.
The linkage criterion defines how the distance between clusters is calculated; common options include single, complete, average, and Ward linkage.
Example: grouping documents by topic.
The dendrogram shows topic similarity at different levels.
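A minimal agglomerative sketch with SciPy, using Ward linkage on synthetic 2D points; the dendrogram call is shown as an optional comment:

```python
# Minimal sketch: agglomerative clustering with SciPy (synthetic data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.2, 0.9], [5, 5], [5.1, 4.8], [9, 1], [8.9, 1.2]])

# Ward linkage merges the pair of clusters that least increases total variance.
Z = linkage(X, method="ward")

# Cut the tree into 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # e.g., [1 1 2 2 3 3]

# To visualize: from scipy.cluster.hierarchy import dendrogram; dendrogram(Z)
```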
DBSCAN clusters points based on density rather than distance to a centroid.
It groups together points that are closely packed and marks sparse points as noise.
Example: clustering geographic data such as GPS coordinates, where dense regions form clusters of arbitrary shape and isolated points are flagged as noise.
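A minimal DBSCAN sketch with scikit-learn; the `eps` and `min_samples` values are illustrative and must be tuned per dataset:

```python
# Minimal sketch: DBSCAN with scikit-learn (eps/min_samples are illustrative).
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1],  # dense region 1
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.1],  # dense region 2
    [9.0, 0.0],                          # isolated point
])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(db.labels_)  # e.g., [0 0 0 1 1 1 -1]; -1 marks noise/outliers
```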
Dimensionality reduction reduces the number of features while preserving important information.
High-dimensional data causes problems such as slower computation, increased risk of overfitting, and difficulty in visualization (the “curse of dimensionality”).
PCA is a linear dimensionality reduction technique that transforms data into orthogonal components capturing maximum variance.
Example: a dataset with 100 features reduced to 10 components while retaining 95% of the variance.
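A minimal PCA sketch with scikit-learn; passing a float to `n_components` keeps as many components as needed to retain that fraction of the variance (the random data here is illustrative):

```python
# Minimal sketch: PCA with scikit-learn (random data is illustrative).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 100))  # 200 samples, 100 features

# A float n_components in (0, 1) keeps enough components
# to explain that fraction of the total variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, n_kept_components)
print(pca.explained_variance_ratio_.sum())  # >= 0.95
```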
LDA is a supervised dimensionality reduction technique that maximizes class separability.
Note: Often taught alongside unsupervised methods for comparison.
Example: face recognition, where LDA projects face images into a low-dimensional space that maximizes separation between known identities.
| PCA | LDA |
|---|---|
| Unsupervised | Supervised |
| Maximizes variance | Maximizes class separation |
| Ignores labels | Uses labels |
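A minimal LDA sketch with scikit-learn; note that it requires labels `y`, unlike PCA, and the Iris dataset stands in for any labeled data:

```python
# Minimal sketch: LDA with scikit-learn (Iris stands in for any labeled data).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA can produce at most (n_classes - 1) components; Iris has 3 classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # uses labels, unlike PCA

print(X_lda.shape)  # (150, 2)
```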
t-SNE is a non-linear dimensionality reduction technique mainly used for visualization.
Example: visualizing high-dimensional word embeddings in 2D.
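A minimal t-SNE sketch with scikit-learn on the digits dataset (a stand-in for word embeddings); `perplexity` is the main knob and the values here are illustrative:

```python
# Minimal sketch: t-SNE with scikit-learn (digits stands in for embeddings).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 dimensions

# perplexity roughly sets the effective neighborhood size;
# random_state pins down the otherwise stochastic result.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

print(X_2d.shape)  # (1797, 2)
```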
| Algorithm | Type | Strength |
|---|---|---|
| K-Means | Clustering | Fast, simple |
| Hierarchical | Clustering | Interpretability |
| DBSCAN | Clustering | Noise handling |
| PCA | Dimensionality reduction | Speed, compression |
| LDA | Dimensionality reduction | Class separation |
| t-SNE | Dimensionality reduction | Visualization |