Mathematics for Data Science rests mainly on three pillars: linear algebra, probability, and statistics.
Let’s explore each one in depth, with clear explanations and real-world examples.
Linear Algebra is the mathematics of vectors, matrices, and linear transformations.
In Data Science, data is usually represented as vectors and matrices, making linear algebra unavoidable.
A scalar is a single numerical value.
Examples:
In Data Science, scalars represent:
A vector is an ordered collection of numbers, usually written in a row or column.
Example:
Student marks = [80, 75, 90, 85]
Each number represents a feature:
In Data Science:
A matrix is a rectangular array of numbers arranged in rows and columns; each row (or column) can be read as a vector.
Example:
Students Data Matrix:
[80 75 90]
[60 70 85]
[95 88 92]
Rows → individual data points
Columns → features
In real life:
Matrix operations allow data transformation and model computation.
Matrix addition adds two matrices of the same size, element by element.
Use case:
Combining feature updates or error corrections.
Matrix subtraction finds the element-wise differences between two datasets, or between predicted and actual values.
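To make the element-wise behaviour concrete, here is a minimal sketch using NumPy. The `actual`, `predicted`, and `correction` arrays are made-up numbers, assumed only for illustration.

```python
import numpy as np

# Hypothetical scores for 2 students x 3 subjects (assumed values).
actual = np.array([[80, 75, 90],
                   [60, 70, 85]])
predicted = np.array([[78, 77, 88],
                      [65, 70, 80]])

# Subtraction is element-wise: each entry is one prediction error.
errors = predicted - actual

# Addition is element-wise too: applying a correction of the same shape.
correction = np.array([[2, -2, 2],
                       [-5, 0, 5]])
corrected = predicted + correction

print(errors)
print(corrected)
```

Both operations require the two matrices to have the same shape, which is why the arrays above are all 2 × 3.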
Matrix multiplication is arguably the most important operation in Data Science.
Why it matters:
Example:
Prediction = Data Matrix × Weight Matrix
Most ML models, from linear regression to neural networks, rely on matrix multiplication internally.
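As a rough sketch of "Prediction = Data Matrix × Weight Matrix", the snippet below multiplies the students data matrix from earlier by a weight matrix. The weights are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Data matrix from the example above: 3 students (rows) x 3 features (columns).
X = np.array([[80, 75, 90],
              [60, 70, 85],
              [95, 88, 92]])

# Hypothetical 3 x 1 weight matrix, one weight per feature (assumed values).
W = np.array([[0.5],
              [0.25],
              [0.25]])

# (3 x 3) @ (3 x 1) -> (3 x 1): each prediction is a weighted sum of one row.
predictions = X @ W
print(predictions)
```

The inner dimensions must match: the number of columns in the data matrix has to equal the number of rows in the weight matrix.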
The transpose swaps the rows and columns of a matrix.
Use case:
The determinant is a single value calculated from a square matrix.
What it tells us:
If determinant = 0 → matrix cannot be inverted
Use in Data Science:
The inverse is like the reciprocal of a matrix.
For an invertible square matrix A:
A × A⁻¹ = I (identity matrix)
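The link between the determinant and the inverse can be checked numerically. This is a small sketch with NumPy; the matrices `A` and `S` are arbitrary examples, not taken from the text.

```python
import numpy as np

# An arbitrary invertible matrix (assumed for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

det_A = np.linalg.det(A)   # 2*3 - 1*1 = 5, non-zero, so A can be inverted
A_inv = np.linalg.inv(A)

# A times its inverse should equal the identity matrix (up to rounding).
is_identity = np.allclose(A @ A_inv, np.eye(2))

# The second row of S is twice the first, so its determinant is 0
# and S has no inverse.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
det_S = np.linalg.det(S)

print(det_A, is_identity, det_S)
```

Because of floating-point rounding, comparisons such as `np.allclose` or `np.isclose` are safer here than exact equality.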
Use case:
Eigenvectors are special non-zero vectors that stay on the same line when a linear transformation is applied: they are only stretched, shrunk, or flipped.
Each eigenvalue tells how much its eigenvector is scaled.
Why this matters:
Example:
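In PCA, the eigenvectors of the data's covariance matrix define the new axes, and each eigenvalue gives the variance captured along its axis, which is how the axes are ranked by importance.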
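The PCA idea can be sketched with NumPy's eigen-decomposition. The dataset `X` is hypothetical, and this is a bare-bones illustration rather than a full PCA implementation.

```python
import numpy as np

# Hypothetical dataset: 4 samples, 2 strongly correlated features.
X = np.array([[2.0, 1.9],
              [4.0, 4.1],
              [6.0, 6.2],
              [8.0, 7.8]])

# Center the data, then compute the 2 x 2 covariance matrix of the features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is intended for symmetric matrices; eigenvalues come back ascending.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the most important axis (largest eigenvalue) comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance captured by each new axis.
explained_ratio = eigenvalues / eigenvalues.sum()

# Defining property of an eigenpair: cov @ v equals eigenvalue * v.
v = eigenvectors[:, 0]
property_holds = np.allclose(cov @ v, eigenvalues[0] * v)

print(explained_ratio, property_holds)
```

With features this strongly correlated, nearly all of the variance falls on the first axis, which is the situation where dropping the second axis loses very little information.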
A vector space is a set of vectors that is closed under vector addition and scalar multiplication, where:
In Data Science:
Understanding vector spaces helps in:
Probability helps us measure uncertainty.
Since data is noisy and incomplete, probability is essential for predictions.
Probability ranges from 0 to 1.
Basic rules:
Example:
Probability of getting a head or tail = 1
A random variable assigns a numerical value to each outcome of a random event.
A discrete random variable takes countable values.
Example:
A continuous random variable can take any value within an interval, so its possible values are uncountably many.
Example:
A probability distribution describes how likely different outcomes are.
Used in:
Used when:
Example:
The Poisson distribution is used to model how often an event occurs in a fixed interval.
Example:
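As an illustration of event counts in a fixed interval, the sketch below evaluates the Poisson probability formula P(X = k) = e^(−λ) λ^k / k!. The call-centre scenario and the rate of 3 calls per hour are assumptions made up for this example, and `poisson_pmf` is a helper defined here, not a library function.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the average count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical: a support desk receives 3 calls per hour on average.
rate = 3.0

p_zero_calls = poisson_pmf(0, rate)                           # a quiet hour
p_at_most_two = sum(poisson_pmf(k, rate) for k in range(3))   # 0, 1, or 2 calls

print(round(p_zero_calls, 4), round(p_at_most_two, 4))
```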
Conditional probability is the probability of event A given that event B has already occurred.
Formula:
P(A|B) = P(A and B) / P(B)
Example:
Probability of rain given it is cloudy.
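The rain example can be worked through with counts. The weather log below is hypothetical; only the formula P(A|B) = P(A and B) / P(B) comes from the text.

```python
# Hypothetical weather log over 100 days (assumed counts).
days_total = 100
days_cloudy = 40             # event B: the day is cloudy
days_cloudy_and_rain = 30    # events A and B: cloudy and it rains

p_cloudy = days_cloudy / days_total
p_rain_and_cloudy = days_cloudy_and_rain / days_total

# P(rain | cloudy) = P(rain and cloudy) / P(cloudy)
p_rain_given_cloudy = p_rain_and_cloudy / p_cloudy

print(round(p_rain_given_cloudy, 2))
```

In this made-up log, rain occurs on 30% of all days but on about 75% of cloudy days, which shows how conditioning on B changes the probability of A.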
Used heavily in:
Bayes’ theorem updates probability based on new evidence.
Formula:
P(A|B) = [P(B|A) × P(A)] / P(B)
Why it’s powerful:
Example:
Spam detection:
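A minimal sketch of Bayes' theorem applied to a single word in an email. Every probability below is a hypothetical number chosen for illustration, not a measured value.

```python
# A = "email is spam", B = "email contains the word 'free'".
p_spam = 0.2               # P(A): prior belief that any email is spam (assumed)
p_word_given_spam = 0.6    # P(B|A): 'free' appears in spam (assumed)
p_word_given_ham = 0.05    # P(B|not A): 'free' appears in normal mail (assumed)

# P(B) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 2))
```

Under these assumptions, seeing the word raises the spam probability from the 20% prior to roughly 75%, which is the "update on new evidence" the theorem describes.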
Statistics helps us summarize, analyze, and infer from data.
Descriptive statistics describes what the data looks like.
Includes:
Inferential statistics is used to draw conclusions about a population using a sample.
Example:
Why important:
The mean, median, and mode describe the central tendency of the data.
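Variance measures how far data points spread from the mean, as the average of the squared deviations.
Standard deviation is the square root of variance; it is more interpretable because it is in the same units as the data.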
High deviation → more variability
Low deviation → more consistency
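Variance and standard deviation can be computed with Python's built-in `statistics` module. The sketch reuses the student marks from the vector example; the second list, `consistent_marks`, is made up for comparison.

```python
import statistics

marks = [80, 75, 90, 85]             # student marks from the vector example
consistent_marks = [82, 83, 82, 83]  # hypothetical, tightly clustered marks

mean = statistics.mean(marks)

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(marks)

# Standard deviation: square root of the variance, in the same units as marks.
std_dev = statistics.pstdev(marks)
std_dev_consistent = statistics.pstdev(consistent_marks)

print(mean, variance, std_dev, std_dev_consistent)
```

`pvariance` and `pstdev` treat the list as the whole population; for a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n − 1 instead.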
Skewness measures the asymmetry of the data distribution.
Kurtosis measures the tailedness of a distribution, that is, how heavy its tails are.
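Both measures can be sketched from standardized moments. The functions below use the simple population definitions (third and fourth standardized moments); statistical libraries may apply bias corrections, so their numbers can differ slightly. The datasets are hypothetical.

```python
import statistics

def skewness(data):
    """Third standardized moment (population version)."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 4 for x in data) / len(data) - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical, perfectly symmetric
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical, long right tail

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

A skewness near 0 suggests symmetry, a positive value a longer right tail, and a negative value a longer left tail.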
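A confidence interval gives a range of plausible values for the true population parameter, together with a confidence level describing how often such intervals capture the true value across repeated samples.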
Example:
95% confidence interval for mean salary.
Used to express uncertainty in estimates.
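A 95% confidence interval for a mean salary might be sketched as follows. The salary figures are hypothetical, and the normal approximation is a simplifying assumption: for a sample this small, a t-distribution would usually be preferred and would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical sample of salaries, in thousands (assumed values).
salaries = [52, 48, 60, 55, 50, 58, 47, 62, 53, 55]

n = len(salaries)
sample_mean = statistics.mean(salaries)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(salaries) / math.sqrt(n)

# z-value that leaves 2.5% in each tail of the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"95% CI for mean salary: ({lower:.2f}, {upper:.2f})")
```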
Hypothesis testing is used to test assumptions about a population using sample data.
Common tests:
The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true.
Used for decision-making in experiments.
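To show how a p-value feeds a decision, here is a small sketch of an exact two-sided binomial test for a coin. The experiment (60 heads in 100 tosses) and the 0.05 significance level are assumptions for illustration, and `binomial_pmf` is a helper defined here.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical experiment: 60 heads in 100 tosses.
n = 100
observed_heads = 60

# Null hypothesis: the coin is fair, so 50 heads are expected.
p_null = 0.5
expected_heads = n * p_null
observed_gap = abs(observed_heads - expected_heads)

# Two-sided p-value: probability, under the null, of a head count
# at least as far from 50 as the observed one.
p_value = sum(
    binomial_pmf(k, n, p_null)
    for k in range(n + 1)
    if abs(k - expected_heads) >= observed_gap
)

alpha = 0.05                    # assumed significance level
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)
```

Here the p-value comes out slightly above 0.05, so at that threshold the data would not be considered strong enough evidence that the coin is unfair.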