Machine Learning (ML) is the core engine behind modern AI systems.
Instead of writing rules manually, ML allows systems to learn patterns from data and improve their performance over time.
This module explains how learning happens, the different types of ML, and common challenges faced when building models.
In Machine Learning, learning means improving performance on a task using experience (data).
A program is said to learn from experience E, with respect to a task T and a performance measure P, if its performance on T, as measured by P, improves with experience E.
Example:
Task (T): Predict house prices
Experience (E): Past house sale data
Performance (P): Prediction error (how far predicted prices are from actual sale prices)
As the model sees more data, its predictions get closer to actual prices; this improvement is learning.
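To make T, E, and P concrete, here is a minimal sketch that measures how test error changes as the amount of training data grows. It assumes synthetic, made-up house data (price = 3000 × size + 50000 plus random noise) and a straight-line model fitted by ordinary least squares; none of these numbers or function names come from the original text, and P is measured here as mean absolute error (MAE) on held-out houses.

```python
import random

# Hypothetical data-generating rule (made up for illustration):
# price = 3000 * size + 50000 + noise
TRUE_SLOPE, TRUE_INTERCEPT = 3000.0, 50000.0

def make_houses(n, seed):
    """Generate n synthetic (size, price) pairs."""
    rng = random.Random(seed)
    houses = []
    for _ in range(n):
        size = rng.uniform(50, 250)             # floor area, e.g. square metres
        noise = rng.uniform(-20000, 20000)      # variation the model cannot explain
        houses.append((size, TRUE_SLOPE * size + TRUE_INTERCEPT + noise))
    return houses

def fit_line(data):
    """Ordinary least squares fit of price = slope * size + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx if sxx else 0.0            # guard against identical sizes
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, data):
    """Performance measure P: average absolute gap between predicted and actual price."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in data) / len(data)

experience = make_houses(1000, seed=0)   # experience E: past sales
test_set = make_houses(200, seed=1)      # unseen houses used only to measure P

for n in (2, 10, 100, 1000):
    model = fit_line(experience[:n])
    print(f"{n:4d} training houses -> test MAE {mean_absolute_error(model, test_set):,.0f}")
```

With random noise the error does not have to fall at every step, but with more houses the fitted line should approach the true one and the test MAE should settle near the noise level (about 10,000 for noise drawn uniformly from ±20,000).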
Machine Learning approaches are classified by how data is provided and how feedback is given: supervised, unsupervised, semi-supervised, and reinforcement learning.
Supervised learning uses labeled data, meaning each input comes with a correct output.
| Input (Features) | Output (Label) |
|---|---|
| House size, location | Price |
| Email text | Spam / Not Spam |
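1. Classification: predicting a category (e.g., Spam / Not Spam)
2. Regression: predicting a continuous value (e.g., house price)
Example:
An email spam filter is trained on emails labeled Spam / Not Spam and then labels new, unseen emails.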
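A spam filter is a classic supervised classification task. The sketch below uses multinomial Naive Bayes, one common choice that the original text does not name, trained on a tiny hypothetical labeled dataset; the function names `train_naive_bayes` and `predict` are illustrative and do not come from any library.

```python
import math
from collections import Counter

# Tiny hypothetical labeled dataset: (email text, label).
TRAIN = [
    ("win free money now", "spam"),
    ("claim your free prize", "spam"),
    ("cheap meds win big", "spam"),
    ("meeting rescheduled to monday", "not spam"),
    ("project report attached", "not spam"),
    ("lunch on monday with the team", "not spam"),
]

def train_naive_bayes(examples):
    """Count words per label and documents per label (multinomial Naive Bayes)."""
    word_counts = {}                 # label -> Counter of word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Pick the label with the highest log prior + log likelihood."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_docs)
        for word in text.split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAIN)
print(predict(model, "free money prize"))         # spam
print(predict(model, "team meeting on monday"))   # not spam
```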
Unsupervised learning works with unlabeled data: the model must find structure, such as groups or patterns, without being told the correct output.
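1. Clustering: grouping similar data points together
Example:
Customer segmentation based on buying behavior
2. Dimensionality Reduction: reducing the number of features while preserving as much information as possible
Example:
PCA (Principal Component Analysis) for data visualization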
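PCA finds the direction along which the data varies most and projects each point onto it. This is a minimal, dependency-free sketch that assumes a small illustrative 2-D dataset and finds only the first principal component by power iteration on the covariance matrix; in practice a library implementation such as scikit-learn's `PCA` would usually be used.

```python
import math

def mean_center(rows):
    """Subtract each feature's mean so the data is centred at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=100):
    """Power iteration: repeated multiplication by cov converges to its top eigenvector
    (assuming the starting vector is not orthogonal to it)."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Illustrative 2-D dataset (10 points, two correlated features)
rows = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]

centered = mean_center(rows)
cov = covariance(centered)
pc = first_component(cov)
# 1-D coordinates of each point along the first component, ready for plotting
projected = [sum(a * b for a, b in zip(r, pc)) for r in centered]
print("first principal component:", [round(x, 3) for x in pc])
```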
Example: an e-commerce platform groups customers into segments without knowing the categories beforehand.
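Customer segmentation is often done with k-means clustering, used here as an illustrative algorithm that the original text does not specify. The sketch assumes two hypothetical features per customer (orders per month, average order value) and picks k = 2 by hand; no labels are given to the algorithm.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                                    # converged: nothing moved
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels

# Hypothetical customers: (orders per month, average order value)
customers = [(1, 20), (2, 25), (1, 22), (2, 18),     # occasional, low spend
             (10, 80), (12, 95), (11, 90), (9, 85)]  # frequent, high spend

centroids, labels = kmeans(customers, k=2)
print("segment of each customer:", labels)
print("segment centres:", centroids)
```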
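Semi-supervised learning uses both labeled and unlabeled data, typically a small labeled set and a much larger unlabeled set.
Example: image classification where only a small fraction of the images have been labeled by hand.
It is useful when labeling data is expensive or time-consuming.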
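One simple semi-supervised technique is self-training (pseudo-labeling): train on the few labeled examples, label the unlabeled points the model is confident about, add them to the training set, and repeat. The sketch below is an illustration under stated assumptions: each image is represented by two hypothetical numeric features, the classifier is a nearest-centroid model, and `min_margin` is an arbitrary confidence threshold.

```python
import math

def centroids_of(labeled):
    """Mean feature vector for each class in a list of (features, label) pairs."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(dim) / len(xs) for dim in zip(*xs)) for y, xs in groups.items()}

def classify(cents, x):
    """Nearest-centroid label plus a confidence margin (runner-up distance minus best distance)."""
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def self_train(labeled, unlabeled, min_margin=1.0, rounds=5):
    """Repeatedly pseudo-label confident unlabeled points and retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = centroids_of(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = classify(cents, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break                      # nothing new the model is sure about
        labeled.extend(confident)      # pseudo-labels join the training set
        pool = rest
    return labeled, pool

# Hypothetical image features: only two images carry human labels.
LABELED = [((1.0, 1.0), "cat"), ((8.0, 8.0), "dog")]
UNLABELED = [(1.5, 0.5), (0.5, 1.5), (2.0, 2.0), (7.5, 8.5), (8.5, 7.0), (7.0, 7.0), (4.6, 4.4)]

labeled_out, still_unlabeled = self_train(LABELED, UNLABELED)
print(len(labeled_out), "labeled after self-training;", still_unlabeled, "left unlabeled")
```

The ambiguous point midway between the two classes stays unlabeled, because the model is never confident enough about it.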
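Reinforcement learning (RL) is based on trial and error: an agent takes actions in an environment, receives rewards or penalties, and gradually learns which actions maximize its total reward.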
Example:
A robot learning to walk by trial and error.
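Trial-and-error learning can be sketched with tabular Q-learning, a standard RL algorithm swapped in here for illustration. Instead of a walking robot, the sketch assumes a hypothetical 5-cell corridor where the agent starts at cell 0, receives +1 for reaching cell 4, and pays a small penalty per step; the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are arbitrary choices.

```python
import random

N_STATES, GOAL = 5, 4          # hypothetical corridor: cells 0..4, goal at the right end
ACTIONS = (-1, +1)             # move left or move right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)   # walls at both ends
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, -0.01, False   # small per-step penalty favours short paths

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: usually take the best known action, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # nudge Q toward the reward plus the discounted value of the next state
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print("greedy action in cells 0-3:", policy)
```

After training, the greedy policy is expected to be +1 (move right) in every non-goal cell, learned purely from rewards rather than from labeled examples.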
The bias-variance tradeoff explains why models fail to generalize.
Bias is error due to overly simplistic assumptions in the model.
High bias leads to underfitting.
Example:
Using a straight line to model curved data.
Variance is error due to sensitivity to small fluctuations in the training data.
High variance leads to overfitting.
Example:
A model that memorizes the training data, including its noise.
Goal:
Find the balance between bias and variance that minimizes total error.
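For squared-error loss, the tradeoff can be written as a standard decomposition. Assuming data generated as y = f(x) + ε with zero-mean noise of variance σ², and a model f̂ trained on a randomly drawn training set (expectations taken over training sets and noise), the expected error at a point x splits into three parts. The symbols f, f̂, ε, and σ² are introduced here for illustration and are not part of the original text.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible noise}}
```

Only the first two terms depend on the model; the noise term σ² cannot be reduced by any model, which is why the goal is to balance bias and variance rather than drive total error to zero.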
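Underfitting occurs when the model is too simple to capture the underlying patterns in the data.
Symptoms: high error on both the training data and the test data.
Example:
A linear model for complex, non-linear data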
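Overfitting occurs when the model learns noise and details specific to the training data instead of the general pattern.
Symptoms: very low training error but much higher test error.
Example:
A decision tree with unlimited depth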
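Both failure modes can be reproduced with one model family by changing a single setting. The sketch swaps in k-nearest-neighbors regression instead of the linear model and decision tree named above: k = 1 memorizes every training point (much like an unlimited-depth tree), while k equal to the training-set size predicts the same average everywhere (too simple for curved data). The data, a noisy y = x² curve, is hypothetical.

```python
import random

def make_data(n, seed):
    """Hypothetical noisy samples of the curved function y = x**2 on [-3, 3]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        data.append((x, x * x + rng.gauss(0, 1.0)))
    return data

def knn_predict(train, x, k):
    """Average the targets of the k training points whose x is closest to the query."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN predictions on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

train_set = make_data(60, seed=0)
test_set = make_data(200, seed=1)

for k in (1, 7, len(train_set)):   # k=1 memorizes; k=60 predicts one global average
    print(f"k={k:2d}  train MSE={mse(train_set, train_set, k):5.2f}  "
          f"test MSE={mse(train_set, test_set, k):5.2f}")
```

Expect k = 1 to show zero training error but a noticeably higher test error (overfitting), and k = 60 to show high error on both sets (underfitting); a moderate k such as 7 typically does best on the test set.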
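A train-test split evaluates a model's ability to generalize.
Purpose: to test the model on data it has not seen during training.
Important Rule:
Test data must not influence training, including preprocessing steps such as computing scaling statistics.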
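A split can be sketched in a few lines. The helper `split_train_test` is hypothetical (not scikit-learn's `train_test_split`, though it follows the same idea): it shuffles with a fixed seed and holds out 20% of the rows. The last lines apply the rule above by computing scaling statistics from the training set only and reusing them on the test set.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out the first test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def fit_scaler(values):
    """Learn the mean and standard deviation used for standardization."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return mean, std or 1.0                      # avoid division by zero for constant data

sizes = list(range(50, 250, 2))                  # hypothetical feature: 100 house sizes
train, test = split_train_test(sizes)

mean, std = fit_scaler(train)                    # statistics come from training data only
train_scaled = [(v - mean) / std for v in train]
test_scaled = [(v - mean) / std for v in test]   # test data reuses them and never influences them
print(len(train), "train rows,", len(test), "test rows")
```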
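Cross-validation provides a more reliable performance estimate than a single train-test split.
5-fold cross-validation: the data is split into 5 roughly equal parts (folds); the model is trained on 4 folds and evaluated on the remaining one, and this repeats 5 times so that each fold is used once for evaluation. The final estimate is the average of the 5 scores.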
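The fold rotation can be sketched as follows. The helpers `k_fold_indices` and `k_fold_score` are hypothetical names written for this example, and the model is a deliberately trivial baseline that always predicts the mean target, so the focus stays on how the folds are formed.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each index lands in exactly one validation fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]   # the other k-1 folds
        yield train_idx, val_idx
        start += size

def k_fold_score(xs, ys, fit, score, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores), scores

def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the mean training target."""
    return sum(ys) / len(ys)

def mae(model, xs, ys):
    """Mean absolute error of the constant prediction `model`."""
    return sum(abs(model - y) for y in ys) / len(ys)

xs = list(range(23))
ys = [2 * x + 1 for x in xs]                      # hypothetical data
average, per_fold = k_fold_score(xs, ys, fit_mean, mae, k=5)
print(f"per-fold MAE: {[round(s, 2) for s in per_fold]}, average: {average:.2f}")
```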
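Without understanding these basics, it is hard to tell why a model fails, for example whether it is underfitting or overfitting.
With strong fundamentals, you can choose the right type of learning, evaluate models correctly, and diagnose problems such as high bias or high variance.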