Regression is used when the target variable is continuous (e.g., predicting price, temperature, or height).
Simple linear regression is the simplest form of regression: it models the relationship between a dependent variable $y$ and a single independent variable $x$ using a straight line.
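The straight-line fit above has a closed-form least-squares solution. A minimal sketch in plain Python (the data points are made up and chosen to lie exactly on $y = 2x + 1$):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b (closed-form solution)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Points lying exactly on y = 2x + 1
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Because the points fall exactly on a line, the fit recovers the slope 2 and intercept 1.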
Multiple linear regression is an extension of linear regression that uses two or more independent variables to predict the outcome.
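With several predictors the same least-squares idea applies; one common trick is to append a column of ones so the intercept is learned like any other coefficient. A sketch with NumPy, using toy data generated from $y = 1 \cdot x_1 + 2 \cdot x_2 + 3$ (coefficients chosen for illustration):

```python
import numpy as np

# Toy data generated from y = 1*x1 + 2*x2 + 3
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.0, 2.0]])
y = X @ np.array([1.0, 2.0]) + 3.0

# Append a column of ones so the intercept is fit as an ordinary coefficient
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Here `coef` recovers the generating coefficients `[1, 2, 3]`.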
Polynomial regression is used when the relationship between the variables is non-linear. It fits a curve by adding powers of the independent variable (e.g., $x^2$, $x^3$) as extra features.
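A quick sketch: points on the parabola $y = x^2 + 1$ cannot be fit well by a straight line, but a degree-2 fit (equivalent to adding an $x^2$ feature) recovers them exactly. The data here is made up for illustration.

```python
import numpy as np

# Points on the parabola y = x^2 + 1
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x**2 + 1

# Degree-2 fit; returns [a, b, c] for a*x^2 + b*x + c
coeffs = np.polyfit(x, y, deg=2)
```

The recovered coefficients are approximately `[1, 0, 1]`, matching the generating parabola.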
Regularization prevents overfitting (where a model performs perfectly on training data but poorly on new data) by adding a penalty term to the cost function.
| Feature | Ridge Regression (L2) | Lasso Regression (L1) |
| --- | --- | --- |
| Penalty | Adds the “squared magnitude” of the coefficients. | Adds the “absolute value magnitude” of the coefficients. |
| Effect | Shrinks coefficients toward zero, but never exactly to zero. | Can shrink coefficients to exactly zero. |
| Best Use | When most variables are useful. | Feature selection (removes useless variables). |
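Ridge has a convenient closed-form solution, which makes the shrinkage easy to demonstrate (Lasso has no closed form, so it is omitted here). A sketch on synthetic data, with true coefficients `[5, -3, 0]` chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([5.0, -3.0, 0.0]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)      # lam = 0 recovers ordinary least squares
w_ridge = ridge(X, y, 10.0)   # the penalty shrinks coefficients toward zero
```

The overall magnitude of `w_ridge` is smaller than that of `w_ols`, illustrating the shrinkage in the table: coefficients move toward zero but do not land exactly on it.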
Classification is used when the target variable is categorical (e.g., Yes/No, Spam/Not Spam).
Despite its name, logistic regression is a classification algorithm. It uses the sigmoid function to map any input to a probability between 0 and 1.
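A minimal sketch of the sigmoid and a tiny gradient-ascent fit of $P(y=1 \mid x) = \sigma(wx + b)$ on made-up 1-D data (learning rate and iteration count are arbitrary choices):

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1), interpreted as a probability
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data: negatives on the left, positives on the right
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = 0.0, 0.0
for _ in range(1000):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        w += 0.1 * (y - p) * x  # gradient of the log-likelihood w.r.t. w
        b += 0.1 * (y - p)      # gradient w.r.t. b
```

After training, the model outputs a probability above 0.5 for points on the positive side and below 0.5 on the negative side; thresholding at 0.5 gives the class label.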
K-Nearest Neighbors (KNN) is a “lazy learner” that classifies a data point based on how its neighbors are classified. If $K=3$, the model looks at the 3 closest points and takes a majority vote.
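The “lazy” part is that there is no training step at all: all the work happens at prediction time. A sketch with made-up 2-D points in two clusters:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of ((x, y), label); classify by majority vote of the k nearest
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: an "A" cluster near the origin, a "B" cluster near (5, 5)
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
```

A query near the origin gets 3 “A” neighbors and is labeled “A”; a query near (5, 5) gets 2 “B” votes out of 3 and is labeled “B”.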
Naive Bayes is based on Bayes’ Theorem and assumes that all features are independent of each other, an assumption that is “naive” but often works well in practice.
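A sketch of a multinomial Naive Bayes spam filter with Laplace smoothing, on a made-up four-document corpus. The “naive” step is the per-word sum of log-likelihoods, which treats words as independent given the class:

```python
from collections import Counter
import math

def train_nb(docs):
    # docs: list of (list_of_words, label)
    labels = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in labels}
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return labels, word_counts, vocab, len(docs)

def predict_nb(model, words):
    labels, word_counts, vocab, n = model
    best, best_score = None, -math.inf
    for c, count in labels.items():
        score = math.log(count / n)  # log prior P(class)
        total = sum(word_counts[c].values())
        for w in words:
            # Independence assumption: multiply (add in log space) per-word
            # likelihoods, with +1 Laplace smoothing for unseen words
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

docs = [(["win", "money", "now"], "spam"),
        (["free", "money"], "spam"),
        (["meeting", "tomorrow"], "ham"),
        (["project", "meeting", "notes"], "ham")]
model = train_nb(docs)
```

Even on four documents the posterior clearly separates spam-like word lists from meeting-like ones.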
A Support Vector Machine (SVM) finds the best “hyperplane” (decision boundary) that maximizes the margin between the two classes.
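One way to see the margin idea in code is stochastic sub-gradient descent on the hinge loss (a Pegasos-style sketch; real SVM libraries use specialized solvers, and the toy data and hyperparameters here are invented). Points inside the margin pull the weight vector toward them; points outside only feel the regularization shrink:

```python
import random

random.seed(0)
# Two linearly separable classes, labeled +1 / -1
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1),
        ((-2.0, -2.0), -1), ((-3.0, -1.0), -1)]
w = [0.0, 0.0]
lam = 0.01  # regularization strength; keeps the margin wide
for t in range(1, 2001):
    x, y = random.choice(data)
    eta = 1.0 / (lam * t)  # decaying step size
    if y * (w[0] * x[0] + w[1] * x[1]) < 1:  # inside the margin: hinge active
        w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
    else:                                    # outside the margin: only shrink
        w = [(1 - eta * lam) * wi for wi in w]
```

After training, every point lies on the correct side of the learned hyperplane through the origin (a bias term is omitted to keep the sketch short).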
A Decision Tree is a flowchart-like structure where each internal node represents a “test” on an attribute, and each leaf node represents a class label.
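The flowchart structure maps directly onto nested `if` statements. A hand-written toy tree (the attributes, thresholds, and labels are made up purely for illustration; real trees are learned from data):

```python
def classify_fruit(weight_g, texture):
    # Each "if" is an internal-node test; each "return" is a leaf label
    if texture == "smooth":
        if weight_g > 150:
            return "apple"
        return "cherry"
    if weight_g > 120:
        return "orange"
    return "lemon"
```

Following a path from the root to a leaf answers one attribute test per level, exactly as in the flowchart description above.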
Random Forest is an “ensemble” method that builds multiple Decision Trees and merges their predictions (usually by voting) to get a more accurate and stable result.
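The voting step can be sketched on its own. Here three threshold “stumps” stand in for fully grown trees, as if each had been trained on a different bootstrap sample of the data (the thresholds are invented for illustration):

```python
from collections import Counter

def forest_predict(trees, x):
    # Each tree votes; the majority label wins
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Toy "forest": three slightly different stumps standing in for real trees
trees = [
    lambda x: "pos" if x > 3 else "neg",
    lambda x: "pos" if x > 5 else "neg",
    lambda x: "pos" if x > 4 else "neg",
]
```

For an input of 4.5 the votes split 2–1 in favor of “pos”, so the ensemble prediction is “pos” even though one tree disagrees; that disagreement-averaging is where the added stability comes from.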
Boosting models (e.g., Gradient Boosting) build trees sequentially. Each new tree attempts to correct the errors made by the previous trees.
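The sequential error-correction loop can be sketched with the simplest possible weak learner, a single constant (a depth-0 “tree”); real libraries fit small decision trees to the residuals instead. The targets and learning rate here are arbitrary:

```python
# Each round: compute residuals, fit a weak learner to them,
# and add a scaled correction to the running predictions
ys = [3.0, 5.0, 7.0, 9.0]
preds = [0.0] * len(ys)
learning_rate = 0.5
for _ in range(50):
    residuals = [y - p for y, p in zip(ys, preds)]
    correction = sum(residuals) / len(residuals)  # weak learner: mean residual
    preds = [p + learning_rate * correction for p in preds]
```

With a constant learner the predictions can only converge to the overall mean of the targets (6.0 here), but the mechanism is the same as in full gradient boosting: each round shrinks the remaining error left by the rounds before it.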