Regression is used when the target variable is continuous (e.g., predicting price, temperature, or height).
Simple linear regression is the simplest form of regression: it models the relationship between a dependent variable $y$ and a single independent variable $x$ using a straight line.
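The straight-line fit above has a closed-form least-squares solution. A minimal sketch in plain Python (the data points are made up and chosen to lie exactly on $y = 2x + 1$):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b (closed-form solution)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Points lying exactly on y = 2x + 1
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Because the points fall exactly on a line, the fit recovers the slope 2 and intercept 1.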
Multiple linear regression is an extension of linear regression that uses two or more independent variables to predict the outcome.
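With several predictors the same least-squares idea applies; one common trick is to append a column of ones so the intercept is learned like any other coefficient. A sketch with NumPy, using toy data generated from $y = 1 \cdot x_1 + 2 \cdot x_2 + 3$ (coefficients chosen for illustration):

```python
import numpy as np

# Toy data generated from y = 1*x1 + 2*x2 + 3
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.0, 2.0]])
y = X @ np.array([1.0, 2.0]) + 3.0

# Append a column of ones so the intercept is fit as an ordinary coefficient
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Here `coef` recovers the generating coefficients `[1, 2, 3]`.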
Polynomial regression is used when the relationship between the variables is non-linear. It fits a curve by adding powers of the independent variable (e.g., $x^2$, $x^3$) as extra features.
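A quick sketch: points on the parabola $y = x^2 + 1$ cannot be fit well by a straight line, but a degree-2 fit (equivalent to adding an $x^2$ feature) recovers them exactly. The data here is made up for illustration.

```python
import numpy as np

# Points on the parabola y = x^2 + 1
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x**2 + 1

# Degree-2 fit; returns [a, b, c] for a*x^2 + b*x + c
coeffs = np.polyfit(x, y, deg=2)
```

The recovered coefficients are approximately `[1, 0, 1]`, matching the generating parabola.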
Regularization prevents overfitting (where a model performs perfectly on training data but poorly on new data) by adding a penalty term to the cost function.
| Feature | Ridge Regression (L2) | Lasso Regression (L1) |
| --- | --- | --- |
| Penalty | Adds the “squared magnitude” of the coefficients. | Adds the “absolute value magnitude” of the coefficients. |
| Effect | Shrinks coefficients toward zero, but never exactly to zero. | Can shrink coefficients to exactly zero. |
| Best Use | When most variables are useful. | Feature selection (removes useless variables). |
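Ridge has a convenient closed-form solution, which makes the shrinkage easy to demonstrate (Lasso has no closed form, so it is omitted here). A sketch on synthetic data, with true coefficients `[5, -3, 0]` chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([5.0, -3.0, 0.0]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, 0.0)      # lam = 0 recovers ordinary least squares
w_ridge = ridge(X, y, 10.0)   # the penalty shrinks coefficients toward zero
```

The overall magnitude of `w_ridge` is smaller than that of `w_ols`, illustrating the shrinkage in the table: coefficients move toward zero but do not land exactly on it.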
Classification is used when the target variable is categorical (e.g., Yes/No, Spam/Not Spam).
Despite its name, logistic regression is a classification algorithm. It uses the sigmoid function to map any input to a probability between 0 and 1.
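A minimal sketch of the sigmoid and a tiny gradient-ascent fit of $P(y=1 \mid x) = \sigma(wx + b)$ on made-up 1-D data (learning rate and iteration count are arbitrary choices):

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1), interpreted as a probability
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data: negatives on the left, positives on the right
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = 0.0, 0.0
for _ in range(1000):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        w += 0.1 * (y - p) * x  # gradient of the log-likelihood w.r.t. w
        b += 0.1 * (y - p)      # gradient w.r.t. b
```

After training, the model outputs a probability above 0.5 for points on the positive side and below 0.5 on the negative side; thresholding at 0.5 gives the class label.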
K-Nearest Neighbors (KNN) is a “lazy learner” that classifies a data point based on how its neighbors are classified. If $K=3$, the model looks at the 3 closest points and takes a majority vote.
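The “lazy” part is that there is no training step at all: all the work happens at prediction time. A sketch with made-up 2-D points in two clusters:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of ((x, y), label); classify by majority vote of the k nearest
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: an "A" cluster near the origin, a "B" cluster near (5, 5)
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
```

A query near the origin gets 3 “A” neighbors and is labeled “A”; a query near (5, 5) gets 2 “B” votes out of 3 and is labeled “B”.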
Naive Bayes is based on Bayes’ Theorem and assumes that all features are independent of each other, an assumption that is “naive” but often works well in practice.
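A sketch of a multinomial Naive Bayes spam filter with Laplace smoothing, on a made-up four-document corpus. The “naive” step is the per-word sum of log-likelihoods, which treats words as independent given the class:

```python
from collections import Counter
import math

def train_nb(docs):
    # docs: list of (list_of_words, label)
    labels = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in labels}
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return labels, word_counts, vocab, len(docs)

def predict_nb(model, words):
    labels, word_counts, vocab, n = model
    best, best_score = None, -math.inf
    for c, count in labels.items():
        score = math.log(count / n)  # log prior P(class)
        total = sum(word_counts[c].values())
        for w in words:
            # Independence assumption: multiply (add in log space) per-word
            # likelihoods, with +1 Laplace smoothing for unseen words
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

docs = [(["win", "money", "now"], "spam"),
        (["free", "money"], "spam"),
        (["meeting", "tomorrow"], "ham"),
        (["project", "meeting", "notes"], "ham")]
model = train_nb(docs)
```

Even on four documents the posterior clearly separates spam-like word lists from meeting-like ones.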
A Support Vector Machine (SVM) finds the best “hyperplane” (decision boundary) that maximizes the margin between the two classes.
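One way to see the margin idea in code is stochastic sub-gradient descent on the hinge loss (a Pegasos-style sketch; real SVM libraries use specialized solvers, and the toy data and hyperparameters here are invented). Points inside the margin pull the weight vector toward them; points outside only feel the regularization shrink:

```python
import random

random.seed(0)
# Two linearly separable classes, labeled +1 / -1
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1),
        ((-2.0, -2.0), -1), ((-3.0, -1.0), -1)]
w = [0.0, 0.0]
lam = 0.01  # regularization strength; keeps the margin wide
for t in range(1, 2001):
    x, y = random.choice(data)
    eta = 1.0 / (lam * t)  # decaying step size
    if y * (w[0] * x[0] + w[1] * x[1]) < 1:  # inside the margin: hinge active
        w = [(1 - eta * lam) * wi + eta * y * xi for wi, xi in zip(w, x)]
    else:                                    # outside the margin: only shrink
        w = [(1 - eta * lam) * wi for wi in w]
```

After training, every point lies on the correct side of the learned hyperplane through the origin (a bias term is omitted to keep the sketch short).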
A Decision Tree is a flowchart-like structure where each internal node represents a “test” on an attribute, and each leaf node represents a class label.
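The flowchart structure maps directly onto nested `if` statements. A hand-written toy tree (the attributes, thresholds, and labels are made up purely for illustration; real trees are learned from data):

```python
def classify_fruit(weight_g, texture):
    # Each "if" is an internal-node test; each "return" is a leaf label
    if texture == "smooth":
        if weight_g > 150:
            return "apple"
        return "cherry"
    if weight_g > 120:
        return "orange"
    return "lemon"
```

Following a path from the root to a leaf answers one attribute test per level, exactly as in the flowchart description above.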
Random Forest is an “ensemble” method that builds multiple Decision Trees and merges their predictions (usually by voting) to get a more accurate and stable result.
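The voting step can be sketched on its own. Here three threshold “stumps” stand in for fully grown trees, as if each had been trained on a different bootstrap sample of the data (the thresholds are invented for illustration):

```python
from collections import Counter

def forest_predict(trees, x):
    # Each tree votes; the majority label wins
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Toy "forest": three slightly different stumps standing in for real trees
trees = [
    lambda x: "pos" if x > 3 else "neg",
    lambda x: "pos" if x > 5 else "neg",
    lambda x: "pos" if x > 4 else "neg",
]
```

For an input of 4.5 the votes split 2–1 in favor of “pos”, so the ensemble prediction is “pos” even though one tree disagrees; that disagreement-averaging is where the added stability comes from.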
Boosting models (e.g., Gradient Boosting) build trees sequentially. Each new tree attempts to correct the errors made by the previous trees.
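The sequential error-correction loop can be sketched with the simplest possible weak learner, a single constant (a depth-0 “tree”); real libraries fit small decision trees to the residuals instead. The targets and learning rate here are arbitrary:

```python
# Each round: compute residuals, fit a weak learner to them,
# and add a scaled correction to the running predictions
ys = [3.0, 5.0, 7.0, 9.0]
preds = [0.0] * len(ys)
learning_rate = 0.5
for _ in range(50):
    residuals = [y - p for y, p in zip(ys, preds)]
    correction = sum(residuals) / len(residuals)  # weak learner: mean residual
    preds = [p + learning_rate * correction for p in preds]
```

With a constant learner the predictions can only converge to the overall mean of the targets (6.0 here), but the mechanism is the same as in full gradient boosting: each round shrinks the remaining error left by the rounds before it.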