Model Evaluation & Optimization

1. Regression Metrics

These measure the distance between the predicted value ($\hat{y}$) and the actual value ($y$).

  • MAE (Mean Absolute Error): The average of the absolute differences. It’s easy to understand because it’s in the same units as your data.
    • Example: An MAE of 5 (with house prices measured in thousands of dollars) means your predictions are off by $5,000 on average.
  • MSE (Mean Squared Error): The average of the squared differences. Because it squares the error, it punishes large outliers more heavily than MAE.
  • RMSE (Root Mean Squared Error): The square root of MSE. It brings the unit back to the original scale while still penalizing large errors.
    • Formula: $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
  • $R^2$ (Coefficient of Determination): Measures how much of the variance in the data is explained by the model.
    • 1.0 = Perfect fit.
    • 0.0 = The model is no better than always predicting the mean value.
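
To make these formulas concrete, here is a minimal sketch using scikit-learn's metrics module. The y_true / y_pred arrays are made-up placeholder values (think house prices in thousands of dollars), not real data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([250, 300, 180, 420, 310])   # placeholder targets, e.g. prices in $1,000s
y_pred = np.array([245, 310, 190, 400, 305])   # placeholder model predictions

mae = mean_absolute_error(y_true, y_pred)       # average absolute difference
mse = mean_squared_error(y_true, y_pred)        # average squared difference
rmse = np.sqrt(mse)                             # back in the original units
r2 = r2_score(y_true, y_pred)                   # share of variance explained

print(f"MAE:  {mae:.2f}")
print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```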

2. Classification Metrics

For classification, “Accuracy” can be misleading, especially if your data is imbalanced (e.g., 99% of transactions are legitimate and only 1% are fraud).

Confusion Matrix

A table used to describe the performance of a classification model.

  • True Positive (TP): Predicted “Yes”, Actual “Yes”.
  • True Negative (TN): Predicted “No”, Actual “No”.
  • False Positive (FP): Predicted “Yes”, Actual “No” (Type I Error).
  • False Negative (FN): Predicted “No”, Actual “Yes” (Type II Error).
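
The short sketch below shows how these four counts can be read directly off scikit-learn's confusion_matrix. The label arrays are made-up placeholders, with 1 standing for "Yes" and 0 for "No".

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (placeholder data)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted labels (placeholder data)

# For binary labels [0, 1] the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```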

Core Metrics

  • Accuracy: $\frac{TP + TN}{Total}$. Overall correctness.
  • Precision: $\frac{TP}{TP + FP}$. “Of all predicted positives, how many were actually positive?” (Important for Spam filters).
  • Recall (Sensitivity): $\frac{TP}{TP + FN}$. “Of all actual positives, how many did we catch?” (Important for Cancer detection).
  • F1-Score: The harmonic mean of Precision and Recall. Use this when you want a balance between the two.
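
Using the same made-up labels as the confusion-matrix sketch, scikit-learn's built-in scoring functions map directly onto the formulas above:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # placeholder labels, as before
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))    # (TP + TN) / Total
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))          # harmonic mean of the two
```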

ROC-AUC

  • ROC Curve: A plot of the True Positive Rate vs. the False Positive Rate at various thresholds.
  • AUC (Area Under the Curve): A single number representing the model’s ability to distinguish between classes.
    • 0.5 = Random guessing.
    • 1.0 = Perfect classifier.
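
Here is a brief sketch of computing both the curve points and the AUC with scikit-learn. Note that, unlike the metrics above, it needs predicted probabilities (or scores) rather than hard 0/1 labels; the values shown are placeholders.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # placeholder labels
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]    # e.g. model.predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y_true, y_score)     # points that trace out the ROC curve
auc = roc_auc_score(y_true, y_score)                  # single-number summary
print(f"AUC = {auc:.3f}")
```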

3. Hyperparameter Tuning

Hyperparameters are the “settings” of an algorithm (like the depth of a tree or the $K$ in KNN) that you set before training.

  • Grid Search: You define a list of values for each hyperparameter, and the computer tries every single combination. It is thorough but very slow.
  • Random Search: The computer picks random combinations of hyperparameters from a range. It is often much faster and usually finds a result nearly as good as Grid Search.
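
The sketch below compares the two approaches on a KNN classifier; the dataset (Iris) and the parameter ranges are purely illustrative, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Illustrative search space: 6 values of K x 2 weighting schemes = 12 combinations.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11], "weights": ["uniform", "distance"]}

# Grid Search: tries every combination (all 12).
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)
print("Grid best:  ", grid.best_params_, grid.best_score_)

# Random Search: samples only n_iter combinations from the same ranges.
rand = RandomizedSearchCV(KNeighborsClassifier(), param_grid, n_iter=5, cv=5, random_state=0)
rand.fit(X, y)
print("Random best:", rand.best_params_, rand.best_score_)
```

With this grid, Grid Search fits all 12 combinations per cross-validation run while Random Search samples only 5 of them, which is where the speed-up comes from.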

4. Model Selection Strategies

  1. Cross-Validation (K-Fold): Split your data into $K$ parts. Train on $K-1$ parts and test on the remaining part. Repeat $K$ times so every piece of data is used for testing once. This ensures your model isn’t just “lucky” on one specific split.
  2. Train/Test Split: The standard practice of holding back a portion of your data (commonly 20%) that the model never sees during training, so you can check how it handles unseen, “real-world” data.
  3. Occam’s Razor: If two models have similar performance, always choose the simpler one. It is less likely to overfit.
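
A minimal sketch combining a train/test split with 5-fold cross-validation; the dataset and model are just for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Train/Test Split: hold out 20% of the data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

# K-Fold Cross-Validation: every sample is used for testing exactly once across 5 folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("CV accuracy per fold:", scores)
print("Mean CV accuracy:", scores.mean())
```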
