Building a model is only half the battle. In a professional environment, you must prove that your model is reliable and optimize it to its peak performance. Evaluation tells you how good your model is, and Optimization makes it better.
How do we know if our model will work in the real world?
As discussed, we split data (e.g., 80/20) to simulate “unseen” data. However, a single split might be lucky or unlucky based on how the data was shuffled.
To get a more robust score, we use Cross-Validation: the data is split into k folds (commonly 5 or 10), the model is trained k times with a different fold held out for testing each time, and the scores are averaged.
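Below is a minimal sketch of 5-fold cross-validation using scikit-learn's `cross_val_score`; the built-in breast-cancer dataset and the decision tree are only placeholders for your own data and model.

```python
# Minimal 5-fold cross-validation sketch (illustrative dataset and model).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# cv=5 splits the data into 5 folds; each fold takes one turn as the test set.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```

The spread of the fold scores tells you how much your result depends on which rows happened to land in the test set.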
For classification tasks (e.g., “Is this tumor cancerous?”), Accuracy is often misleading. If 99% of patients are healthy, a model that predicts “Healthy” for everyone is 99% accurate but 100% useless for the 1% who are sick.
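A quick way to see this trap is to score a "model" that always predicts the majority class; the synthetic 99%/1% dataset below is purely illustrative.

```python
# The accuracy trap on a synthetic 99%/1% imbalanced dataset (toy data).
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.zeros(1000, dtype=int)
y[:10] = 1  # only 1% of patients are "sick"

# A baseline that always predicts the majority class ("healthy").
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

print("Accuracy:", accuracy_score(y, pred))      # ~0.99
print("Recall (sick):", recall_score(y, pred))   # 0.0 -- misses every sick patient
```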
The Confusion Matrix breaks the predictions down into four outcomes: True Positives (TP, sick patients correctly flagged), True Negatives (TN, healthy patients correctly cleared), False Positives (FP, healthy patients incorrectly flagged as sick), and False Negatives (FN, sick patients the model missed).
Using the values from the Confusion Matrix, we calculate three key metrics. Precision = TP / (TP + FP) asks: of everything flagged as positive, how much really was positive? Recall = TP / (TP + FN) asks: of all the real positives, how many did we catch? The F1 Score is the harmonic mean of the two, balancing them in a single number.
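The sketch below computes these values with scikit-learn on a small hand-made example; the toy `y_true`/`y_pred` arrays are assumptions for illustration only.

```python
# Confusion matrix and derived metrics on a toy example (1 = positive class).
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

# sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))
```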
Many models don’t just output a category; they output a probability (e.g., 0.85 chance of being spam). You have to choose a “threshold” (usually 0.5) to decide the final category.
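The snippet below shows how moving that threshold changes the final decisions; the probability values are made-up stand-ins for a model's `predict_proba` output.

```python
# Turning probabilities into categories with different thresholds
# (illustrative probabilities for 5 emails, P(spam)).
import numpy as np

proba = np.array([0.85, 0.40, 0.55, 0.10, 0.62])

default = (proba >= 0.5).astype(int)   # standard 0.5 cut-off
strict  = (proba >= 0.8).astype(int)   # stricter cut-off: fewer false alarms, more misses

print("threshold 0.5:", default)  # [1 0 1 0 1]
print("threshold 0.8:", strict)   # [1 0 0 0 0]
```

Raising the threshold trades false positives for false negatives, which is exactly the Precision/Recall trade-off described above.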
Algorithms have “knobs” you can turn to change their behavior, called Hyperparameters (e.g., the depth of a Decision Tree or the number of clusters in K-Means). Tuning is the process of finding the best settings.
You provide a list of values for each hyperparameter, and the computer tries every possible combination (like a brute-force attack).
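This exhaustive approach is commonly called Grid Search. Here is a hedged sketch with scikit-learn's `GridSearchCV`; the parameter grid and dataset are illustrative.

```python
# Grid search: try every combination of the listed hyperparameter values.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "max_depth": [2, 4, 6, 8],
    "min_samples_leaf": [1, 5, 10],
}  # 4 x 3 = 12 combinations, each evaluated with 5-fold cross-validation

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy: %.3f" % search.best_score_)
```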
Instead of trying everything, the computer picks random combinations from the grid.
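This is usually called Random Search. The sketch below uses scikit-learn's `RandomizedSearchCV`; the distributions and the `n_iter` value are illustrative choices.

```python
# Random search: sample a fixed number of combinations instead of trying all of them.
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

param_dist = {
    "max_depth": randint(2, 12),        # sample integers in [2, 12)
    "min_samples_leaf": randint(1, 20),
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=15,          # only 15 random combinations are evaluated
    cv=5,
    random_state=0,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy: %.3f" % search.best_score_)
```

Random search often finds settings nearly as good as an exhaustive grid at a fraction of the cost, because only a few hyperparameters usually matter much.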