Feature extraction involves creating new features from raw data that an algorithm cannot use directly. For example, a raw timestamp (e.g., 2023-12-25 08:30:00) is hard for a model to process. You extract components such as the hour, the day of the week, the month, and a weekend flag, each of which the model can actually learn from.
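The timestamp example above can be sketched with the standard library alone; the feature names (`hour`, `day_of_week`, and so on) are illustrative choices, not a fixed convention.

```python
from datetime import datetime

def extract_time_features(ts: str) -> dict:
    """Break a raw timestamp string into model-friendly numeric features."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": dt.hour,                      # 0-23, captures time-of-day patterns
        "day_of_week": dt.weekday(),          # 0 = Monday ... 6 = Sunday
        "month": dt.month,                    # 1-12, captures seasonality
        "is_weekend": int(dt.weekday() >= 5), # binary flag
    }

print(extract_time_features("2023-12-25 08:30:00"))
# → {'hour': 8, 'day_of_week': 0, 'month': 12, 'is_weekend': 0}
```

In a real pipeline you would apply this to a whole column (e.g., with pandas' `.dt` accessor), but the idea is the same: one opaque value becomes several numeric signals.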
Feature transformation involves mathematically changing the data to meet the assumptions of the model, for example making a skewed distribution more "Normal" (a log transform) or putting numeric features on a common scale (standardization).
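A minimal sketch of both transformations, assuming NumPy and a toy income column (a classic right-skewed variable): a log transform to tame the tail, then z-scoring so the feature has zero mean and unit variance.

```python
import numpy as np

incomes = np.array([20_000, 35_000, 50_000, 120_000, 1_000_000], dtype=float)

# Log transform compresses the long right tail toward a more Normal shape.
# log1p = log(1 + x), which also tolerates zeros in the raw data.
log_incomes = np.log1p(incomes)

# Standard scaling: subtract the mean, divide by the standard deviation,
# so this feature lives on the same scale as any other scaled feature.
scaled = (log_incomes - log_incomes.mean()) / log_incomes.std()

print(scaled.mean(), scaled.std())  # ≈ 0.0 and 1.0 (up to float rounding)
```

Libraries such as scikit-learn wrap the second step as `StandardScaler`, but the arithmetic is exactly this.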
Not all features are helpful. Some are redundant or just “noise.” Feature selection keeps only the most relevant variables.
| Method Type | Description | Examples |
|---|---|---|
| Filter Methods | Statistical tests applied before training. | Correlation heatmaps, Chi-Square test. |
| Wrapper Methods | Train the model on different subsets of features. | Forward Selection, Backward Elimination. |
| Embedded Methods | Feature selection happens during training itself. | Lasso (L1) Regularization, Random Forest Importance. |
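A filter method is the easiest of the three to show in a few lines. The sketch below builds one synthetic "signal" feature and one pure-noise feature, then keeps only features whose absolute Pearson correlation with the target clears a threshold; the feature names and the 0.3 cutoff are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
target = rng.normal(size=n)
features = {
    "signal": target * 0.9 + rng.normal(scale=0.3, size=n),  # genuinely related
    "noise": rng.normal(size=n),                             # irrelevant
}

# Filter method: score each feature independently (here, |Pearson r| with
# the target) and keep only those above a chosen threshold.
selected = [
    name for name, col in features.items()
    if abs(np.corrcoef(col, target)[0, 1]) > 0.3
]
print(selected)
```

Wrapper and embedded methods differ in that they involve the model itself (retraining on subsets, or reading coefficients/importances after training), which is more expensive but can catch feature interactions a per-feature filter misses.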
In many real-world problems (fraud detection, rare-disease diagnosis), one class holds 99% of the data and the other just 1%. This makes plain accuracy misleading: a model that always predicts the majority class scores 99% while catching nothing.
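One common remedy is random oversampling: duplicate minority-class rows until the classes are balanced. A toy sketch in pure Python, with made-up transaction rows (the column names and values are illustrative):

```python
import random

random.seed(42)

# Toy imbalanced dataset: 99 "legit" rows (label 0), 1 "fraud" row (label 1).
majority = [{"amount": random.uniform(5, 200), "label": 0} for _ in range(99)]
minority = [{"amount": 5000.0, "label": 1}]

# Random oversampling: repeat the minority rows (sampling with replacement
# in general) until the class counts match.
oversampled = minority * (len(majority) // len(minority))
balanced = majority + oversampled

print(len(balanced), sum(r["label"] for r in balanced))  # → 198 99
```

In practice you would reach for a library such as imbalanced-learn (SMOTE, random over/under-sampling) or use class weights in the loss, but the balancing idea is the same.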
These are features created based on specific industry knowledge rather than just math. They often provide the “breakthrough” in model accuracy.
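As a concrete (hypothetical) illustration: in a loan-default model, a lender knows that the debt-to-income ratio predicts risk better than either raw column alone. The function name and thresholds here are invented for the example.

```python
def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Domain-knowledge feature: fraction of income already committed to debt.

    Neither raw column captures affordability by itself; their ratio does.
    """
    if monthly_income <= 0:
        return float("inf")  # guard against bad or missing income rows
    return monthly_debt / monthly_income

print(debt_to_income(1500.0, 5000.0))  # → 0.3
```

No transform or selection technique would have invented this ratio; it comes from knowing how lenders actually assess risk, which is why such features are often the "breakthrough."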