Evaluating a model isn't just about seeing whether it works, but about understanding how and where it fails. Once the model generates predictions on the test set, we use performance metrics to quantify its accuracy and generalization capability.
Regression: Measuring Continuous Error
In regression problems, we aim to predict a numerical value. Metrics here measure the distance between the actual value and the prediction.
| Metric | Definition |
|---|---|
| MSE (Mean Squared Error) | Average of the squared errors. |
| RMSE (Root Mean Squared Error) | The square root of the MSE. |
| MAE (Mean Absolute Error) | Average of the absolute differences (without squaring). |
| R² (Coefficient of Determination) | Indicates the proportion of the variance in the target that the model explains. |
Regression metrics.
For MSE, RMSE, and MAE, we want values as small as possible, ideally 0: all three measure the distance between the model's predictions and the actual values. RMSE and MAE have the advantage of being expressed in the same units as the target, while MSE penalizes large errors more heavily.
As for R², we want it to be as close to 1 as possible. It tells us what proportion of the variance in the target the model explains; a value of 0 means the model is no better than always predicting the mean, and it can even be negative for a model that does worse.
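The four regression metrics follow directly from their definitions in the table above. As a minimal sketch, here they are computed with plain NumPy on a small illustrative set of actual values and predictions (the numbers are made up for the example):

```python
import numpy as np

# Illustrative toy data: actual target values and a model's predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred
mse = np.mean(errors ** 2)      # average of the squared errors
rmse = np.sqrt(mse)             # back in the target's units
mae = np.mean(np.abs(errors))   # average of the absolute errors

ss_res = np.sum(errors ** 2)                         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)       # total sum of squares
r2 = 1 - ss_res / ss_tot                             # variance explained

print(mse, rmse, mae, r2)  # → 0.375, ~0.612, 0.5, ~0.949
```

In practice you would typically call `mean_squared_error`, `mean_absolute_error`, and `r2_score` from `sklearn.metrics`, which implement these same formulas.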
Classification: Measuring Decision Quality
In classification, we don't measure distances; instead, we count correct and incorrect decisions, broken down by class.
| Metric | Key Question |
|---|---|
| Accuracy | How often is it correct overall? |
| Precision | How reliable is it when it predicts "Positive"? |
| Recall | How good is it at finding all the "Positives"? |
| F1-Score | How well does it balance Precision and Recall? |
| ROC AUC | How skilled is it at differentiating one class from another? |
Classification metrics.
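The metrics in the table can all be computed with `sklearn.metrics`. The sketch below, assuming scikit-learn is installed, uses made-up labels, predictions, and scores for a binary problem; note how Precision and Recall diverge when the model misses some positives:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Illustrative toy data: true labels, hard predictions, and predicted scores
y_true  = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred  = [1, 1, 0, 0, 1, 0, 0, 0]                    # 2 TP, 2 FN, 1 FP, 3 TN
y_score = [0.9, 0.8, 0.4, 0.35, 0.7, 0.3, 0.2, 0.1]   # probabilities of class 1

acc  = accuracy_score(y_true, y_pred)    # (TP + TN) / total  → 0.625
prec = precision_score(y_true, y_pred)   # TP / (TP + FP)     → 2/3
rec  = recall_score(y_true, y_pred)      # TP / (TP + FN)     → 0.5
f1   = f1_score(y_true, y_pred)          # harmonic mean of Precision and Recall
auc  = roc_auc_score(y_true, y_score)    # uses scores, not hard predictions
```

Note that ROC AUC takes the predicted scores rather than the hard labels: it measures how well the model ranks positives above negatives across all thresholds.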
Always evaluate your model with several metrics at once. The best model isn't necessarily the one with the highest single number, but the one that best solves the specific problem it was designed for.