
After developing a Machine Learning model, you need to evaluate its performance to know whether it is effective. To compare multiple models objectively, metrics are indispensable. Understanding these metrics and knowing how to use them is essential for building an effective Machine Learning model. In this article, you will discover the main metrics used to evaluate a Machine Learning model.

What is a metric in Machine Learning?

Machine Learning allows computers to learn and make predictions or decisions based on data.

There are two types of learning: supervised learning and unsupervised learning.

In this article, we will focus on a supervised framework. For more details on the basics of Machine Learning and the difference between these two types of learning, we recommend reading this article, which introduces key Machine Learning concepts useful for understanding the use of metrics in a Machine Learning model.

A metric is a numerical value that quantifies the quality of a model’s predictions. It plays an essential role at every stage of developing a Machine Learning model, as it helps determine whether the model meets our expectations. Based on the results obtained, metrics make it possible to objectively compare several models, select the most effective one, or adjust a model’s hyperparameters.

A good grasp of different metrics is essential for deploying an effective model.

Which metric to choose for my model?

To choose the appropriate metric, it is important to understand the context of the problem and the model’s objectives. There are many metrics, and we will present a few of them and their advantages.

In a supervised learning framework, one must first determine the type of prediction the model must make. If the model must predict a numerical value (e.g., the price of a house), it is a regression problem (e.g., linear regression). Conversely, if the model must predict a categorical value (e.g., the presence or absence of fraud in a banking transaction), we are in a classification context. The metrics used for regression models therefore differ from those used for classification models.

A. Regression Metrics

In this article, we will present two of the primary regression metrics: Mean Squared Error (MSE) and Mean Absolute Error (MAE).

  • Mean Squared Error (MSE) is defined as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

where N is the number of observations, yi is the actual value, and ŷi is the predicted value.

This metric averages the squares of the differences between actual values and predictions. Mean Squared Error heavily penalizes large differences between the actual value and the prediction, which can be useful in contexts where such differences are particularly undesirable.

You can get more information on Mean Squared Error by reading this article, which details its characteristics and an example of its application.
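
To make the formula concrete, here is a minimal sketch (with made-up house prices, purely for illustration) that computes the MSE by hand with NumPy and, equivalently, with scikit-learn’s mean_squared_error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative house prices (in thousands): actual vs. predicted
y_true = np.array([250.0, 300.0, 180.0, 410.0])
y_pred = np.array([245.0, 320.0, 200.0, 390.0])

# By hand: mean of the squared differences
mse_manual = np.mean((y_true - y_pred) ** 2)

# Same result with scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both 306.25
```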

  • Mean Absolute Error (MAE) is defined as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$

where N is the number of observations, yi is the actual value, and ŷi is the predicted value.

This metric averages the absolute values of the differences between predictions and actual values.

Mean Absolute Error is less sensitive to large differences than Mean Squared Error.
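
Likewise, a minimal sketch (reusing the same made-up values as above) computes the MAE by hand and with scikit-learn’s mean_absolute_error:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Same illustrative values as in the MSE example
y_true = np.array([250.0, 300.0, 180.0, 410.0])
y_pred = np.array([245.0, 320.0, 200.0, 390.0])

# By hand: mean of the absolute differences
mae_manual = np.mean(np.abs(y_true - y_pred))

# Same result with scikit-learn
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both 16.25
```

Note how, in these made-up values, the 20-unit errors dominate the MSE far more than the 5-unit error, while the MAE weights them proportionally.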

B. Classification Metrics

In a classification framework, the way to evaluate a model’s performance is different. We will present three primary classification metrics: accuracy, precision, and recall.

  • To calculate accuracy, simply divide the number of correct predictions by the total number of predictions:
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

This formula returns a number between 0 and 1. A score close to 1 indicates a very good model, whereas a score close to 0 indicates a poor model. This metric is quite intuitive and easy to understand. However, it poorly evaluates the performance of a model based on imbalanced data, or data where prediction errors do not have the same impact.
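
As a minimal sketch (with made-up binary labels), accuracy can be computed directly from the definition or with scikit-learn’s accuracy_score:

```python
from sklearn.metrics import accuracy_score

# Illustrative binary labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# By hand: correct predictions divided by total predictions
accuracy_manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Same result with scikit-learn
accuracy_sklearn = accuracy_score(y_true, y_pred)

print(accuracy_manual, accuracy_sklearn)  # both 0.75
```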

To illustrate this concept, let’s take the example of a model that detects the presence of a disease in a patient. If the patient is not sick in 90% of cases, the model could systematically predict that the patient is healthy. The accuracy of this model would then be 0.9, which seems to be a very good score. However, two major problems can arise:

    • First, the model would be incapable of detecting the disease in a patient.
    • Second, the quality of predictions is not taken into account. Predicting that a patient is sick when they are not (known as a false positive) does not have the same impact as predicting that a patient is not sick when they are (a false negative).

Accuracy does not distinguish between different types of prediction errors and does not account for imbalanced data. This is why other metrics exist to address these problems:

  • Precision is defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP represents the number of True Positives and FP represents the number of False Positives.

This metric measures the proportion of positive predictions that are actually correct. It is useful when the cost of false positives is high.

  • Similarly, recall is defined as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$

where TP represents the number of True Positives and FN represents the number of False Negatives.

This metric measures the proportion of actual positives that the model correctly identifies. It is useful when the cost of false negatives is high.
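
Returning to the disease-detection example, here is a minimal sketch (with made-up labels for ten patients, one of whom is sick) showing how accuracy can look good while recall exposes the problem, using scikit-learn’s accuracy_score, precision_score, and recall_score:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative screening labels: 1 = sick, 0 = healthy (10 patients, 1 sick)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# A naive model that always predicts "healthy"
y_naive = [0] * 10

# A model that flags two patients, including the sick one (1 TP, 1 FP)
y_model = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(accuracy_score(y_true, y_naive))   # 0.9 -> looks good despite missing the sick patient
print(recall_score(y_true, y_naive))     # 0.0 -> no sick patient is ever detected
print(precision_score(y_true, y_model))  # 0.5 -> TP = 1, FP = 1
print(recall_score(y_true, y_model))     # 1.0 -> TP = 1, FN = 0
```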

Conclusion

In conclusion, using metrics is essential to evaluate the performance of a Machine Learning model. Choosing the right metric for the model allows for making the correct decisions regarding how to improve it. Depending on the type of model (classification or regression model), context, and data type, certain metrics will be preferable to others. It is important to understand the advantages and disadvantages of each metric to use the one that best fits your problem.

Data Scientists use metrics to build effective Machine Learning models. To do so, they rely on a range of mathematical concepts and specialized software for preparing and analyzing data. That is why solid training is essential, and that is exactly what DataScientest offers: comprehensive training programs in bootcamp, continuing-education, or work-study formats.
