SHapley Additive exPlanations or SHAP: What is it?

9 Mar 2023

Data Science

mikesuperman

SHapley Additive exPlanations, more commonly known as SHAP, is used to explain the output of Machine Learning models. It is based on Shapley values, which use game theory to assign credit for a model’s prediction to each feature or feature value.
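
For reference, the classical Shapley value from cooperative game theory can be written as below. The notation is an assumption chosen for illustration and does not come from this article: N is the set of n features, v(S) is the model output when only the features in S are known, and \phi_i is the credit assigned to feature i.

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]
```

In words, the credit for a feature is its average marginal contribution over all the orders in which features could be added.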

SHAP works by decomposing a model's output into a sum of per-feature contributions: each prediction equals a base value (the average model output) plus the SHAP value of every feature. Each SHAP value represents how much that feature pushed the prediction above or below the base value. These values can be used to understand the importance of each feature and to explain the result of the model to a human. This is especially valuable for agencies and teams that report to their clients or managers.
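
To make the additive decomposition concrete, here is a minimal sketch that computes Shapley values by brute force in plain Python, without the SHAP library. The toy model, the input and the all-zero baseline are hypothetical, and replacing "missing" features with a single baseline is a simplification of what SHAP does with background data.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical toy model: two additive terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def value(subset, x, baseline):
    # Features in `subset` keep their real value; the others stay at the baseline.
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(mixed)

def shapley_values(x, baseline):
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Weight of this subset in the Shapley average.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value(set(subset) | {i}, x, baseline)
                without_i = value(set(subset), x, baseline)
                phi[i] += weight * (with_i - without_i)
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(x, baseline)
base_value = model(baseline)

print(phi)                               # one contribution per feature
print(base_value + sum(phi), model(x))   # the two numbers match
```

The last line shows the additive property: the base value plus the contributions equals the prediction. The cost grows exponentially with the number of features, which is why practical tools rely on faster algorithms.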

SHAP has several useful properties. It is model-agnostic, so it can be applied to any Machine Learning model. Its explanations are consistent. And it can handle complex model behaviors (when features interact with each other, for example).

What is SHAP used for?

SHAP has many uses for data science professionals. First, it helps explain the predictions of Machine Learning models in a way that humans can understand. By assigning a value to each input feature, it shows how and to what extent each feature contributed to the final prediction result. This way, the team can understand how the model made its decision and can identify the most important features.

As explained earlier, SHAP is model-agnostic (neutral): it can be used with any Machine Learning model, so you don't have to worry about the structure of the model to understand the prediction result. Moreover, the method is consistent: if a model changes so that a feature contributes more, the credit SHAP assigns to that feature does not decrease. You can therefore trust the explanations produced, regardless of the model studied.

Finally, SHAP is particularly useful for handling complex behaviors. You can use this technique to understand how different features affect the model prediction together.
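
As a small illustration of how credit is shared when features interact, here is a sketch for the two-feature case, where the Shapley values have a simple closed form. The model, the input and the zero baseline are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def model(x1, x2):
    # Hypothetical model with a pure interaction: neither feature matters alone.
    return x1 * x2

def two_feature_shapley(f, x, baseline):
    (x1, x2), (b1, b2) = x, baseline
    v_none = f(b1, b2)   # no feature known
    v_1 = f(x1, b2)      # only feature 1 known
    v_2 = f(b1, x2)      # only feature 2 known
    v_both = f(x1, x2)   # both features known
    # Average the marginal contribution over the two possible orderings.
    phi1 = 0.5 * (v_1 - v_none) + 0.5 * (v_both - v_2)
    phi2 = 0.5 * (v_2 - v_none) + 0.5 * (v_both - v_1)
    return phi1, phi2

phi1, phi2 = two_feature_shapley(model, x=(3.0, 4.0), baseline=(0.0, 0.0))
print(phi1, phi2)  # 6.0 6.0
```

The prediction of 12 comes entirely from the interaction, and each feature receives half of it.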

How to use SHAP to explain predictions?

Here is how to use SHAP to explain the predictions of a Machine Learning model:

1. Install the SHAP package using 'pip install shap'.
2. Import the SHAP package and other necessary libraries, such as NumPy and Matplotlib.
3. Load your Machine Learning model and prepare the input data you want to explain.
4. Create an explainer object using 'shap.TreeExplainer' for tree-based models, or 'shap.KernelExplainer' for other model types.
5. Call the 'shap_values' method of the explainer, passing it the input data you want to explain. This method returns a matrix of SHAP values that represents the impact of each feature on the model prediction.
6. Use the SHAP values to visualize and interpret the results. For example, you can use the 'shap.summary_plot' function to generate a summary plot that shows the relative importance of each feature. You can also use the 'shap.dependence_plot' function to visualize how the value of a particular feature influences the prediction of the model.

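The workflow is short, and TreeExplainer in particular is fast on tree-based models. KernelExplainer works with any model but can be slow on large datasets.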

Example of SHAP use

You will find below an example of SHAP use, based on decision trees. To better understand the example, let’s talk about TreeExplainer.

TreeExplainer uses the structure of the trees themselves to compute the SHAP values of each feature efficiently. It is therefore well suited to explaining predictions of Machine Learning models built on decision trees. It works for both regression and classification models, including single decision trees, Random Forests, and Gradient Boosting Machines.

Here is a simple example of using SHAP with a regression model based on decision trees:

				
import shap
import numpy as np
import matplotlib.pyplot as plt

# Load your trained tree-based model (load_model is a placeholder for your own code)
model = load_model()

# Prepare the input data you want to explain (prepare_data is a placeholder);
# X should be a pandas DataFrame with an "age" column for the last plot
X = prepare_data()

# Create a SHAP explainer using TreeExplainer
explainer = shap.TreeExplainer(model)

# Compute the SHAP values for the input data
shap_values = explainer.shap_values(X)

# Display a summary plot of the relative importance of each feature
shap.summary_plot(shap_values, X)

# Display the dependence plot for the "age" feature
shap.dependence_plot("age", shap_values, X)

This example calculates the SHAP values for each row of X. It then displays a summary plot of the relative importance of each feature and a dependence plot for the "age" feature.

In conclusion

SHAP is thus a versatile and powerful technique for explaining the predictions of Machine Learning models. The method is model-agnostic, consistent, and able to handle complex model behavior. SHAP is particularly useful for understanding how a model works, identifying important features, and explaining the result of predictions to others on your team or to your customers.

Now that you’ve discovered SHAP, you may want to master it. To do so, we invite you to learn more about DataScientest training courses that incorporate Machine Learning into the curriculum.
