Explainable Models — Unlock the Black Box



We live in an era in which machine learning and artificial intelligence can generate a myriad of predictions. One difficulty, however, is explaining how a model came to its conclusions. Can we explain this to non-data scientists? Is it scientifically correct? The following are two common definitions related to model interpretability and explainability:

  1. Model interpretability is “the extent to which you are able to predict what is going to happen, given a change in input or algorithmic parameters”
  2. Model explainability is “the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms”

These definitions are not perfect, but they do highlight the challenge of explaining model results. Because many models are difficult to explain, they are considered “black boxes.”
In this article, I will discuss model explainability options, using those available in Orange as well as in other programs.

What are the Current Choices for Model Explainability?

Statistical models such as linear and logistic regression, as well as decision trees, are easy to interpret because they provide coefficients, log odds ratios, information gain, etc. Newer machine learning models such as random forest, gradient boosting, and neural networks do not automatically produce results that are easily explainable.

For linear regression, the goal is to predict a numerical y based on predictors x1, x2, etc. A simple linear regression line has the equation y = a + bx, where x is the explanatory (predictor or independent) variable and y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0). Below are coefficients derived from a linear regression run in Orange on a dataset to predict red wine quality. For every one-unit increase in an x variable, the y variable changes by [the coefficient] when all other variables are held constant. Therefore, for a one-unit increase in sulfates (sulphates), wine quality increases by 0.913.
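This style of interpretation can be reproduced in a few lines. The sketch below fits a linear regression on synthetic stand-in data (the feature names and the true coefficients of 0.9 and 0.3 are illustrative assumptions, not the actual wine dataset) and prints the per-unit effect of each feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # synthetic stand-ins for sulphates, alcohol
y = 5.0 + 0.9 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
for name, coef in zip(["sulphates", "alcohol"], model.coef_):
    # each coefficient is the change in y per one-unit increase in the
    # feature, holding the other features constant
    print(f"{name}: {coef:+.2f}")
```

Because the model is linear, the fitted coefficients recover the per-unit effects directly, which is exactly why these models are considered transparent.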

Logistic regression generates odds ratios and log odds ratios. In the table below, note the positive number, which means that as the number of calcified (colored) coronary vessels on fluoroscopy increases, so does the incidence of heart disease. “For every one-unit increase in [X variable], the odds that the observation is in (y class) are [coefficient] times as large as the odds that the observation is not in (y class) when all other variables are held constant.” A one-unit increase in the number of major vessels colored results in a 1.24 increase in the log odds of having heart disease.
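The relationship between a logistic regression coefficient and the odds ratio is just exponentiation. The sketch below uses synthetic data built so that the true log odds ratio per additional vessel is 1.24 (an assumption chosen to echo the example above, not the real heart disease data), then reads the fitted coefficient back out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
vessels = rng.integers(0, 4, size=500).astype(float)   # 0-3 vessels colored
logits = -1.5 + 1.24 * vessels                # true log odds ratio of 1.24
disease = rng.random(500) < 1.0 / (1.0 + np.exp(-logits))

model = LogisticRegression().fit(vessels.reshape(-1, 1), disease)
log_odds = model.coef_[0][0]
# exp(log odds ratio) gives the multiplicative change in the odds
print(f"log odds ratio: {log_odds:.2f}  odds ratio: {np.exp(log_odds):.2f}")
```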

Decision trees are easy to visualize and interpret. Node splitting is the result of an algorithm that maximizes “information gain,” beginning with the root node. The algorithm focuses on “node purity,” splitting on the variable that yields the purest (most homogeneous) child nodes. In the image below of a decision tree (heart disease prediction dataset), the root node splits on the thallium stress test: normal thallium results go to the leaf node on the right, and reversible or fixed defects go to the left. The redder the leaf, the more strongly it predicts heart disease. The tree was pruned to three levels to keep it small for display. Below this image is a screenshot of the Orange rank widget, where features are ranked, in this case by information gain. For more on calculating information gain, see this reference.
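Information gain itself is a short calculation: the entropy of the labels before a split minus the weighted entropy after it. The toy example below (hypothetical labels, not the real dataset) shows a perfectly pure split on a "normal thallium" indicator yielding the maximum gain of 1.0 bit.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    """Entropy reduction from splitting labels into mask True/False groups."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Toy data: a "normal" thallium result perfectly separates the classes
disease = np.array([1, 1, 1, 0, 1, 0, 0, 0])
thal_normal = np.array([False, False, False, True, False, True, True, True])
print(information_gain(disease, thal_normal))   # → 1.0 (a perfectly pure split)
```

A tree-building algorithm evaluates this quantity for every candidate split and chooses the feature with the highest gain, which is what the Orange rank widget reports.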

Permutation Feature Importance (PFI) is a technique that is not particularly well known, despite being readily available in Python. The logic is that if you can manipulate the features that predict an outcome, you can determine which features contribute the most to the model: shuffle the values in one column and observe the effect on model performance (classification or regression). The more important predictors will affect performance metrics (AUC, F1, etc.) more. The screenshot below displays the Orange workflow to determine feature importance for the heart disease prediction dataset using neural networks, xgboost, and random forest.

Below is the result of feature importance based on the AUC when a neural network is run. In other words, when features are shuffled (permuted), which features mattered most to the AUC?

The greatest decrease in the AUC occurred when chest pain was shuffled as seen in the table below, indicating that chest pain was the most important variable. Note: different algorithms often select different top model predictors.

A feature is not important if shuffling its values leaves the model error unchanged. For example, if after permutation the RMSE stays at 12, the feature must not be important to the model.
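The shuffle-and-remeasure loop is simple enough to write directly. The sketch below uses synthetic data (an assumption: only the first column carries real signal) and reports the AUC drop per shuffled feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))              # only column 0 carries real signal
y = (X[:, 0] + 0.1 * rng.normal(size=400)) > 0

model = RandomForestClassifier(random_state=0).fit(X, y)
base_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

drops = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])       # shuffle one column
    auc = roc_auc_score(y, model.predict_proba(X_perm)[:, 1])
    drops.append(base_auc - auc)
    print(f"feature {j}: AUC drop = {drops[-1]:.3f}")
```

Shuffling the informative column collapses the AUC, while shuffling the noise columns barely moves it. scikit-learn packages the same idea as `sklearn.inspection.permutation_importance`, which also averages over repeated shuffles.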

Local Interpretable Model-Agnostic Explanations (LIME), introduced in 2016, adds noise to the original data points, feeds the perturbed points into the black-box model, and observes the corresponding outputs. It weights the new data points as a function of their proximity to the original points. A surrogate model, such as linear regression, is then trained on the perturbed dataset using those sample weights. Finally, each original data point can be explained by the newly trained surrogate model.
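Those four steps can be sketched in a few lines. This is a simplified illustration of the idea, not the full `lime` package: perturb a point, query the black box, weight the samples by proximity, and fit a weighted linear surrogate whose coefficients serve as the local explanation. The black-box function and the kernel width here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict_fn, x, n_samples=500, width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Add noise around the original point
    Z = x + rng.normal(scale=0.5, size=(n_samples, len(x)))
    # 2. Feed the perturbed points into the black box
    preds = predict_fn(Z)
    # 3. Weight samples by proximity to x (RBF kernel)
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / width**2)
    # 4. Fit a weighted linear surrogate; its coefficients explain x locally
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_

# Toy black box: nonlinear in feature 0, ignores feature 1 entirely
black_box = lambda Z: np.sin(Z[:, 0])
coefs = lime_explain(black_box, np.array([0.0, 0.0]))
print(coefs)   # feature 0 gets a large weight, feature 1 near zero
```

The surrogate is only faithful near the explained point, which is why LIME explanations are described as "local."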

Interestingly, Hughes et al. in 2021 used the LIME technique to identify which ECG segments were used by computer vision to determine the correct ECG diagnosis. This technique is not available in Orange.

Shapley Additive Explanations (SHAP), introduced in 2017, is a Python package that helps determine feature importance in models that are otherwise black boxes. In the screenshot below, using Orange, we evaluate which features contributed the most to the classification model on the heart disease dataset using a neural network. It produces a SHAP value (x axis) that can be positive or negative. The most important feature is listed first (major vessels colored), and feature values are color-coded, with high values in red and low values in blue. For example, female gender and a normal thallium test argue against the presence of heart disease. SHAP values can be used to explain a large variety of models, including linear models, tree-based models, and neural networks, while other techniques only explain limited model types.
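SHAP is built on the game-theoretic Shapley value: a feature's contribution averaged over all subsets of the other features. The exact enumeration below is only feasible for a handful of features (the `shap` package approximates it efficiently for real models), and the toy additive "model" is an illustrative assumption, but it shows the computation SHAP values are defined by.

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(value_fn, n_features):
    """Exact Shapley values: value_fn(subset) -> output using those features."""
    phi = np.zeros(n_features)
    all_feats = set(range(n_features))
    for i in range(n_features):
        for size in range(n_features):
            for S in map(set, combinations(all_feats - {i}, size)):
                # weight of this subset in the Shapley average
                w = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                     / factorial(n_features))
                # marginal contribution of feature i on top of subset S
                phi[i] += w * (value_fn(S | {i}) - value_fn(S))
    return phi

# Toy additive model: feature 0 contributes +2, feature 1 contributes -1
x = np.array([2.0, -1.0])
value = lambda S: sum(x[j] for j in S)
print(shapley_values(value, 2))   # → [ 2. -1.]
```

For an additive model the Shapley values recover each feature's contribution exactly, which is the "additive" guarantee in the SHAP name.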

Partial Dependence Plot (PDP) is a model-agnostic explainability technique. Mathematically, the PDP computes the marginal effect, or contribution, of one or more individual features on the model's predictions. Two major shortcomings of the technique are that features should be independent, which is often not the case, and that the plot is less accurate when more than two features are evaluated. The PDP screenshot below is from Kaggle.
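The marginal effect is computed by sweeping one feature over a grid while averaging predictions over the observed values of the other features. The sketch below does this by hand on synthetic data (an assumed quadratic relationship, chosen so the curve has a visible shape); `pdp_curve` is a hypothetical helper, not a library function.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

def pdp_curve(model, X, feature, grid):
    """Average prediction with `feature` fixed at each grid value."""
    averages = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v      # pin the feature, keep the others as observed
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

grid = np.linspace(-2, 2, 9)
curve = pdp_curve(model, X, feature=0, grid=grid)
print(curve.round(2))   # U-shaped, reflecting the quadratic dependence
```

scikit-learn provides the same computation, with plotting, via `sklearn.inspection.PartialDependenceDisplay`.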

Saliency Maps are used to explain computer vision predictions generated by convolutional neural networks. The goal is to identify salient features of the image, such as color, intensity, and orientation, and to create what amounts to a heat map. The map highlights the most significant regions of an image, such as a bird sitting in a tree. The following is a saliency map of several animals. Note that the map focuses on the animal and not the background. (Image from Geeks for Geeks)
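Real saliency maps are typically computed from input gradients inside a CNN framework, but a related occlusion-style sketch conveys the heat-map idea without one: blank out one patch at a time and record how much the model's score drops. The "model" below is a deliberately trivial stand-in (score = brightness of the central region), so the heat map lights up exactly where the model is looking.

```python
import numpy as np

def occlusion_saliency(score_fn, image, patch=4):
    """Heat map of score drops when each patch of the image is blanked out."""
    h, w = image.shape
    base = score_fn(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0   # blank one patch
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat   # large values = regions the prediction depends on

# Toy "model": score is the mean brightness of the central 8x8 region
score = lambda img: img[4:12, 4:12].mean()
img = np.ones((16, 16))
heat = occlusion_saliency(score, img)
print(heat.round(2))   # only the central patches register a score drop
```

As with the animal images above, the map concentrates on the region the model actually uses and ignores the rest.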


This article focused on the most common techniques for determining which features matter most to a model's predictions. Some are relatively transparent, like regression coefficients, whereas others require complex computation to unlock the black box. In the future, we can expect more and better techniques for determining feature importance.