Model Visualization and Explainability

Model explainability remains a hurdle to the widespread adoption and understanding of machine learning. In this notebook, we train and visualize a neural net that predicts credit card defaults from credit usage, payment history, and demographic information. The goal is to explore how VIP can be used to visualize the output of complex machine learning models, and then to explain the results in terms of input features.

In the final portion of the notebook, we also visualize the results of a grid search over hyperparameters to determine which combinations optimize a gradient boosting machine (GBM).

Import Data and Preprocess

Data from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005. All amounts are given in New Taiwan dollars (NT$).

See Kaggle for further explanation of the dataset: https://www.kaggle.com/uciml/default-of-credit-card-clients-dataset/
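A minimal loading sketch is shown below. It assumes the Kaggle export of this dataset (a CSV with columns such as `LIMIT_BAL`, `PAY_0`, `BILL_AMT1`, and a `default.payment.next.month` target); the inline sample here is a hypothetical stand-in so the snippet is self-contained, and in practice you would point `pd.read_csv` at the downloaded file.

```python
import io
import pandas as pd

# Hypothetical inline sample mimicking a few columns of the Kaggle CSV;
# in practice, download the file and use pd.read_csv("UCI_Credit_Card.csv").
sample = io.StringIO(
    "ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,BILL_AMT1,PAY_AMT1,default.payment.next.month\n"
    "1,20000,2,2,1,24,2,3913,0,1\n"
    "2,120000,2,2,2,26,-1,2682,0,1\n"
    "3,90000,2,2,2,34,0,29239,1518,0\n"
)
df = pd.read_csv(sample)

# Rename the unwieldy target column and drop the row ID
df = df.rename(columns={"default.payment.next.month": "default"}).drop(columns="ID")
```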

Load and Visualize Data in VIP

Notes on Smart Mapping Results

Payment status is very important:

The top five most relevant features, as determined by smart mapping, all relate to payment status.

Points on the diagonal:

These are clients whose bill amounts equal their credit limit, i.e., they are maxing out their credit. The rate of default is potentially higher among these individuals.

Generate Automatic Insights

Red Insight:

When limit_bal is low and pay_amt is low, there is a higher risk of default. These are clients with little or risky credit history, which is what leads to a low limit_bal in the first place.

Blue Insight:

When limit_bal is high, there is low risk of default, regardless of payments. This may be because obtaining a high credit limit requires either demonstrating fiscal responsibility or already being wealthy.

Build and Train Neural Net

Using Keras with a TensorFlow back-end
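A minimal sketch of a Keras binary classifier for this task is below. The layer sizes, dropout rate, and training settings are illustrative assumptions, not the notebook's actual architecture; the output layer uses a sigmoid so the model emits a probability of default.

```python
import numpy as np
from tensorflow import keras

n_features = 23  # the dataset has 23 predictors after dropping the ID and target

# Illustrative architecture; the notebook's actual layer sizes may differ
model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability of default
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training on (hypothetical) preprocessed arrays X_train, y_train would look like:
# model.fit(X_train, y_train, epochs=20, batch_size=256, validation_split=0.1)
```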

Preprocessing
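A typical preprocessing step here is to hold out a test set and standardize the numeric features. The sketch below uses synthetic arrays as stand-ins for the real feature matrix and default labels; the key detail is that the scaler is fit on the training split only, to avoid leaking test statistics.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real feature matrix and default labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 23))
y = rng.integers(0, 2, size=1000)

# Hold out a test set, stratifying to preserve the default rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training set only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```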

Engineer New Features
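One plausible engineered feature for this dataset is credit utilization, the latest bill amount relative to the credit limit. The column names below come from the Kaggle CSV, but the specific ratios are assumptions for illustration, not necessarily the features engineered in the notebook.

```python
import pandas as pd

# Hypothetical slice of the dataset (column names as in the Kaggle CSV)
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000],
    "BILL_AMT1": [3913, 2682, 29239],
    "PAY_AMT1": [0, 0, 1518],
})

# Utilization: fraction of the credit limit consumed by the latest bill.
# Values near 1 correspond to the "points on the diagonal" noted above.
df["utilization"] = df["BILL_AMT1"] / df["LIMIT_BAL"]

# Payment coverage: how much of the latest bill was actually paid
df["pay_ratio"] = df["PAY_AMT1"] / df["BILL_AMT1"]
```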

Takeaway: we can make qualitative statements about what the neural net is learning

Possible followup: Drag Education to playback and step through different education levels

Note the distribution of risk in the color legend on the right (it may help to switch to percent view). This gives a breakdown of how the model might take education into account when making predictions. The same can be done for marital status and gender.
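The breakdown the color legend shows can be approximated in pandas as the default rate per education level. The data below is a synthetic stand-in; the `EDUCATION` codes (1 = graduate school, 2 = university, 3 = high school, 4 = other) follow the dataset's documentation.

```python
import pandas as pd

# Synthetic stand-in: EDUCATION codes with observed default outcomes
df = pd.DataFrame({
    "EDUCATION": [1, 1, 2, 2, 2, 3, 3, 4],
    "default":   [0, 0, 0, 1, 1, 1, 0, 0],
})

# Default rate per education level, analogous to the legend's percent view
rates = df.groupby("EDUCATION")["default"].mean()
```

The same `groupby` pattern works for marital status and gender.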

Visualize Grid Search of Hyperparameters - GBM

Using a gradient boosting machine, we search for the optimal hyperparameters by maximizing the F1 score

Run a grid search over three hyperparameters
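A hedged sketch of such a search using scikit-learn is below. The synthetic data, the choice of `n_estimators`, `max_depth`, and `learning_rate` as the three hyperparameters, and the grid values themselves are all illustrative assumptions; only the GBM-plus-F1 setup comes from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed credit data
X, y = make_classification(n_samples=500, n_features=23, random_state=0)

# Illustrative grid over three hyperparameters
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="f1",  # maximize F1 on the positive (default) class
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

`search.cv_results_` holds the per-combination scores, which is the table-like output that is hard to scan as text but easy to read once plotted.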

Note how hard it is to spot overall trends in a plain-text list of results

Plot gridsearch results in VIP

Takeaways

Drag the slider in the color legend to reveal where the optimal hyperparameters are located