Understanding How and Why Your Model Works 



In many businesses today, the emphasis is put on the predictions generated by data science work — not on understanding how those predictions are actually made. What song will this user enjoy? Will an applicant pay back a loan? Will a user click on an ad? These questions are answered by predictive models that most business users know very little about. 

Actually understanding how a predictive model makes a prediction is an often overlooked part of the data science workflow. The practice of model interpretation, unlike validation, is not focused on how accurate your models are; instead, it's about understanding why your models are as accurate as they are and how they work. That information can go a long way in enabling generalizability, insight, and fairness.  Let's begin by exploring generalizability with a bit of a parable.

How Your Model Learns The Wrong Thing

Healthcare programs are often judged, in part, on readmission rates: the proportion of inpatients with a given condition who return to the hospital within some interval of time for treatment of that same condition.

Several academic studies (and a machine learning competition) have sought to predict whether patients would need to be readmitted based on various lab tests and their demographics, in order to help improve outcomes.

Let's say we are trying to predict readmission of diabetes patients to two hospitals based on these features:

  Feature                  Description
  visit_id                 ID for the visit
  patient_id               ID for the patient
  admission_type_id        Nature of admission (emergency, elective, etc.)
  number_inpatient         Number of previous inpatient visits
  A1Cresult                Results for test measuring blood sugar
  glucose_serum            Glucose serum levels
  gender                   Gender of patient
  age                      Age of patient
  medications_prescribed   Medications prescribed during visit (of 24 types)

Suppose we fit a random forest (a very popular machine learning model) to the data and find that our model performs well both on the original training data and on a validation data set. Though our validation accuracy is great, we'd have serious problems if we deployed this model to other hospitals around the country.

Looking at feature importance, a type of model interpretation algorithm that computes the degree to which a model uses a feature to make predictions, we'd see that visit_id and patient_id are the most important features to the model. But as it turns out, patients with particularly bad prognoses were sent to one hospital and patients with good prognoses were sent to the other. Each hospital used a different range of ID numbers, so our model just memorized these IDs. Clearly, what the model learned would not translate to a new hospital setting.
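To make this concrete, here is a minimal sketch of the kind of check that surfaces this problem, using scikit-learn and synthetic data (the column names and ID scheme are invented stand-ins for the readmission features, not the actual study data). Because hospital assignment is encoded in the ID range, the ID column swamps the clinical feature:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the readmission data: hospital A assigns low
# patient IDs, hospital B high ones, and readmission rates differ by hospital.
hospital = rng.integers(0, 2, n)                        # 0 = A, 1 = B
patient_id = hospital * 50_000 + rng.integers(0, 10_000, n)
a1c = rng.normal(6.0 + hospital, 1.0, n)                # weak clinical signal
readmitted = (rng.random(n) < 0.2 + 0.5 * hospital).astype(int)

X = np.column_stack([patient_id, a1c])
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, readmitted)

# Impurity-based importances: an ID column dominating is a red flag for leakage.
for name, imp in zip(["patient_id", "A1Cresult"], model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

A data scientist who sees an identifier at the top of this ranking knows to investigate before deploying, even if validation accuracy looks excellent.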

As your business increasingly puts predictive models into production to inform decisions, those models will become more and more complex, and the distance between the model consumers (in this case, hospital decision makers) and model builders (data scientists) will grow. We end up needing a framework to explain how those machine learning models make their predictions. Such a framework must provide intuitive, relevant explanations of any predictive model, regardless of the learning algorithm or how it was implemented.

How Model Interpretation Helps Assess Generalizability

In our diabetes readmission example, we realized that our model was failing to generalize to new settings because it learned the wrong features. This is an instance of data leakage, whereby information that was available during model training is unavailable in the deployment setting.

Determining whether leakage is occurring is cognitively expensive; the researcher must work out whether each feature will be available in other settings. When models consume hundreds of features, it's impractical for a data scientist or researcher to consider each and every one, which makes it difficult to anticipate points of failure in a deployment setting. But with algorithms like feature importance, which describe the magnitude of a model's dependence on each feature, the data scientist can focus his or her attention on the most important features and better forecast model behavior.

In production settings where populations are dynamic (for instance, e-commerce marketplaces), we need to make inferences about model decisions for observations that are different from those observed during training. In other words, does a model extrapolate reasonably to new data?

Sometimes, models yield extreme values in new regions of the domain, as is often the case with real-valued parametric models and neural networks. Other algorithms, like random forests, produce predictions that are completely flat outside the training range (kernel-based models often behave more predictably in new regions). While the production-extrapolation problem is better solved with ongoing model evaluation and online learning algorithms, interpretation algorithms that identify important features can help us spot extrapolation issues before they occur.
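The contrast between these extrapolation behaviors is easy to demonstrate on synthetic data (a toy one-feature regression, not drawn from any real application): a linear model continues the trend it learned, while a random forest's prediction goes flat outside the training range.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = 3.0 * X_train.ravel() + rng.normal(0, 1, 200)  # linear ground truth

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

# Query a point far outside the training range [0, 10].
X_new = np.array([[50.0]])
print("forest:", forest.predict(X_new)[0])  # flat: stuck near the largest training target
print("linear:", linear.predict(X_new)[0])  # extrapolates the learned trend
```

Neither behavior is inherently right; the point is that you need to know which one your model exhibits before it meets out-of-distribution data in production.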

Related to this extrapolation challenge is the problem of leverage points. A model's decision criteria can be adversely affected by observations with extreme target values. In particular, high-variance models that are not robust to outliers, such as a neural network without sufficient regularization or some random forest implementations, can be sensitive to leverage points and learn to predict extreme values in certain settings. Interpretation algorithms that allow us to visualize predictions across regions of the input space can help us identify extreme behavior.

To make model behavior in production settings easier to forecast, data scientists often use simpler, inherently interpretable models in place of potentially more accurate models that can represent complex hypotheses. But we don't have to keep making this trade-off: it's possible to build a toolbox that shows the mechanics of any model and identifies the boundaries between training data and regions of extrapolation.

Assessing Fairness, Accountability, And Compliance

How should we judge predictive models? Models that are going to be put into production systems are usually evaluated based on performance. But in some settings, we also want to ensure that our models are making decisions according to a policy.

In the credit industry, firms commonly use predictive models to augment their underwriting process. Recently, the industry has begun using deep learning models as its tool of choice. This presents a problem, as deep learning models are notoriously difficult to inspect.

Similarly, in the criminal justice world, courts frequently use predictive models to inform sentencing decisions. In cases like these, models are used to help make high-stakes decisions which may be subject to regulatory or ethical requirements. For instance, the Equal Credit Opportunity Act prohibits credit discrimination on the basis of race, color, religion, national origin, sex, marital status, age, or reliance on public assistance. To prevent ethical or legal violations, both model builders and model consumers need to be able to evaluate whether and how a feature like race or gender impacts model predictions.
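One simple way to probe this question, sketched below on synthetic data (the feature names, the "group" attribute, and the approval rule are all invented for illustration), is a counterfactual check: flip the protected attribute for every applicant and measure how much the model's predicted probabilities move.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
income = rng.normal(50, 15, n)          # hypothetical underwriting feature
group = rng.integers(0, 2, n)           # hypothetical protected attribute
approved = (income + rng.normal(0, 5, n) > 50).astype(int)

X = np.column_stack([income, group])
model = LogisticRegression(max_iter=1000).fit(X, approved)

# Counterfactual check: flip the protected attribute for every applicant
# and measure how far predicted approval probabilities move.
X_flipped = X.copy()
X_flipped[:, 1] = 1 - X_flipped[:, 1]
shift = np.abs(model.predict_proba(X_flipped)[:, 1] - model.predict_proba(X)[:, 1])
print("mean probability shift when flipping group:", round(float(shift.mean()), 4))
```

A large shift would warrant scrutiny; note, though, that a small shift alone does not rule out discrimination, since protected attributes can be encoded indirectly through correlated features.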

Model Interpretation For Generating Insights And Policymaking

One of the reasons deep learning models are often successful is that they can represent complex hypotheses; some contain millions of parameters. Simpler models may not be able to represent the true relationship between features and a target. In these cases, deep learning can provide the most accurate predictions, but it doesn't give us insight into the underlying relationships.

In settings where complex models best capture the true relationships between X and Y, we may want to uncover those relationships for policymaking (note that discerning causal relationships is an entirely separate task from describing model behavior; our most predictive model may not be the one that best encapsulates causal relationships). For instance, some firms track the attributes of leads they collect in their sales funnel and use machine learning to estimate the probability of a sale, conversion, or other transaction to aid sales or customer service, or to affect some component of CRM automation. Meanwhile, the marketing team may be charged with identifying subpopulations that are likely to convert. Interpretation algorithms can help unlock the insights embedded in predictive models, exposing them to stakeholders and allowing others to exploit predictive models for non-predictive tasks.

Exposing key patterns embedded in a model's decision criteria is particularly helpful when models are consumed by organizations or teams other than model builders (analytics service companies, machine learning API marketplaces, siloed data science teams). In these settings, model interpretation algorithms can engender trust between the model builder and the model consumer, and add value to the model-development process by uncovering insights.

Why Interpretation Algorithms Should Be Model Agnostic

Generally, methods for interpreting models are model specific. Feature importance is computed by looking at the effects of splitting on a feature within a decision tree, while regressions are explained by their coefficient values.

The drawback of this practice is that when we require a particular type of interpretation, we are limited in our choice of models that provide it, potentially at the expense of using a more predictive and representative model. Furthermore, as part of the model selection process, we are unable to compare the behavior of different types of models with a common framework, essentially constraining our model selection criteria to validation metrics. And when a model consumer does not have access to the source code or environment used to build a model, he or she has to make inferences about the decision criteria without anything other than new predictions.

Therefore, there has been a recent surge of interest in model-agnostic interpretation algorithms: algorithms that describe any predictive model in terms of the features it takes as inputs. Generally, this approach leverages data perturbation to learn about a model's decision criteria: strategically feed inputs into a black-box model, obtain predictions, and model the relationship between the two. Here at DataScience, we are actively researching model-agnostic interpretation to develop resources for our clients and the open source community.
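The perturbation idea can be sketched in a few lines. Below is a minimal, from-scratch permutation importance on synthetic data (not a production implementation): the function touches only a `predict` callable, so it works for any model, regardless of learning algorithm or implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # only features 0 and 1 matter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Accuracy drop when one column is shuffled; needs only predictions."""
    rng = np.random.default_rng(seed)
    baseline = (predict(X) == y).mean()
    scores = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # perturb a single feature
            drops.append(baseline - (predict(X_perm) == y).mean())
        scores.append(np.mean(drops))
    return np.array(scores)

importances = permutation_importance(model.predict, X_te, y_te)
print(importances)  # features 0 and 1 should stand out
```

Because the model is treated as a black box, the same code would apply unchanged to a neural network, a gradient boosting machine, or a model served behind an API.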