
Show your working: three forms of explainability that actually help

A model that's right for the wrong reasons will eventually be wrong for reasons nobody can find.

That single sentence is the case for explainability. It is also, unhelpfully, the case for a dozen different techniques that the industry lumps together under one word: SHAP, LIME, attention maps, feature importance, partial dependence, counterfactuals, model cards, and audit logs. All of these get called explainability. All of them answer different questions. And in most production ML projects, the one you actually need is not the one that got built.

Three different things called explainability

Here's the cut that matters. Explainability is at least three different things, each solving a different problem.

The first is prediction-level explanation. The question it answers: why did this specific prediction come out this way? A customer's loan application is denied by an automated system. Regulators require the lender to tell the customer why. You cannot ship a loan denial that says 'the model said no'. You need a per-prediction explanation: 'this application was denied primarily because the stated income was inconsistent with three years of reported tax returns, and secondarily because the requested loan-to-value ratio exceeded our threshold for unsecured personal loans'. SHAP values are the most widely used answer to this. For any single prediction, SHAP decomposes the output into per-feature contributions relative to a baseline, typically the model's average output over a reference dataset, so the contributions sum to the gap between this prediction and that average. Most gradient-boosted tree libraries compute SHAP values natively. Use them.
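To make the decomposition concrete, here is a minimal brute-force sketch of exact Shapley values for one prediction against a single baseline row. It is not how SHAP libraries compute them (tree-specific algorithms are far faster); the scoring function, feature names, and numbers are hypothetical and exist only for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_contributions(
    predict: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values for one prediction, relative to one baseline row.

    Features outside a coalition take their baseline value. The cost is
    O(2^n) model calls, so this is only viable for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition: frozenset) -> float:
        row = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(row)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {name}) - value(s))
        contributions[name] = total
    return contributions


# Hypothetical toy scoring function, NOT a real credit model.
def toy_score(row: Dict[str, float]) -> float:
    score = 0.5
    score -= 0.3 * row["income_mismatch"]
    score -= 0.2 * max(0.0, row["loan_to_value"] - 0.8) * 5
    return score


applicant = {"income_mismatch": 1.0, "loan_to_value": 0.95, "day_of_week": 3.0}
baseline = {"income_mismatch": 0.0, "loan_to_value": 0.70, "day_of_week": 1.0}
contribs = shapley_contributions(toy_score, applicant, baseline)

# Rank features by how strongly they pushed the score away from the baseline.
for name, c in sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {c:+.3f}")
```

The contributions sum to the difference between the applicant's score and the baseline score. That additivity is what lets you turn the top-ranked features into the 'primarily because, secondarily because' wording of a denial notice.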

What SHAP does not do: explain the model as a whole. SHAP tells you why a specific prediction came out this way; it does not tell you whether the model itself is reasonable. That's a different question with a different tool.

The second is global feature importance. The question it answers: across all predictions this model makes, which features drive it most? This is the check you run before the model ships. If your credit-risk model's most important feature is 'day of week', something is wrong: probably leakage in your training data. If the second-most-important feature is the customer's postcode, you may have a fairness problem worth addressing. Global feature importance is the sanity check that the model is modelling what you think it is modelling. Every reasonable training pipeline produces it; almost no production pipeline looks at it after the first week.
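One model-agnostic way to get a global ranking is permutation importance: shuffle one feature's column and measure how much the error grows. The sketch below is a standard-library illustration under assumed names. The toy model, the feature names, and the `flag_suspicious` watchlist helper are all hypothetical, and the targets are set to the model's own outputs purely to keep the example deterministic.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def permutation_importance(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    targets: Sequence[float],
    seed: int = 0,
) -> Dict[str, float]:
    """Increase in mean squared error when one feature's column is shuffled."""
    rng = random.Random(seed)

    def mse(data: Sequence[Row]) -> float:
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

    base_error = mse(rows)
    importance = {}
    for name in rows[0]:
        column = [r[name] for r in rows]
        rng.shuffle(column)
        shuffled = [{**r, name: v} for r, v in zip(rows, column)]
        importance[name] = mse(shuffled) - base_error
    return importance


def flag_suspicious(importance: Dict[str, float], watchlist: set, top_k: int = 2) -> List[str]:
    """Return watchlisted features that rank among the top_k most important."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [name for name in ranked[:top_k] if name in watchlist]


# Hypothetical model that leans heavily on day_of_week (a leakage smell).
def leaky_model(row: Row) -> float:
    return 0.01 * row["income"] + 2.0 * row["day_of_week"]


rows = [
    {"income": 30.0 + (i * 7) % 90, "day_of_week": float(i % 7), "postcode_band": float(i % 5)}
    for i in range(60)
]
targets = [leaky_model(r) for r in rows]  # illustration only: zero baseline error
importance = permutation_importance(leaky_model, rows, targets)
flagged = flag_suspicious(importance, watchlist={"day_of_week", "postcode_band"})
print(flagged)
```

Running a check like this as a gate in the release pipeline, rather than as a notebook cell someone ran once, is what keeps it alive after the first week.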

Neither of these, however, answers the question that matters most in a regulated context: can we reconstruct what happened six months ago?

The audit trail

The third form of explainability is not usually called explainability at all. It's the audit trail, and it's almost always more important than the other two put together. If a customer disputes an automated decision in month seven, you need to be able to answer four questions at once: what data went into the model, what version of the model processed it, what came out, and when? Most ML pipelines can answer maybe one of these. The others disappear into rolling logs that were never persisted, into model files overwritten by the next training run, into features derived on the fly and never stored.

Audit trails are a software engineering problem, not a machine learning problem. They require four things: log every scored input record under a stable ID; version every model that goes into production with a hash or a tag; log every prediction with a pointer back to its input and its model version; and retain all of this for at least as long as your regulatory obligations run, usually seven years in Australian financial services.
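Two of those requirements, versioning by hash and retention, can be sketched in a few lines. The helper names below are hypothetical, and the seven-year default simply mirrors the figure above; confirm the period that actually applies to you.

```python
import hashlib
from datetime import date


def model_version_from_bytes(artifact: bytes) -> str:
    """Content hash of the serialized model: same bytes, same version tag."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


def retention_expiry(decision_date: date, years: int = 7) -> date:
    """Earliest date an audit row may be deleted (assumes a fixed-year obligation)."""
    try:
        return decision_date.replace(year=decision_date.year + years)
    except ValueError:
        # 29 February with a non-leap target year: roll forward to 1 March.
        return date(decision_date.year + years, 3, 1)


# Placeholder bytes standing in for two serialized model files.
v1 = model_version_from_bytes(b"weights from training run 1")
v2 = model_version_from_bytes(b"weights from training run 2")
print(v1 != v2)
print(retention_expiry(date(2024, 7, 1)))
```

A content hash has one advantage over a hand-assigned tag: a retrained model that overwrites the old file cannot silently inherit the old version string.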

It is astonishing how often this third pillar is missing. The shiny parts of modern ML (transformers, foundation models, vector databases) tend to attract engineering attention. The unshiny parts (logging schemas, retention policies, version tags) tend not to. Both fail silently. Only one of them fails catastrophically.

One question to ask at the start of any ML project, for each of these three: is there a plan in writing? Not 'we will figure it out later', but an actual plan. Who owns the SHAP outputs, and who reads them? Who checks global feature importance before a model ships? What gets logged for audit, where, and for how long?

Explainability is not a feature you bolt on at the end. It's a decision made on day one about what you log, what you store, and what you compute when, and it's most of what separates models you can defend from models you can only hope are right.

// The artefact

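# audit/log.py: every prediction must be reconstructible six months later
import json
from datetime import datetime, timezone
from hashlib import sha256

# `audit` is assumed to be the project's append-only audit sink, defined elsewhere.
def log_prediction(record_id: str, features: dict, model_version: str, output: float) -> None:
    # Canonical JSON (sorted keys) so identical features always hash identically.
    feature_hash = sha256(json.dumps(features, sort_keys=True).encode()).hexdigest()
    audit.write(
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12.
        ts=datetime.now(timezone.utc), record=record_id, model=model_version,
        feature_hash=feature_hash, output=output,
    )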

The unshiny third pillar. Without this row, you can't answer the regulator in month seven.
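The other half of the audit trail is checking a row when a dispute arrives. A minimal sketch, assuming the features themselves are stored separately under the record ID (the hash only proves they were not altered) and that a registry of deployed model versions exists; `verify_audit_row` and the row layout are hypothetical and mirror the fields written by log_prediction.

```python
import json
from hashlib import sha256


def feature_hash(features: dict) -> str:
    # Same canonical form as log_prediction: JSON with sorted keys.
    return sha256(json.dumps(features, sort_keys=True).encode()).hexdigest()


def verify_audit_row(row: dict, stored_features: dict, known_model_versions: set) -> list:
    """Return a list of problems; an empty list means the row checks out."""
    problems = []
    if feature_hash(stored_features) != row["feature_hash"]:
        problems.append("features do not match the hash logged at scoring time")
    if row["model"] not in known_model_versions:
        problems.append("model version is not in the registry")
    return problems


# Hypothetical stored data for one disputed decision.
features = {"income_mismatch": 1.0, "loan_to_value": 0.95}
row = {"record": "app-001", "model": "v1", "feature_hash": feature_hash(features), "output": 0.05}

ok = verify_audit_row(row, features, known_model_versions={"v1"})
tampered = verify_audit_row(row, {**features, "loan_to_value": 0.5}, known_model_versions={"v1"})
print(ok, tampered)
```

If the stored features no longer hash to the logged value, or the model version is missing from the registry, the decision cannot be faithfully replayed, and it is better to learn that before the regulator asks.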
