My understanding is that interpretable models, especially for neural networks, remain far from state-of-the-art performance. Work that, e.g., tries to approximate neural nets with decision trees has yet to be applied to very large models [1].
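For context, the basic move in that line of work is mimicry/distillation: fit a small tree to the network's predictions and then read the tree. A minimal sketch of that idea only (not the specific method from [1]), assuming a trained classifier `net` with a scikit-learn-style `predict` and some training inputs `X_train`, both hypothetical:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def distill_to_tree(net, X_train, max_depth=4):
    # Use the network's own predictions as the tree's training targets.
    pseudo_labels = net.predict(X_train)
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X_train, pseudo_labels)
    return tree

# tree = distill_to_tree(net, X_train)
# print(export_text(tree))  # splits read like "feature_12 <= 0.53", which is
#                           # exactly the interpretability complaint raised below
```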
Even in computer vision, which is where I think they've been most successful, the visualization techniques used seem more suggestive than explanatory [2].
> Work that, e.g., tries to approximate neural nets with decision trees has yet to be applied to very large models
Well, that and a decision tree isn't any more interpretable if its nodes are all filled with `Pixel[i,j] > Threshold` -- or is the idea that you would somehow extract logical predicates by tracing paths through the tree or glean interpretation from those predicates?
I agree that large decision trees are also hard to interpret. The idea in the above paper is that we regularize with respect to average tree path length, based on the intuition that smaller trees are more interpretable. (But the issue about pixel thresholds probably still stands.)
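A rough sketch of that penalty, assuming a model exposing a `predict` function (hypothetical). Note the linked paper makes the penalty differentiable via a learned surrogate; here it's just computed directly, so treat this as an illustration rather than the actual method:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_path_length(model_predict, X, min_samples_leaf=25):
    preds = model_predict(X)  # the model's current labels on X
    tree = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf).fit(X, preds)
    # decision_path marks, per sample, every node visited from root to leaf.
    path_lengths = tree.decision_path(X).sum(axis=1)
    return float(np.mean(path_lengths))

# e.g. total_loss = task_loss + lam * average_path_length(net.predict, X_batch)
```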
I think it's a real open question whether the interpretable models are actually worse, or merely worse in competition/benchmark problem sets. The more deep models I build, the more I'm convinced that behind every inscrutable parameter hides a certain amount of overfitting, with maybe a few notable exceptions. E.g., can you build a decision tree that's not obviously overfit but is also susceptible to adversarial perturbations?
It’s also a real open question whether any of the interpretable models are actually interpretable, or even if they are in any well-defined sense more interpretable than “black box” alternatives.
In practice the answer is a massive “no” so far. Some of the least interpretable models I’ve had the misfortune to deal with in practice are misspecified linear regression models, especially when non-linearities in the true covariate relationships cause linear models to give wildly misleading statistical significance outputs and classical model fitting leads to estimating coefficients of the wrong sign.
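A toy example of that failure mode (synthetic, not from any real dataset): a quadratic data-generating process fit with a straight line yields a slope that looks highly significant yet has the opposite sign of the true marginal effect over part of the range.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=500)
# True relationship is quadratic: dy/dx = 2*(x - 2), so the sign flips at x = 2.
y = (x - 2) ** 2 + rng.normal(scale=0.2, size=x.size)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.params)   # linear slope comes out around -1 ...
print(fit.pvalues)  # ... with a tiny p-value, despite the positive true effect for x > 2
```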
Real interpretability is not a property of the mechanism of the model, but rather a consistent understanding of the data generating process.
Unfortunately, people like to conflate the mechanism of the model with some notion of “explainability” because it’s politically convenient and susceptible to arguments from authority (if you control the subjective standards of “explainability”).
If your model does not adequately predict the data generating process, then your model absolutely does not explain it or articulate its inner workings.
> If your model does not adequately predict the data generating process, then your model absolutely does not explain it or articulate its inner workings.
That's a very dynamicist viewpoint. I don't necessarily disagree.
However, in what sense do the prototypical deep learning models predict the data generating process?
I tend to agree that a lot of work with "interpretable" in the title is horseshit and misses the forest for the trees.
I agree that there's a real question about the extent to which benchmark sets are biasing our judgment.
But it's also unclear to me how to get a decision tree to perform as well on image recognition tasks as a CNN does. (Of course, as you mention, the CNN will likely face adversarial examples.)
The distinction between ML and programming is mostly propaganda, in the sense that it's not flat out wrong but is mostly used to win money/power. It's not actually a helpful way of understanding... anything.
[1] http://www.shallowmind.co/jekyll/pixyll/2017/12/30/tree-regu... [2] https://distill.pub/2019/activation-atlas/