> Work that e.g. tries to approximate neural nets with decision trees has yet to be applied to very large models
Well, that and a decision tree isn't any more interpretable if its nodes are all filled with `Pixel[i,j] > Threshold` -- or is the idea that you would somehow extract logical predicates by tracing paths through the tree and glean an interpretation from those predicates?
I agree that large decision trees are also hard to interpret. The idea in the above paper is that we regularize with respect to average tree path length, based on the intuition that smaller trees are more interpretable. (But the issue about pixel thresholds probably still stands.)
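For concreteness, here's a minimal sketch (in Python with scikit-learn, my choice rather than the paper's code) of the quantity being regularized: fit a small decision tree to the network's own predictions and measure the tree's average decision-path length. The paper turns this non-differentiable quantity into a trainable penalty via a surrogate; the sketch below only computes the raw metric, and `net_predict` plus the toy model are hypothetical stand-ins.

```python
# Sketch of the "average tree path length" metric, assuming a trained model
# exposed as a prediction function `net_predict`. Not the paper's full
# method: the differentiable surrogate used for training is omitted.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_path_length(net_predict, X, max_depth=None, seed=0):
    """Average number of tree nodes traversed for samples in X, where the
    tree is fit to mimic the network's predictions on X."""
    y_hat = net_predict(X)                      # labels the net assigns to X
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(X, y_hat)                          # tree approximates the net
    node_indicator = tree.decision_path(X)      # sparse (n_samples, n_nodes)
    return node_indicator.sum(axis=1).mean()    # mean nodes visited per sample

# Toy example: a "network" that thresholds the sum of input features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))              # 500 fake 4x4 "images"
    toy_net = lambda X: (X.sum(axis=1) > 0).astype(int)
    print("avg path length:", average_path_length(toy_net, X, max_depth=4))
```

Penalizing this value during training pushes the network toward functions that a shallow tree can mimic, which is the sense in which "smaller trees are more interpretable" enters the objective.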