Thanks for the comment! As far as seeing what the core parts are, I'm really partial to the approach of training a graph network with a very high L2 penalty on its communication weights; for certain simple problems, the number of channels that survive the penalty seems to approach the intrinsic dimension of the problem. How well that scales to larger problems I have no idea (probably not well, though techniques like variational dropout are pretty analogous in their own way). Thankfully, with a fast-training network you can play around with the numbers a bit and see what's what.
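To make that concrete, here's a minimal sketch (PyTorch; the layer and all the names are mine, just illustrating the idea, not anything published) of what I mean by a heavy L2 communication penalty:

```python
# Tiny "graph network" layer where nodes exchange messages through a
# learned communication matrix, with a strong L2 penalty on that matrix.
# After training, counting the channels whose magnitude survives the
# penalty gives a rough estimate of the problem's intrinsic dimension.
import torch
import torch.nn as nn

class PenalizedCommLayer(nn.Module):
    def __init__(self, n_nodes: int, dim: int):
        super().__init__()
        # comm[i, j] scales the message node j sends to node i
        self.comm = nn.Parameter(torch.randn(n_nodes, n_nodes) * 0.1)
        self.update = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_nodes, dim); messages are a weighted sum over nodes
        messages = torch.einsum("ij,bjd->bid", self.comm, h)
        return torch.relu(self.update(messages))

    def comm_penalty(self) -> torch.Tensor:
        return (self.comm ** 2).sum()

# Training fragment: lambda_comm set deliberately high so most
# communication channels get crushed toward zero.
lambda_comm = 1e-2
layer = PenalizedCommLayer(n_nodes=16, dim=8)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
h = torch.randn(32, 16, 8)          # stand-in batch
target = torch.randn(32, 16, 8)     # stand-in target

for _ in range(100):
    opt.zero_grad()
    loss = ((layer(h) - target) ** 2).mean() + lambda_comm * layer.comm_penalty()
    loss.backward()
    opt.step()

# Surviving channels ~ intrinsic dimension (a very rough heuristic).
alive = (layer.comm.abs() > 1e-3).sum().item()
print(f"surviving communication channels: {alive}")
```

The fun part is sweeping lambda_comm and watching where the surviving-channel count plateaus.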
One could distill the tiny ResNet into a graph network, which with the right constraints could in theory match the original network, and then compress that as far as possible. There's probably an interesting tradeoff between maximal compression and the number of iteration rounds for such a graph network. For something I was experimenting with recently, I got enough runs (25) and enough of a performance difference over the baseline to reach roughly p = .0014 in half an hour, and it felt so good not to be diddling with 5 runs, which in some cases for certain papers take days to finish. It's just a very satisfying feeling.
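The stats side of that is nothing fancy, something like the following (run_experiment is a hypothetical stand-in for the actual training loop, and Welch's t-test is just my pick, not necessarily what you'd have to use):

```python
# With a fast-training network you can afford ~25 seeded runs per arm
# and do an honest two-sample test instead of eyeballing 5 runs.
import numpy as np
from scipy import stats

def run_experiment(seed: int, use_new_method: bool) -> float:
    """Placeholder: train the fast network and return val accuracy."""
    rng = np.random.default_rng(seed)
    base = 0.92 if use_new_method else 0.90   # fake effect, for illustration
    return base + rng.normal(0, 0.01)

n_runs = 25
baseline = np.array([run_experiment(s, False) for s in range(n_runs)])
treatment = np.array([run_experiment(s + 1000, True) for s in range(n_runs)])

# Welch's t-test (unequal variances); 25 runs per arm can resolve
# small-but-real differences that 5 runs would bury in noise.
t_stat, p_value = stats.ttest_ind(treatment, baseline, equal_var=False)
print(f"mean diff = {treatment.mean() - baseline.mean():.4f}, p = {p_value:.4g}")
```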
I guess, back to the explainability side of things: that alone wouldn't necessarily answer the explainability problem, but I think it would act as a refining step, like refining oil, before diving into the L2-compressed feature representations.