When I looked into this briefly, my impression was that it's extremely hard to do mechanistic interpretability beyond very simple cases like CNN classification or toy problems like arithmetic in transformers. Not to say it isn't a worthy pursuit, but for many researchers the difficulty is hard to justify, since the results won't make the kind of splash a new model-training result does.
Yeah, it is harder than other things, but if we can train a model to explain collections of pixels in human language, then we might be able to do something similar with collections of activations.
I don't know if that's the right direction; it's just an example that comes to mind easily.
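Very roughly, the analogy I have in mind looks something like the sketch below: just as an image-captioning model maps pixels to text, you could map a layer's activations to a short description. Everything here is made up for illustration (the `ActivationCaptioner` name, the toy subject model, the tiny GRU decoder); it assumes PyTorch and isn't any existing method.

```python
import torch
import torch.nn as nn

class ActivationCaptioner(nn.Module):
    """Maps an activation vector to caption-token logits, analogous to
    captioning an image, but over activations instead of pixels."""
    def __init__(self, act_dim: int, vocab_size: int, hidden: int = 256, max_len: int = 16):
        super().__init__()
        self.max_len = max_len
        self.encode = nn.Linear(act_dim, hidden)       # embed the activation vector
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_vocab = nn.Linear(hidden, vocab_size)  # logits over caption tokens

    def forward(self, activations: torch.Tensor) -> torch.Tensor:
        # activations: (batch, act_dim), captured from some layer via a forward hook
        h = torch.tanh(self.encode(activations))                  # (batch, hidden)
        # Simplest possible decoder: feed the encoded activation at every step.
        steps = h.unsqueeze(1).repeat(1, self.max_len, 1)         # (batch, max_len, hidden)
        out, _ = self.decoder(steps, h.unsqueeze(0).contiguous())
        return self.to_vocab(out)                                 # (batch, max_len, vocab)

if __name__ == "__main__":
    # Toy "subject" model; a forward hook captures one layer's activations.
    subject = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    captured = {}
    subject[1].register_forward_hook(lambda m, i, o: captured.update(act=o.detach()))
    _ = subject(torch.randn(8, 32))                  # run the model; hook grabs activations
    captioner = ActivationCaptioner(act_dim=64, vocab_size=1000)
    logits = captioner(captured["act"])              # (8, 16, 1000) caption-token logits
    print(logits.shape)
```

You'd still need (activation, explanation) pairs to train the captioning part on, which is of course the hard bit.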
If someone figures out how to do this, I think their models will be far more capable and reliable.