> As with many ideas in this article, Alex deserves all the credit. He's been doing that trick internally at Google for years. ... I think I saw an instance of someone doing PCA on conv net activations to make heat maps. We should try to dig that up and cite it.
Not surprised to hear this. It really does seem like an obvious thing to do... yet no one has taken the time to look carefully/methodically at it until now... probably because everyone is too busy with other, newer, flashier things.
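For concreteness, here's roughly what I picture that trick looking like -- a minimal numpy/scikit-learn sketch, where the activation array and its (H, W, C) layout are my assumptions about a typical setup, not a description of anyone's actual code:

    # Sketch: PCA over a conv layer's activations to make a heat map.
    # Assumes `acts` is a numpy array of shape (H, W, C) already extracted
    # from the network; how you get it depends on your framework.
    import numpy as np
    from sklearn.decomposition import PCA

    def pca_heat_map(acts):
        h, w, c = acts.shape
        flat = acts.reshape(h * w, c)       # one sample per spatial position
        pc1 = PCA(n_components=1).fit_transform(flat)[:, 0]
        heat = pc1.reshape(h, w)            # first principal component as a map
        # Normalize to [0, 1] so it can be overlaid on the input image.
        return (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)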
> Oh my. That's a super interesting and deep question. Wild wild speculation ahead. Please take everything I say with a big grain of salt.
Thank you. Love it!
> My first comment is that a model can be hard to understand because it's too stupid, in addition to being hard to understand because it's exceeding us ... this intuition is driven by the thought that early superhuman performance for tasks humans are already good at is probably largely about having really crisp, much more statistically precise, versions of our abstractions. Those crisp versions of our abstractions are probably easier to understand than the confused ones.
Yes, that makes sense to me -- but only as long as we're talking about tasks at which humans are already good. I'm not so sure this is/will be the case for tasks at which humans underperform state-of-the-art AI -- such as, for example, learning to recognize the subtle patterns in datacenter energy usage needed to significantly lower energy consumption, or learning to recognize new kinds of Go-board patterns that likely confer advantages on a player.
> ...I suspect that something that will be important in future thought about this is "alien abstractions" vs "refined abstractions." ... by an "alien abstraction" I mean a feature that I don't have a corresponding idea for. These are much harder for us to deal with. ... ...For visual tasks that humans are already good at, I expect refined abstractions to dominate for a while. In other domains, I have a lot less confidence.
Yes, that makes sense to me too.
Leaving aside the possibility that there might be cognitive tasks beyond the reach of human beings, I have an inkling that we're going to run into more and more "alien abstractions" or "alien salient ideas" as AI is used for more and more tasks at which human beings do poorly. In particular, I suspect "alien abstractions" will become a serious issue in many narrow domains for which humankind has not invested the numerous man-hours necessary to learn to recognize (let alone name!) a sufficiently large number of "refined abstractions."
As an analogy, I imagine the abstractions learned by AI systems in those domains will be as foreign to human beings as the 50+ words Inuit tribes have for different kinds of snow are to you and me -- and probably more so.[0]
> I think the question of where interpretability will easily scale vs. where it will have severe challenges is super subtle. I have a draft essay floating around on the topic. Hopefully I'll get it out there someday.
I can see that, given the computational complexity involved. (I suspect all those new "randomized linear algebra" algorithms will prove useful here.)
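To gesture at what I mean, here's a toy sketch of one such algorithm -- a randomized SVD in the style of Halko et al. It's a generic illustration of the idea, not code from any interpretability project:

    # Toy randomized SVD: approximate the top-`rank` singular triplets of a
    # large matrix A by first sketching its range with a random projection.
    import numpy as np

    def randomized_svd(A, rank, oversample=10):
        m, n = A.shape
        omega = np.random.randn(n, rank + oversample)  # Gaussian test matrix
        Q, _ = np.linalg.qr(A @ omega)                 # basis for A's sampled range
        B = Q.T @ A                                    # small (rank+p) x n problem
        U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

On a big activation matrix this costs roughly O(mn(rank + oversample)) instead of the O(mn * min(m, n)) of a full SVD, which is the kind of saving that makes these methods attractive at scale.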
Looking forward to reading the essay if and when you get around to it. Thank you!
--
[0] https://www.washingtonpost.com/national/health-science/there...