and the followup, which addresses the noise unpredictability problem.
There are more after that which, I believe, fail the black pill and miss the point of ML by ASIC-ifying the architecture with human priors. But the broader point is to show that RL is not just discovering solutions by chance through random actions. Nature starts with priors, and curiosity is one of the universal policy-bootstrapping techniques (others might be imitation, next-state prediction, and total nearby replication count).
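For anyone who hasn't seen how the curiosity signal works: ICM pays the agent an intrinsic reward equal to the error of a learned forward model that predicts the next state. In the paper this is done in a feature space learned jointly with an inverse model. Below is a toy sketch under heavy simplifying assumptions: a scalar state, a linear forward model on raw states, and no inverse model, so it is not the paper's architecture. The names `ForwardModel`, `learnable`, `noisy_tv` and `run` are hypothetical. It shows both halves of the story: the reward fades on dynamics the model can learn, but never fades on pure noise, which is the "noisy TV" unpredictability problem.

```python
import random


class ForwardModel:
    """Hypothetical toy forward model: predicts the next state as w * s + b.

    Simplification: works on raw scalar states, not ICM's learned features,
    and has no inverse model.
    """

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, s):
        return self.w * s + self.b

    def curiosity_reward(self, s, s_next):
        """Return the intrinsic reward (half squared prediction error),
        then take one SGD step so familiar transitions become less rewarding."""
        err = self.predict(s) - s_next
        reward = 0.5 * err * err
        # Gradient of 0.5 * err^2 is err * s w.r.t. w and err w.r.t. b.
        self.w -= self.lr * err * s
        self.b -= self.lr * err
        return reward


def learnable(s, rng):
    # Deterministic dynamics: the next state is a fixed linear function of s.
    return 0.8 * s + 0.1


def noisy_tv(s, rng):
    # Pure noise: the next state is independent of s, so it is unpredictable.
    return rng.uniform(-1.0, 1.0)


def run(transition, steps=500, seed=0):
    """Collect the curiosity reward for each step of a toy environment."""
    rng = random.Random(seed)
    model = ForwardModel()
    rewards = []
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)  # the agent visits some state
        s_next = transition(s, rng)
        rewards.append(model.curiosity_reward(s, s_next))
    return rewards


if __name__ == "__main__":
    for name, fn in [("learnable", learnable), ("noisy_tv", noisy_tv)]:
        r = run(fn)
        print(name, "early:", sum(r[:50]) / 50, "late:", sum(r[-100:]) / 100)
```

On the learnable dynamics the late reward collapses toward zero, so a curious agent would move on to something else. On the noise source it stays around the variance of the noise forever, so the agent gets stuck staring at it.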
There is also a paper that deployed ICM on a physical robot; it just played with a ball because that was its only source of novel stimuli, and inadvertently learned how to operate its arms. There was no reward in the environment other than curiosity.
It is amazing, and slightly creepy. I think ICM will be rediscovered later on in ML.