Certainly they retain not just information but compute capacity in a way that other expensive transformations don’t. I’m hard pressed to think of another example where compute spend now can be banked and used to reduce compute requirements later. Rainbow tables maybe? But they’re much less general purpose.
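
To make the "banking compute" idea concrete, here's a toy precompute-then-lookup sketch in the rainbow-table spirit (my illustration; a real rainbow table chains hashes to trade memory for time, this is just the simplest form of the idea): spend CPU once building a digest-to-preimage table, then answer later inversion queries with a cheap lookup.

    import hashlib

    def build_table(candidates):
        """Expensive one-time pass: hash every candidate and store the mapping."""
        return {hashlib.sha256(c.encode()).hexdigest(): c for c in candidates}

    def invert(table, digest):
        """Cheap at query time: the earlier compute is 'banked' in the table."""
        return table.get(digest)

    table = build_table(f"pin{i:04d}" for i in range(10_000))
    target = hashlib.sha256(b"pin1234").hexdigest()
    print(invert(table, target))  # -> pin1234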


HashLife seems like a scale-free memoizer: https://en.wikipedia.org/wiki/Hashlife
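
For anyone unfamiliar, the heart of HashLife is hash-consing a quadtree: identical subpatterns of any size are stored exactly once, so any per-node result you cache is automatically shared across the whole universe. A minimal sketch of just that memoization step (mine, not the full algorithm, which also caches each node's evolved future):

    from functools import lru_cache

    class Node:
        __slots__ = ("level", "nw", "ne", "sw", "se", "population")
        def __init__(self, level, nw, ne, sw, se, population):
            self.level, self.population = level, population
            self.nw, self.ne, self.sw, self.se = nw, ne, sw, se

    ALIVE = Node(0, None, None, None, None, 1)  # level-0 leaves (single cells)
    DEAD  = Node(0, None, None, None, None, 0)

    @lru_cache(maxsize=None)
    def join(nw, ne, sw, se):
        """Canonical constructor: the same four quadrants always return the same
        Node object, so repeated subpatterns (e.g. huge empty regions) cost one
        node each, and any cache keyed on nodes is shared across the universe."""
        return Node(nw.level + 1, nw, ne, sw, se,
                    nw.population + ne.population + sw.population + se.population)

    # The same 4x4 block built twice is literally the same object.
    block = lambda: join(join(ALIVE, DEAD, DEAD, ALIVE), join(DEAD, DEAD, DEAD, DEAD),
                         join(DEAD, DEAD, DEAD, DEAD), join(ALIVE, DEAD, DEAD, ALIVE))
    assert block() is block()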

How Well Can DeepMind's AI Learn Physics? https://www.youtube.com/watch?v=2Bw5f4vYL98 https://arxiv.org/abs/2002.09405 https://sites.google.com/corp/view/learning-to-simulate/home

Discovering Symbolic Models from Deep Learning (Physics) https://www.youtube.com/watch?v=HKJB0Bjo6tQ

Scientific Machine Learning: Physics-Informed Neural Networks with Craig Gin https://www.youtube.com/watch?v=RTPo6KgpvBA

Steve Brunton's channel is even more mind-blowing than Two Minute Papers: https://www.youtube.com/@Eigensteve

Not only can we bank computation and speed up physical simulations by 100x, but I also saw some work on being able to design outcomes in GoL (Game of Life).

There was a paper on using a NN to build or predict arbitrary patterns in GoL, but I can't find it right now.
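
For context, the forward rule itself is trivial to compute exactly, so presumably the hard part that work targets is the other direction: constructing patterns that lead to a desired outcome. A quick sketch of a forward step (mine, not from any paper mentioned here):

    import numpy as np

    def gol_step(grid):
        """One Game of Life generation on a toroidal (wrap-around) grid:
        count the 8 neighbors by summing rolled copies, then apply B3/S23."""
        neighbors = sum(
            np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        )
        return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)

    # A glider on an 8x8 torus
    g = np.zeros((8, 8), dtype=np.uint8)
    g[1, 2] = g[2, 3] = g[3, 1] = g[3, 2] = g[3, 3] = 1
    print(gol_step(gol_step(g)))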


I don't know about NN prediction, but apparently you can bootstrap anything* with 15 strategically placed gliders.

https://btm.qva.mybluehost.me/building-arbitrary-life-patter...


It would be interesting to see an analysis of this. I see your point; OTOH, is there a reason to believe that more computation is being "banked" than with, say, matrix inversion or other optimizations that aren't gradient-descent based?

The large datasets involved let us usefully (for some value of useful) bank lots of compute, but it's not obvious to me that it's done particularly efficiently compared to other things you might precompute.
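
To make the matrix comparison concrete, a factorization is the classic way of banking compute in numerical linear algebra: you pay O(n^3) once and every later solve against the same matrix is O(n^2). A quick sketch with SciPy (my example):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    n = 2000
    A = rng.standard_normal((n, n))

    lu, piv = lu_factor(A)        # O(n^3), paid once up front

    for _ in range(100):          # each solve is O(n^2), reusing the banked work
        b = rng.standard_normal(n)
        x = lu_solve((lu, piv), b)

    print(np.linalg.norm(A @ x - b))  # tiny residual; the factorization did the heavy lifting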

When training a model to convergence, training is often quite inefficient because the weight updates decay toward zero and most epochs have a very small individual effect. I think for e.g. Stable Diffusion they don't train to anywhere near convergence, so weight updates have a bigger average effect. Not sure if that applies to LLMs.
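
A toy illustration of that decay (plain gradient descent on least squares; nothing to do with the actual Stable Diffusion or LLM training setups): every step costs the same compute, but as the loss converges the size of the weight update shrinks toward zero.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.standard_normal(200)

    w = np.zeros(5)
    lr = 0.05
    for step in range(1, 501):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        update = lr * grad
        w -= update
        if step % 100 == 0:
            # same FLOPs per step, but the parameters barely move late in training
            print(f"step {step:3d}  |update| = {np.linalg.norm(update):.2e}")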



