You're assuming no algorithmic improvements, and you're missing the ongoing shift from 16-bit to 4-bit operations, which will soon give ML hardware a 4x improvement on top of everything else.
We could be training GPT-4s in our pockets by the end of this decade.
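A minimal sketch of what that 16-bit → 4-bit shift buys (the quantization scheme and names here are illustrative, not any particular library's API):

```python
import numpy as np

# Symmetric 4-bit quantization, sketched. Each value occupies a quarter of
# the bits of fp16, so the same memory bandwidth and multiply-accumulate
# arrays move/process ~4x as many values -- that's the claimed 4x.

def quantize_int4(weights: np.ndarray):
    """Map float weights onto the 16 signed levels [-8, 7]."""
    scale = np.abs(weights).max() / 7.0  # one scale per tensor (per-channel in practice)
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
print("max abs error:", np.abs(w - dequantize_int4(q, s)).max())
# fp16 storage: 16 bits/value; int4 storage: 4 bits/value -> 4x more values
# per byte of bandwidth, at the cost of the rounding error printed above.
```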
To be fair, they're also being extremely generous about HW scaling. There's no way we're going to see doublings every 18 months for the next 6+ years when we've already stopped doing that for the past 5-10 years.
Have you read the Wikipedia page? Moore's law started ending ~23 years ago, followed by Dennard scaling ~18 years ago. It hasn't necessarily fully stopped, because other architectural improvements have been delivered along the way, but we've nearly reached the end of the road for scaling, due to a combination of heat-dissipation challenges and the inability to shrink transistors further. 3D packaging might push things further, but it's difficult and an area of active research (and once you do that, afaik you've unlocked the "last" major architectural improvement). I think current estimates put the complete end of further HW improvements at ~2050 or so.

You can still improve software or build dedicated ASICs/accelerators for expensive software algorithms, but that's the pre-Moore world, which saw most accelerators die off because the exponential growth of CPU compute obviated the need for them (except for GPUs). We're coming back to it with things like Tensor cores. Reversible computing is the way forward after we hit the wall, but no one knows how to build it yet.
> But in 2011, Koomey re-examined this data[2] and found that after 2000, the doubling slowed to about once every 2.6 years. This is related to the slowing[3] of Moore's law, the ability to build smaller transistors; and the end around 2005 of Dennard scaling, the ability to build smaller transistors with constant power density.
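For a rough sense of what that slowdown means, here's the arithmetic, using the 6-year horizon from the "next 6+ years" upthread:

```python
# Cumulative growth under a fixed doubling period:
# growth = 2 ** (years / doubling_period_in_years)

def growth(years: float, doubling_period: float) -> float:
    return 2.0 ** (years / doubling_period)

horizon = 6.0
print(growth(horizon, 1.5))   # 18-month doublings: 2^4 = 16x
print(growth(horizon, 2.6))   # post-2000 Koomey rate: 2^(6/2.6) ~= 5x
```

So over the same six years, the post-2000 rate delivers roughly a third of the improvement the 18-month assumption does.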
Wikipedia mis-cited it in the text and should have said "But in 2016". However, the 2016 analysis misses the A11 Bionic through A16 Bionic and the M1 and M2 processors, which instantly blew past their competitors, breaking the temporary slump around 2016 and reverting us to the mean slope.
Mainly because they're now analyzing only "supercomputers", and honestly that arena has changed: quite a bit of the HPC work has moved to the cloud (e.g. Graviton), not all of it, but a lot. And I don't think they're analyzing TPU pods, which probably have far better TOPS/watt than traditional supercomputers like the ones on top500.org.