
Memory is part of compute. I have no idea why you've separated them.

>How exactly are you going to find enough data to feed these monstrous petabyte scale models? You won't! Your models will stay small and become cheap over time!

People have this idea that we're stretching the limit of tokens to feed to models. Not even close.

This is a 30T token dataset https://www.together.ai/blog/redpajama-data-v2

It's just web scrapes, which means all the writing that is largely inaccessible on the open web (swaths of fiction, textbooks, papers, etc.) and in books isn't part of it. It also doesn't contain content in some of the most widely written languages on earth (e.g. Chinese; you could easily get trillions of tokens of Chinese alone).
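To get a feel for the scale, here's a minimal sketch of streaming a small slice of that dataset and counting tokens. It assumes the Hugging Face dataset id "togethercomputer/RedPajama-Data-V2", a "sample" config, and a "raw_content" text field; check the dataset card for the exact config and field names before relying on it.

```python
# Rough token-count sketch for a small sample of RedPajama-Data-v2.
# Dataset id, config name, and field name are assumptions; verify against the dataset card.
from datasets import load_dataset
from transformers import AutoTokenizer

# Any tokenizer gives a ballpark count; GPT-2's BPE is a common reference point.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

ds = load_dataset(
    "togethercomputer/RedPajama-Data-V2",
    name="sample",        # assumed config; the full corpus is vastly larger
    split="train",
    streaming=True,       # stream so nothing close to the full corpus is downloaded
)

token_count = 0
docs = 0
for row in ds:
    # "raw_content" is the assumed text field name.
    token_count += len(tokenizer.encode(row["raw_content"]))
    docs += 1
    if docs >= 10_000:    # stop after a small sample and extrapolate from there
        break

print(f"~{token_count:,} tokens across the first {docs:,} documents")
```

Even a 10k-document sample like this makes it easy to see how a web-scale crawl reaches tens of trillions of tokens, and that's before adding books, papers, or non-English corpora.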


