AlphaGo and AlphaZero were able to achieve superhuman performance due to the availability of perfect simulators for the game of Go. There is no such simulator for the real world we live in. (Although pure LLMs sorta learn a rough, abstract representation of the world as perceived by humans.) Sora is an attempt to build such a simulator using deep learning.
This actually affirms my comment above.
“Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.”
`since it is trained to simulate the real world, as opposed to imitate the pixels.`
It's not that it's learning a model of the world instead of imitating pixels - the world model is just a necessary emergent phenomenon from the pixel imitation. It's still really impressive and very useful, but it's still 'pixel imitation'.
https://openai.com/research/video-generation-models-as-world...
What part of my argument do you disagree about?