> Sora was impressive because the clips were long and had lots of rapid movement
Sora videos ran at one beat per second, so everything in the frame moved on the same beat, often too slowly or too quickly to keep pace.
It becomes very obvious when you step through the frames: there are keyframes at every whole-second mark, and everything on screen suddenly jumps to its next animation step.
That really limits the kind of videos you can generate.
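If you want to check the whole-second keyframes yourself, here's a rough sketch. It makes no claim about how Sora works internally; it only measures the effect described above. It assumes the clip has already been decoded into a list of grayscale frames (flat lists of pixel values) with a known frame rate. The function names and the 3x-median spike threshold are my own arbitrary choices. The idea is to compute how much each frame differs from the previous one, flag the frames where that change spikes, and see how many of the spikes land exactly on multiples of the frame rate.

```python
from statistics import median


def frame_differences(frames):
    """Mean absolute pixel difference between each pair of consecutive frames.

    Entry i compares frame i with frame i + 1. `frames` is assumed to be a
    list of equal-length sequences of grayscale values (hypothetical format;
    decoding the actual video is left to whatever library you prefer).
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        total = sum(abs(a - b) for a, b in zip(prev, curr))
        diffs.append(total / len(curr))
    return diffs


def spike_frames(diffs, factor=3.0):
    """Frame indices where the change exceeds `factor` times the median change."""
    baseline = median(diffs)
    # diffs[i] is the jump *into* frame i + 1, so report i + 1.
    return [i + 1 for i, d in enumerate(diffs) if d > factor * baseline]


def fraction_on_second_marks(spikes, fps):
    """Share of spike frames that fall exactly on a whole-second boundary."""
    if not spikes:
        return 0.0
    return sum(1 for f in spikes if f % fps == 0) / len(spikes)


# Toy clip (made-up data): 3 seconds at 24 fps, with slow drift inside each
# second and a big jump at every whole-second mark.
fps = 24
frames = [[100 * (f // fps) + (f % fps)] * 4 for f in range(3 * fps)]
spikes = spike_frames(frame_differences(frames))
print(spikes)                                 # [24, 48]
print(fraction_on_second_marks(spikes, fps))  # 1.0
```

On real footage you'd expect some spikes from scene cuts or fast motion too, so a high fraction (rather than exactly 1.0) is what would point to one-second beats.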
The model would also need separate animation steps for different objects so that they can move at different speeds. Going from one keyframe for the whole picture to separate keyframes for separate parts isn't trivial at all: you would need to retrain the whole thing from the ground up, and the results would be way worse until you figured out how to train that.
My point is that it isn't obvious at all that Sora's approach is actually closer to the end goal. Having those one-second beats in every video might look better today, but where do you go from there?
The best-case scenario would probably be generating separate "layers" one at a time. That would give more creative control over the outcome, but I have no idea how you would do it.