I got access to the preview. Here's what it gave me for "A pelican riding a bicycle along a coastal path overlooking a harbor" (this video shows all four versions):
Of the four, two were a pelican riding a bicycle. One was a pelican just running along the road, one was a pelican perched on a stationary bicycle, and one had the pelican wearing a weird sort of pelican bicycle helmet.
There's another important contender in the space: the Hunyuan model from Tencent.
My company (Nim) is hosting the Hunyuan model, so here's a quick test (first attempt) at "pelican riding a bycicle" via Hunyuan on Nim:
https://nim.video/explore/OGs4EM3MIpW8
I think it's as good as, if not better than, Sora / Veo.
> A whimsical pelican, adorned in oversized sunglasses and a vibrant, patterned scarf, gracefully balances on a vintage bicycle, its sleek feathers glistening in the sunlight. As it pedals joyfully down a scenic coastal path, colorful wildflowers sway gently in the breeze, and azure waves crash rhythmically against the shore. The pelican occasionally flaps its wings, adding a playful touch to its enchanting ride. In the distance, a serene sunset bathes the landscape in warm hues, while seagulls glide gracefully overhead, celebrating this delightful and lighthearted adventure of a pelican enjoying a carefree day on two wheels.
What does it produce for “A pelican riding a bicycle along a coastal path overlooking a harbor”?
Or, what do Sora and Veo produce for your verbose prompt?
If Sora is anything like DALL-E, a prompt like "A pelican riding a bicycle along a coastal path overlooking a harbor" will be extended into something like the longer prompt behind the scenes. OpenAI has been augmenting image prompts from day 1.
Hard to say about Sora, but the video you shared is most definitely worse than Veo.
The pelican is doing some weird flying motion, motion blur is hiding a lack of detail, and the bicycle is moving so fast that the background is blurred. I would even say Sora is better, because I like the slow motion and detail, though it did do something very non-physical.
Veo is clearly the best in this example. It has high detail but also feels the most physically grounded among the examples.
The prompt asks that it flaps its wings. So it's actually really impressive how closely it adheres (including the rest of the little details in the prompt, like the scarf). Definitely the best of the three, in my opinion.
If you'd like to replicate this, the sign-up process was very easy and I was able to run a single generation attempt without trouble. Maybe later when I want to generate video I'll use prompt enhancement. Without it, the video appears to have lost a notion of direction. Most image-generation models I'm aware of do prompt enhancement. I've seen it on Grok+Flux/Aurora and ChatGPT+DALL-E.
Prompt: A pelican riding a bicycle along a coastal path overlooking a harbor
Seed: 15185546
Resolution: 720×480
As long as at least one option is exactly what you asked for, throwing variations at you that don't conform to 100% of your prompt seems like it could be useful if it gives the model leeway to improve the output in other aspects.
I am surprised that the top-right one still shows a cut and a switch to a different scene. I would assume that's something that could be trivially filtered out of the training data, as those discontinuities don't seem useful either for these short 6-second video segments or for getting an understanding of the real world.
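For what it's worth, here is a rough sketch of what that kind of filtering could look like. It is only a naive hard-cut heuristic (mean absolute pixel difference between consecutive frames), not what any of these labs actually do. The frame representation (nested lists of grayscale values), the function names, and the threshold of 60 are all assumptions made for illustration. It would miss fades and dissolves, and fast camera motion could trigger false positives.

```python
from typing import List, Sequence

# Assumption: a frame is a grid of grayscale pixel values in the range 0-255.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def find_cuts(frames: Sequence[Frame], threshold: float = 60.0) -> List[int]:
    """Return every index i where frame i differs sharply from frame i - 1."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


def has_hard_cut(frames: Sequence[Frame], threshold: float = 60.0) -> bool:
    """True if the clip contains at least one suspected hard cut."""
    return bool(find_cuts(frames, threshold))
```

A training-data filter would then simply drop (or split) any clip for which `has_hard_cut` returns true. The threshold is arbitrary here and would need tuning against real footage.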
Well yeah, if you look closely at the example videos on the site, one of them is not quite right either:
> Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. [...]
In the video, the bacon is unceremoniously slapped onto the pancakes, while the prompt sounds like it was intended to be a separate shot, with the bacon still in the pan? Or, alternatively, everything described in the prompt should have been on the table at the same time?
So, yet again: AI produces impressive results, but it rarely does exactly what you wanted it to do...
Technically speaking, I'd say your expectation is definitely not laid out in the prompt, so anything goes. Believe me, I've had such requirements from users, and I, as a mere human programmer, am never quite sure what they actually want. So I take guesses just like the AI (because simply asking doesn't get you very far; you must always show something) and take it from there. In other words, if AI works like me, I can pack up my stuff already.
This tech is cute, but the only viable outcomes are going to be porn and mass-produced slop that'll be uninteresting before it's even created. Why even bother?
But I'm also seeing some genuinely creative uses of generative video - stuff I could argue has got some genuine creative validity. I am loath to dismiss an entire technique because it is mostly used to create garbage.
We'll have to figure out how to solve the slop problem - it was already an issue before AI, so maybe this is just hastening the inevitable.
The real problem is that trust in legacy media hit rock bottom right as we enter the era where we would need such trust the most. Soon enough, nothing you see on video can be believed, but (perhaps more importantly) nothing must be believed either.
Comments like this one are so predictable and incredulous. As if the current state of the art is the final form of this technology. This is just getting started. Big facepalm.
Have you already noticed the trend of image search results for porn containing inferior AI slop porn?
I have. It sucks. The world we're headed for maybe isn't one we actually wind up wanting in the end.
I like the idea of increasingly advanced video models as a technologist, but in practice, I'm noticing slop and I don't like it. Having grown up on porn, when video models are in my hands, the addiction steers me in the direction of only using the technology to generate it. It's an addictive slot machine, a leap akin to the one from the dirty magazines of old to the world of internet porn I witnessed growing up. So, porn addiction on steroids. I found it eventually damaging enough to my mental health that I sold my 4090. I'm a lot better off now.
The nerd in me absolutely loves generative models from a technology perspective, but just like the era of social media before it, it's a double-edged sword.
No, I'm providing a personal anecdote that some members of society that do have, or may develop, the same or similar problems are having both the (perceived) good and the bad aspects of those problems seriously magnified by this technology. This can have personal consequences, but also the consequences can affect the lives of others.
Hence, a certain % of the population will be negatively affected by this. I personally think it's worth raising awareness of.
I hope they're right. If the technology improves to such a degree that meaningful content can be produced then it could spell global disaster for a number of reasons.
Also I just don't want to live in a world where the things we watch just aren't real. I want to be able to trust what I see, and see the human-ness in it. I'm aware that these things can co-exist, but I'm also becoming increasingly aware that as long as this technology is available and in development, it will be used for deception.
That's exactly what I mean: all of those methods take some human effort, and there is a human involved in the process. Now we face a reality in which it might take no human effort to do... well, anything. Which is terrifying to me.
I do believe that humans are restless, and even when there is no longer any point to create, and it is far easier to dictate, we still will, just because we are too driven not to.
You know that there are still offline art forms like concerts, theater, opera, installations, etc., so I wouldn't see it that negatively. And we have nearly 100 years of music and film we can enjoy. So maybe video is a dying art form for humans to act in, but there is so much more.
The most predictable comment is yours, especially since you completely missed the point of the original comment which had nothing to do with the video quality.
https://static.simonwillison.net/static/2024/pelicans-on-bic...
All four were better than what I got from Sora: https://simonwillison.net/2024/Dec/9/sora/