This is neat! I was fairly surprised when a few that instantly came to mind were actually correct - Back to the Future, The Wizard of Oz, Ghostbusters, and The Big Lebowski, to name a few.
And I don't think it's simply because of the likelihood of guessing a popular movie, as many of the titles I've never heard of. Some are a giveaway because their content focuses on an obvious subject - Ghostbusters has clear ghosts; The Hurt Locker has a bunch of bomb-disposal guys. Others, like Edward Scissorhands, have no giveaways but just embody the style. I'd love to see the descriptions provided to the AI
> I'd love to see the descriptions provided to the AI
Unless the descriptions were particularly detailed, I would expect that a lot of this comes from the training data and the descriptions are just prompts for the model to recall which film it is.
For instance, Willy Wonka & the Chocolate Factory (1971) is clearly based on this real poster:
And of course, the most common VQGAN checkpoint was trained on ImageNet, which likely doesn't have every movie poster as training data. (They could be in CLIP's training data, though.)
What do you suppose the mechanism is for the Charlie and the Chocolate Factory image having a golden ticket held aloft by somebody’s right hand, with a person in a purple outfit and top hat? The page says:
> a brief text description of a movie
However apart from the existence of a golden ticket, I wouldn’t expect those details to make it into a brief description of the film. And yet there’s an original poster matching those details that the VQGAN + CLIP generated image seems to draw from.
Even more convincing to me is the face of John Malkovich being on the poster of Being John Malkovich. Unless the description includes a pretty accurate description of his face (hairstyle, gender, age, facial hair, skin color), the model must have encountered his appearance in its training set.
That's not enough for reconstructing the face of John Malkovich from text; you'd need minute facial feature parameters (eye shape, nose shape, eye-nose distances, etc.)
Because he is famous on the Internet, CLIP “knows” what John Malkovich looks like. Or, more accurately: what an image people would label “John Malkovich” feels like.
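That "knows" can be made concrete. CLIP embeds images and captions into one shared vector space and matches them by cosine similarity, so a generated face can score highly against the caption "John Malkovich" without any explicit facial parameters. Below is a toy sketch of that matching step using made-up 4-d vectors standing in for CLIP's real (~512-d) embeddings; the numbers are invented for illustration, not actual CLIP outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up low-dimensional "embeddings" standing in for CLIP's joint space.
image_embedding = np.array([0.9, 0.1, 0.3, 0.0])  # a generated face
text_embeddings = {
    "John Malkovich": np.array([0.8, 0.2, 0.3, 0.1]),
    "a golden ticket": np.array([0.0, 0.9, 0.1, 0.4]),
}

# CLIP-style zero-shot matching: the caption whose embedding lies
# closest to the image embedding wins.
best_caption = max(
    text_embeddings,
    key=lambda k: cosine_similarity(image_embedding, text_embeddings[k]),
)
print(best_caption)
```

The point is that recognition here is just nearest-neighbor search in embedding space: if many training photos labeled "John Malkovich" pulled that caption's embedding toward his actual appearance, a generator steered by CLIP will reproduce that appearance.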
Wouldn't the most obvious explanation be a description which mentions Willy Wonka's chocolate factory, which doesn't really turn up anywhere in the training data except the original film media?
Star Wars is an interesting example because it appears to include elements lifted directly from the film (bits of stormtrooper bodies) alongside a princess who definitely isn't Leia. The algorithm might be creating things from scratch at a high level, but the constituent elements are pretty clearly close reproductions of parts of the source material
Would be interesting to see how well the newer CLIP-guided diffusion models work. This is a collection of what one generates with the prompt 'mad max alien spacecraft landed in the desert'
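Whether the image generator is a VQGAN or a diffusion model, the CLIP-guidance loop is the same idea: repeatedly nudge the image (or its latent) in the direction that raises its CLIP similarity to the prompt. Here's a deliberately minimal sketch of that loop with a made-up 3-d "latent" and an identity "encoder" in place of the real networks; it's the optimization pattern, not a real pipeline.

```python
import numpy as np

# Toy stand-ins: in the real pipeline these are CLIP's encoders and a
# VQGAN/diffusion latent; here they're just small made-up vectors.
rng = np.random.default_rng(0)
text_embedding = np.array([1.0, 0.0, 0.0])  # pretend prompt embedding

latent = rng.normal(size=3)  # the "image" being optimized

def clip_score(z):
    # Cosine similarity between the "encoded" image and the prompt.
    return float(np.dot(z / np.linalg.norm(z), text_embedding))

# Gradient ascent on the score: each step nudges the image toward the prompt.
lr = 0.1
for _ in range(200):
    z_unit = latent / np.linalg.norm(latent)
    # Analytic gradient of cosine similarity w.r.t. the latent.
    grad = (text_embedding - clip_score(latent) * z_unit) / np.linalg.norm(latent)
    latent += lr * grad

# After optimization the latent points almost exactly along the prompt
# embedding, i.e. the score approaches 1.0.
```

This also suggests why recognizable fragments (stormtrooper armor, Malkovich's face) show up: the loop will happily exploit whatever imagery CLIP already associates strongly with the prompt's words.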
The same for me - I got 6 correct with some thought, but three of them were instant ideas (less than a second), and those ones were always right: Cast Away, Space Jam, and Monty Python and the Holy Grail.
For me, those aren't recognizable objects on individual examination, except the house - the witch is barely identifiable as a witch; it's generally a triangle body with a hat not at all shaped like a witch's typical pointy one. There are cloud forms, but I'd be hard-pressed to call them a tornado. I only see them when I take in the whole image in general; focusing on specific areas, definitely not.
The way people's brains are analyzing and picking apart these images differently is fascinating to me