
This is neat! I was fairly surprised that a few of my instant guesses were actually correct: Back to the Future, Wizard of Oz, Ghostbusters, and Big Lebowski, to name a few.

And I don't think it's simply because of the likelihood of guessing a popular movie, as many of the titles are ones I've never heard of. Some are given away by their content and focus on an obvious subject: Ghostbusters has clear ghosts; Hurt Locker has a bunch of bomb-disposal guys. Others, like Edward Scissorhands, have no giveaways but just embody the style. I'd love to see the descriptions provided to the AI.



> I'd love to see the descriptions provided to the AI

Unless the descriptions were particularly detailed, I would expect that a lot of this comes from the training data and the descriptions are just prompts for the model to recall which film it is.

For instance, Willy Wonka & the Chocolate Factory (1971) is clearly based on this real poster:

https://www.amazon.com/-/es/Póster-Online-Willy-Chocolate-Fa...

This seems less “generate a poster from a description” and more “recall half-memorised posters”.


Due to how VQGAN + CLIP works, it can't memorize its inputs in the way language models like GPT-3 do.

VQGAN does the generation work; CLIP just says whether the result is good or not. Improve the latents, repeat. Here's a good technical writeup: https://ljvmiranda921.github.io/notebook/2021/08/08/clip-vqg...

And of course, the most common VQGAN was trained on ImageNet, which likely doesn't have every movie poster as training data. (They could be in CLIP's training data, though.)
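To make that loop concrete, here's a toy sketch in plain Python. It is not the real VQGAN or CLIP: `generate`, `score`, and `optimize` are hypothetical stand-ins I made up for illustration, the "image" is just a short vector, and the gradient is estimated by finite differences, whereas real notebooks backpropagate through both networks. The point is only the shape of the loop: the generator and the scorer stay frozen, and the latent is the only thing that changes.

```python
import math
import random


def generate(latent, weights):
    """Toy stand-in for a frozen VQGAN decoder: latent -> 'image' vector."""
    return [math.tanh(sum(w * z for w, z in zip(row, latent))) for row in weights]


def score(image, target):
    """Toy stand-in for CLIP: cosine similarity between image and prompt vectors."""
    dot = sum(a * b for a, b in zip(image, target))
    norm = math.sqrt(sum(a * a for a in image)) * math.sqrt(sum(b * b for b in target))
    return dot / norm if norm else 0.0


def optimize(latent, weights, target, steps=200, lr=0.1, eps=1e-4):
    """Repeatedly nudge the latent uphill on the score; return the best latent seen."""
    latent = list(latent)
    best_latent = list(latent)
    best_score = score(generate(latent, weights), target)
    for _ in range(steps):
        base = score(generate(latent, weights), target)
        # Finite-difference gradient estimate (assumption: real code uses autograd).
        grad = []
        for i in range(len(latent)):
            bumped = latent[:]
            bumped[i] += eps
            grad.append((score(generate(bumped, weights), target) - base) / eps)
        # "Improve the latents": one gradient-ascent step.
        latent = [z + lr * g for z, g in zip(latent, grad)]
        current = score(generate(latent, weights), target)
        if current > best_score:
            best_score, best_latent = current, list(latent)
    return best_latent


rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]  # frozen "decoder"
target = [rng.uniform(-1, 1) for _ in range(8)]  # pretend prompt embedding
start = [rng.uniform(-1, 1) for _ in range(4)]  # random initial latent

before = score(generate(start, weights), target)
after = score(generate(optimize(start, weights, target), weights), target)
print(f"score before: {before:.3f}, after: {after:.3f}")
```

Nothing in this loop stores training images. Whatever the output resembles has to come from what the frozen generator can draw and what the frozen scorer rewards.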


What do you suppose the mechanism is for the Charlie and the Chocolate Factory image having a golden ticket held aloft by somebody’s right hand, with a person in a purple outfit and top hat? The page says:

> a brief text description of a movie

However, apart from the existence of a golden ticket, I wouldn’t expect those details to make it into a brief description of the film. And yet there’s an original poster matching those details that the VQGAN + CLIP generated image seems to draw from.


Even more convincing to me is the face of John Malkovich being on the poster of Being John Malkovich. Unless the description includes a pretty accurate description of his face (hairstyle, gender, age, facial hair, skin color), the model must have encountered his appearance in its training set.


>(hairstyle, gender, age, facial hair, skin color)

That's not enough for reconstructing the face of John Malkovich from text; you need minute facial feature parameters (eye shape, nose shape, eye-nose distances, etc.).


Because he is famous on the Internet, CLIP “knows” what John Malkovich looks like. Or, more accurately: what an image people would label “John Malkovich” feels like.


Wouldn't the most obvious explanation be a description which mentions Willy Wonka's chocolate factory, which doesn't really turn up anywhere in the training data except the original film media?

Star Wars is an interesting example because it appears to include elements lifted directly from the film (bits of stormtroopers' bodies) alongside a princess who definitely isn't Leia. The algorithm might be creating things from scratch at a high level, but the constituent elements are pretty clearly close reproductions of parts of the source material.


Would be interesting to see how well the newer CLIP guided diffusion model works. This is a collection of what it generates with the prompt 'mad max alien spacecraft landed in the desert'

https://i.imgur.com/A1sAaev.jpg


That makes much more sense, pretty typical usage of ML as a blunt instrument.

The interesting thing to do, I think, would be to have a data set of general images, and use a movie description to pull from those images.


> Others like Edward Scissorhands, have no giveaways but just embody the style

When I looked at that one, I actually "saw" a figure holding distorted scissors imagery.

Trippy stuff!


Me too! I just knew right away.


The same for me - I got 6 correct with some thought, but three of them were instant ideas (less than a second), and those ones were always right: Cast Away, Space Jam, and Monty Python and the Holy Grail.


Wizard of Oz was one I instantly got but didn't know why. I'm not sure the reason...

The Matrix I answered as John Wick, which is pretty interesting.

Fear and Loathing was the other I got but I only tried 10.


The Wizard of Oz is very recognizable by the witch, the tornado and the house flying in the tornado.


For me those are not objects upon individual examination, except the house. The witch is barely identifiable as a witch; it's generally a triangle body with a hat not at all shaped like a witch's typical pointy one. There are cloud forms, but I'd be hard pressed to call them a tornado. Only when I take in the whole image in general do I see them, but when I focus on specific areas, definitely not.

The way people's brains are analyzing and picking apart these images differently is fascinating to me


Yeah, after looking at it closer, I was amused that I thought Wizard of Oz before I realized what the shapes were.


It’s the only one I got. :(


I instantly thought "Matrix" when I saw that one and I suspect it's because I never saw John Wick or many films like that since.


I picked "The Terminator" for the Matrix one, which I thought was thematically close.

I'm pretty impressed and amused by this.


My high point was correctly identifying 'Call Me by Your Name.'

Thought process: "Making out on a beach? Oh that looks like an Italian flag!"


To me they are too glowy and they have a weird thing that looks like a mix of chromatic aberration and color bleed when the registration is off.


I thought the Back to the Future one was Blade Runner... :)



