The playground is a drag. After accepting being forced to sign up, attach my GitHub, and hand over my email address, I entered the desired prompt and waited with anticipation, only to see a black screen and a notice of how much it's going to cost per megapixel.
Bummer. After seeing what was generated in the blog post I was excited to try it! Now feeling disappointed.
My go-to test for these tools so far has been the seven-horned, seven-eyed lamb mentioned in the Book of Revelation. Every tool I've tried has failed at this task.
> A Gary Larson, "Far Side" comic of a raccoon disguising itself by wearing a fedora and long trench coat. The raccoon's face is mostly hidden by the fedora. There are extra paws sticking out of the front of the trench coat from between the buttons, suggesting that the raccoon is in fact a stack of several raccoons.
Every human I've ever described this to has had no problem picturing what I mean. It's a classic comic trope. AIs still struggle.
A rough rule of thumb is that if a text-generator AI model of some size would struggle to understand your sentence, then an image-generator model a couple of times the size or even bigger would also struggle.
The intelligence just doesn't "fit" in there.
Personally I'm curious to see what would happen if someone burnt $100M of compute time on training a truly enormous image generator model, something the same-ish size as GPT-4...
This gets interesting. One approach that I've used with image generation before is to find an image of the sort that I want, have DALL-E describe it, and then modify the prompt it provides so that it contains the elements I want.
The image shows an imaginative, whimsical illustration of a character composed of two parts. The upper part features a man dressed in a long, elegant gray coat, wearing a bowler hat and round sunglasses, with a sophisticated white polka-dot ascot tie. His face has a subtle smile. The lower part of the character transitions seamlessly into a smaller figure of a cat, appearing to wear striped pants, with its tail visible. The entire character combines human and feline elements, creating a surreal, anthropomorphic appearance. The illustration is in black and white, emphasizing a stylized, cartoon-like design.
The image captures a whimsical and secretive scene featuring three dwarves stacked in a totem formation, each attempting to conceal their nature under a large brown cloak. The top dwarf has a bright, cheerful expression and blond hair, holding the cloak wide to mimic wings, and is dressed in black armor adorned with teal gems and matching earrings. The middle dwarf displays a fierce expression, sporting a bushy orange beard, and is also clad in similar dark armor with teal embellishments. The bottom dwarf, an older figure with a long white beard, is adorned in a royal dark outfit with gold accents and a small crown, clasping a glowing white orb. This trio of dwarves, each with distinctive fantasy armor, unites in a playful attempt to disguise their stature and nature, adding an element of adventure and mystery to the scene.
Working off of that idea of the totem formation ... "Create an image featuring three children in a totem pole formation that are trying to conceal their nature in a single oversized trench coat."
I suspect the orange beard came from the previous part in the session. But that might be an approach to take in trying to describe it in a way that can be used.
Current-generation image generators don’t interpret text as a sequence of instructions the way you’re trying to use it: describing an object, then placing it, then setting the scene.
It’s more like a giant telescope of many lenses (the latents from the prompts), and you’re adjusting the lenses to bring one of many possible realities into focus.
Sure. Normally I try a few variants, but "lamb with seven horns" was what I tried when I made that post.
For what it's worth, I've previously asked in the Stable Diffusion Discord server for help generating a "lamb with seven horns and seven eyes" but the members there were also unsuccessful.
I mean, you can use a fork to make whipped cream, but it won't be easy and it's not the right tool for the job. Does that mean that the fork is useless?
I never said it was useless, just that it fails at this specific problem. One of my complaints with many of these image generation tools is that there's not much communication as to what should be expected from them, nor do they explain the areas where they're expected to succeed or fail.
Recently Claude began to allow generation of SVG drawings, and asking it to draw a unicorn and later add extra tails or horns worked correctly.
A fork exists in physical space and it's pretty intuitive to understand what it can do. These models exist within digital space and are incredibly opaque by comparison.
Here's the screenshot [0] that was shared with me. It's obviously pretty basic, but Claude understood where the horns and tails should go. This looks like a clear iterative improvement over older models.
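To illustrate why the SVG route makes counting easier, here's a minimal sketch of my own (not what Claude produced, and not what's in the screenshot). In SVG every horn is an explicit element with explicit coordinates, so "seven horns" is just a loop. The function name `horned_head_svg` and all the sizes are made up for the example.

```python
import xml.etree.ElementTree as ET


def horned_head_svg(n_horns: int = 7, size: int = 200) -> str:
    """Return an SVG string: a circle "head" with n_horns triangles on top.

    Hypothetical helper for illustration only; the geometry is arbitrary.
    """
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(size),
        height=str(size),
    )
    cx, cy, r = size / 2, size * 0.6, size * 0.25
    ET.SubElement(
        svg, "circle",
        cx=str(cx), cy=str(cy), r=str(r),
        fill="white", stroke="black",
    )
    # Spread the horns evenly across the width of the head.
    for i in range(n_horns):
        x = cx - r + (2 * r) * (i + 0.5) / n_horns
        base_y = cy - r * 0.8
        points = f"{x - 4},{base_y} {x + 4},{base_y} {x},{base_y - 30}"
        ET.SubElement(svg, "polygon", points=points, fill="gray", stroke="black")
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # Write the drawing to a file you can open in a browser.
    with open("lamb_head.svg", "w", encoding="utf-8") as f:
        f.write(horned_head_svg(7))
```

Adding an eighth horn here means changing one number. A diffusion model has no equivalent of that loop counter, which is roughly the "telescope of lenses" point made above.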
I was hoping it'd be more like https://play.go.dev.
Good luck.