If you haven't tried Ideogram or DeepFloyd, those are even better at the specific case of "writing verbatim text in your prompt". To the point that Ideogram's trending page was entirely taken over by images of Latina women's names with bling effects and Disney characters, last I looked.
DALLE3 is definitely much higher quality but I still feel it's kind of… useless. It's too hard to control in conversation form, because it's not a multimodal LLM but rather just works by rewriting text prompts. ChatGPT doesn't really know what dalle can and can't do; the actual dalle model still frequently fights you and just generates whatever it wants; when ChatGPT generates prompts it sometimes leaves out details or writes conflicting ones; and it writes every prompt in "friendly harmless AI voice", full of superfluous hype adjectives.
This is odd because GPT4V actually is a multimodal model, and asking it to describe images as text works really well IME.
Common failure modes I saw playing with it:
* If you set something in, e.g., France, ChatGPT will rewrite the prompt to be diverse and inclusive by literally saying "diverse people" a lot, but then also adds a lot of super-stereotypical details, and then dalle ignores half of it. So you get sometimes-diverse French people who are all wearing striped shirts and berets in front of the Eiffel Tower.
* You can have a conversation with it and tell it to edit images, which it does by writing new prompts, but it's dumb about this and treats the prompts as part of the conversation. So a second-round prompt will sometimes leave out all the details, just say "the girl from before", and produce a useless image.
* Composition is usually boring (very center-aligned and symmetrical) and if you try to control it by using words like "camera", it will put cameras in the picture half the time.
* Typical AI failure modes: "anime" generates a kind of commercial/fanart style that doesn't exist rather than frames from a TV show. You can't generate things if they're hard to explain in English, like "a keyboard whose keys are house keys".
But it's really good at putting things in Minecraft.
> Composition is usually boring (very center-aligned and symmetrical) and if you try to control it by using words like "camera", it will put cameras in the picture half the time.
- Be more descriptive and specific with your prompts. I always add "shot from afar" or "wide-angled" or "shot with 85mm lens" - never had an issue with boring composition.
> Typical AI failure modes: "anime" generates a kind of commercial/fanart style that doesn't exist rather than frames from a TV show.
- Again, be more descriptive and specific with your prompts, e.g. 'a character in the style of Naruto'. You need to specify the show or artist instead of using the broad term "anime".
> You can't generate things if they're hard to explain in English, like "a keyboard whose keys are house keys".
- One more time, you just need to be more specific. Do: 'a computer keyboard, but instead of regular keys, replace each one with a metal house key.'
DALLE3 is explicitly designed around not having to do this; the point is that it'll write the detailed prompt for you in four different ways so you get more variation.
"Wide angle" and "fisheye" do work, but "lens" is dangerous; anything that can be read as an object will tend to cause that object to appear instead of being used metaphorically. (Though trying a bit more, that doesn't usually happen, if only because it rewrites the prompt to not mention it.)
> - Again, be more descriptive and specific with your prompts, e.g. 'a character in the style of Naruto'. You need to specify the show or artist instead of using the broad term "anime".
Explicitly against the rules. It'll block you if you try to use any real person's name or anything it thinks is copyrighted. You can paraphrase, though, or argue with it, which weirdly sometimes works.
> One more time, you just need to be more specific. Do: 'a computer keyboard, but instead of regular keys, replace each one with a metal house key.'
Tried it, doesn't work well. If it gets to dalle it's not smart enough to reliably do "instead of" (or generally anything that's a "deletion"). It'll just put in all three concepts.
My typical use cases don't usually involve actual text in the pictures themselves, but I've definitely seen lots of situations where it gets confused and tries to insert the description of the image into random speech bubbles. I can usually fix this by explicitly stating that there should be no text in the image.
I've gotten some pretty accurate images in one shot though that would've taken me 1000 rolls/mangling to get MJ to do:
- Historical 1980s photograph of the Kool-Aid man busting through the Berlin Wall
I haven't tried DeepFloyd or Ideogram, so maybe I'll give them a shot. From what I've gathered 99% of people who use these just use them to either generate anime, or facial portrait type stuff so most of the models out there are tuned for that kind of thing. DALL-E 3 is the first model I've used (including SDXL, etc) that can actually get pretty close to matching my prompt.
I mention this one because Dalle2 actually does work if you say "anime screenshot" (it knows studio names too). 3 does not understand "screenshot" (it makes pictures of people watching something on a TV) and blocks you if you try to reference anything copyrighted, so no "N64 game" either. Time periods like "80s anime" kinda work, though.
But the main point is that this makes it centered on online English. That style isn't called "anime" in Chinese or Japanese, but if you try to prompt in those languages it translates the prompt to English first. Basically, if you could control it with images as well as text, that'd help.
I can fix most issues by characterizing the issues to ChatGPT. I'll tell it what DALL-E's limitations are and why its prompts don't work and it (usually) "listens" and fixes things. It "knows" what a prompt is, so you could probably just tell it that it needs to be explicit with characters each time and it will be.
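To make that workaround concrete, here's a rough sketch of what "telling it about DALL-E's limitations" could look like if you were driving the chat API yourself rather than the ChatGPT UI. The instruction text and helper function are my own illustration, not an official recipe; it just front-loads the failure modes described above so the rewritten prompts avoid them.

```python
# Hypothetical sketch: pre-load the conversation with DALL-E's known
# limitations so ChatGPT's rewritten image prompts avoid them.
# The wording below is illustrative, distilled from the failure modes
# discussed in this thread.

DALLE_LIMITATIONS = """\
When you write image prompts for DALL-E, follow these rules:
- Repeat every visual detail explicitly in each prompt; never refer
  back to earlier images with phrases like "the girl from before".
- Avoid words naming physical objects ("camera", "lens") when you only
  mean them metaphorically, or the object may appear in the image.
- Never phrase anything as a deletion ("instead of", "without");
  describe only what should be present.
- Skip hype adjectives; keep prompts plain and concrete.
"""

def build_messages(user_request: str) -> list[dict]:
    """Return a chat message list that front-loads the limitation notes."""
    return [
        {"role": "system", "content": DALLE_LIMITATIONS},
        {"role": "user", "content": user_request},
    ]

messages = build_messages("A keyboard whose keys are metal house keys.")
```

You'd pass `messages` to whatever chat endpoint you're using; the point is only that the limitations travel with every request instead of being re-explained each session.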
I've been just playing with it and generating silly images, I find working with the LLM to generate prompts is really entertaining and can go in directions I wouldn't. I can just ask for "software development memes", then "make some more but reminiscent of famous memes", and then maybe ask it to "create some images blending game development with cosmic horror", then "I like prompt #2, create some variants of that but in the style of Junji Ito" and on and on.
Yeah, it's good at that. It's very good at combining concepts it is familiar with and can do a decent job at comics.
I've seen other people do comics with "meme characters" like Pepe in them, but I don't know the trick; it usually complains about proper names, and when I tried a few paraphrases it inexplicably produced Reddit ragefaces.
I recognise some of what you're saying here. I've used DALL-E 3 in anger recently and while the results have really impressed me, every time I've tried to actually tweak and improve something I've ended up getting further and further away from what I want.
Is there evidence that Bing Image Creator does this? I remember when it first rolled out, the generations were lower quality but didn't stray as far as ChatGPT's. It's just anecdotal, though, so might not be true.