My typical use cases don't usually involve actual text in the pictures themselves, but I've definitely seen lots of situations where it gets confused and tries to insert the description of the image into random speech bubbles. I can usually fix this by explicitly stating that there should be no text in the image.

I've gotten some pretty accurate images in one shot, though, that would've taken me 1000 rolls/mangling to get MJ to do:

- Historical 1980s photograph of the Kool-Aid man busting through the Berlin Wall

I haven't tried DeepFloyd or Ideogram, so maybe I'll give them a shot. From what I've gathered, 99% of people who use these just use them to generate either anime or facial-portrait-type stuff, so most of the models out there are tuned for that kind of thing. DALL-E 3 is the first model I've used (including SDXL, etc.) that can actually get pretty close to matching my prompt.