This gets interesting. One approach that I've used with image generation before is to find an image of the sort that I want, have DALL-E describe it, and then modify the description it provides into a prompt with the elements that I want.
The image shows an imaginative, whimsical illustration of a character composed of two parts. The upper part features a man dressed in a long, elegant gray coat, wearing a bowler hat and round sunglasses, with a sophisticated white polka-dot ascot tie. His face has a subtle smile. The lower part of the character transitions seamlessly into a smaller figure of a cat, appearing to wear striped pants, with its tail visible. The entire character combines human and feline elements, creating a surreal, anthropomorphic appearance. The illustration is in black and white, emphasizing a stylized, cartoon-like design.
The image captures a whimsical and secretive scene featuring three dwarves stacked in a totem formation, each attempting to conceal their nature under a large brown cloak. The top dwarf has a bright, cheerful expression and blond hair, holding the cloak wide to mimic wings, and is dressed in black armor adorned with teal gems and matching earrings. The middle dwarf displays a fierce expression, sporting a bushy orange beard, and is also clad in similar dark armor with teal embellishments. The bottom dwarf, an older figure with a long white beard, is adorned in a royal dark outfit with gold accents and a small crown, clasping a glowing white orb. This trio of dwarves, each with distinctive fantasy armor, unites in a playful attempt to disguise their stature and nature, adding an element of adventure and mystery to the scene.
Current-generation image generators don’t understand text as instructions the way you’re trying to use it: describing an object, then placing it, then setting the scene.
It’s more like a giant telescope with many lenses (the latents from the prompts), and you’re adjusting the lenses to bring one possible reality out of many into focus.
The first attempt at this (based on https://reductress.com/post/my-boyfriends-are-always-two-kid... ) really misunderstood the image. This may also be part of the problem.
I then went to the image from https://www.reddit.com/r/DnD/comments/c6fdw4/oc_introducing_... and that provided:
Working off of that idea of the totem formation ... "Create an image featuring three children in a totem pole formation that are trying to conceal their nature in a single oversized trench coat." That produced https://imgur.com/a/Of9FsJl
I suspect the orange beard came from the previous part in the session. But that might be an approach to take in trying to describe it in a way that can be used.
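For what it's worth, the "have the model describe it, then swap in your own elements" step can be sketched in a few lines of Python. This is only a rough illustration of the editing step, not of any image API: the function name `adapt_description`, the shortened description string, and the swap table are all made up for the example, and the actual describe and generate calls are left out.

```python
import re


def adapt_description(description: str, swaps: dict[str, str]) -> str:
    """Swap elements of a model-written description for the ones you want.

    Hypothetical helper: all replacements happen in a single pass, so text
    that was just inserted is never replaced a second time.
    """
    if not swaps:
        return description
    # Longest phrases first so "large brown cloak" wins over a shorter key.
    keys = sorted(swaps, key=len, reverse=True)
    pattern = re.compile(
        "|".join(rf"\b{re.escape(k)}\b" for k in keys), re.IGNORECASE
    )
    lookup = {k.lower(): v for k, v in swaps.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], description)


# Shortened stand-in for the kind of description the model returns.
description = (
    "Three dwarves stacked in a totem formation, each attempting to conceal "
    "their nature under a large brown cloak."
)
# Elements to replace (assumed for illustration).
swaps = {
    "dwarves": "children",
    "large brown cloak": "single oversized trench coat",
}

prompt = adapt_description(description, swaps)
print(prompt)
```

The point is just that the model's own wording (here, "stacked in a totem formation") stays in the prompt, and only the subject and props change. In practice I edited the description by hand, which also lets you drop details like the beards and armor.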