Sad to say I've been disappointed in DALLE's performance since I got access to it a couple of weeks ago - I think mainly because it was hyped up as the holy grail of text2image ever since it was first announced.
For a long while, whenever Midjourney or DALLE-mini or the other models underperformed or failed to match a prompt, the common refrain seemed to be "ah, but these are just the smaller versions of the really impressive text2image models - surely those would perform better on this prompt". Honestly, I don't think DALLE performs dramatically better than DALLE-mini or Midjourney - in some cases I even think DALLE-mini outperforms it for whatever reason. Maybe because of filtering applied by OpenAI?
What difference there is seems to be one of quality on queries that already work well, not an ability to tackle more complex queries. If you try a sentence involving lots of relationships between objects in the scene, DALLE will still generate a mishmash of those objects - it'll just look like a slightly higher-quality mishmash than DALLE-mini's. And on queries it does seem to handle well, there's almost always something off with the scene if you spend more than a moment inspecting it. I think this is why there's such a plethora of stylized and abstract imagery in the examples of DALLE's capabilities - humans are much more forgiving of flaws in those images.
I don't think artists should be afraid of being replaced by text2image models anytime soon. That said, I have gotten access to other large text2image models that claim to outperform DALLE on several metrics, and my experience matched with that claim - images were more detailed and handled relationships in the scene better than DALLE does. So there's clearly a lot of room for improvement left in the space.