Generating images and consuming images are very different challenges, and most models handle them with entirely different systems (ChatGPT constructs prompts for DALL-E, for example: https://simonwillison.net/2023/Oct/26/add-a-walrus/ )
Evaluating vision LLMs on their ability to improve their own generation of images doesn't make sense to me. That's why I enjoy torturing new models with my pelican on a bicycle SVG benchmark!