Generating images and consuming images are very different challenges, and most models handle them with entirely different systems (ChatGPT, for example, constructs prompts for DALL-E: https://simonwillison.net/2023/Oct/26/add-a-walrus/ )

Evaluating vision LLMs on their ability to improve their own generation of images doesn't make sense to me. That's why I enjoy torturing new models with my pelican on a bicycle SVG benchmark!


