Hacker News

Speculating, I would imagine that different prompts submitted along with the same image could elicit wildly different behavior from a multimodal VLM, potentially shifting how much it leans on priors inferred from training versus attending to the new image itself.
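One way to probe this speculation would be to send the same image with several prompts and compare the responses. A minimal sketch below, where `query_vlm` is a purely hypothetical stub standing in for a real multimodal model call (the canned answers just illustrate prior-driven versus image-focused behavior):

```python
def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical stub standing in for a real VLM API call."""
    # A real model might answer a generic prompt from training priors
    # and attend more closely to the image for a pointed question.
    canned = {
        "Describe this image.": "A typical city street.",
        "Count the red cars in this image.": "Three red cars are visible.",
    }
    return canned.get(prompt, "Unsure.")

prompts = ["Describe this image.", "Count the red cars in this image."]
responses = {p: query_vlm("street.jpg", p) for p in prompts}

# If responses to the same image diverge sharply across prompts, that
# would be evidence of prompt-dependent weighting of priors vs. pixels.
for p, r in responses.items():
    print(f"{p!r} -> {r!r}")
```

Measuring how response content shifts with prompt phrasing (holding the image fixed) would separate prompt sensitivity from image understanding.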


