
>I mean by the same logic the only difference between a diffusion model and a VLM is that you put the spatial transformer on the other end.

Maybe if that were the only difference, but it's not. There are diffusion models that have nothing to do with transformers or attention, and using them for arbitrary sequence prediction is either not possible or highly non-trivial.
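
To make that concrete, here's a minimal PyTorch sketch of a diffusion denoiser that is purely convolutional, with no attention anywhere. All names and sizes are illustrative, not taken from any real model; the timestep conditioning that a real DDPM needs is omitted for brevity. The point is that this is a perfectly valid diffusion model for images, yet nothing about it transfers to arbitrary token-sequence prediction:

    import torch
    import torch.nn as nn

    class ConvDenoiser(nn.Module):
        # A diffusion denoiser built only from convolutions: no attention,
        # no transformer blocks.
        def __init__(self, channels=3, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, hidden, 3, padding=1), nn.SiLU(),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
                nn.Conv2d(hidden, channels, 3, padding=1),
            )

        def forward(self, noisy_image, t):
            # t (the timestep) is ignored here for brevity; a real DDPM
            # conditions the network on it.
            return self.net(noisy_image)  # predicts the added noise

    # One DDPM-style training step: corrupt a clean batch, regress the noise.
    model = ConvDenoiser()
    x0 = torch.randn(8, 3, 32, 32)            # a batch of "clean" images
    noise = torch.randn_like(x0)
    alpha_bar = torch.rand(8, 1, 1, 1)        # stand-in for the noise schedule
    xt = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
    loss = ((model(xt, t=None) - noise) ** 2).mean()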

Yes, all neural network architectures are function approximators, but that doesn't mean they excel equally at all tasks, or that you can even use them for anything beyond a single task. This era of the transformer, where one architecture works for NLP, computer vision, robotics, even reinforcement learning, is a very new one. Anything a bog-standard transformer can do, GPT could do if OpenAI wished.
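
A rough sketch of why that's true (illustrative code, not any real model's API): with a transformer, only the input projection is modality-specific. The same backbone consumes text tokens or image patches once they're mapped into the same embedding space:

    import torch
    import torch.nn as nn

    d = 128
    # One shared, modality-agnostic sequence model.
    backbone = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
        num_layers=2,
    )
    text_embed = nn.Embedding(10_000, d)      # token ids -> vectors
    patch_embed = nn.Linear(16 * 16 * 3, d)   # flattened 16x16 RGB patches -> vectors

    tokens = torch.randint(0, 10_000, (1, 32))     # a text sequence
    patches = torch.randn(1, 64, 16 * 16 * 3)      # an image as 64 patches

    text_out = backbone(text_embed(tokens))     # (1, 32, 128)
    image_out = backbone(patch_embed(patches))  # (1, 64, 128)

There is no analogous trick for, say, a convolutional diffusion denoiser: its inductive biases are baked into the architecture, not just the input projection.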

Like I said, I don't disagree with your broader point. I just don't think this is an instance of it.



It's clear from these responses that you're missing the point I'm making, but I'm unsure how to explain it better, and you're not giving me much to work with in terms of engaging with its substance, so I think we gotta leave this at an impasse for now.



