Hacker News new | past | comments | ask | show | jobs | submit login

Are there any intrinsic dis/advantages of bidirectional models over causal models for in-context learning? It seems that unidirectional model just have been explored and worked on more.





When you train bidirectionally only, you don't get a generative model, that would be the downside. However, you can train on a mixture of causal and bidirectional objectives as some LLM pre-training has done. As far as I am aware, there are no downsides of that, but it is not more common simply because the standard practice has been to train causal only and there just isn't enough funding/attention to go into experimenting on every axis of pre-training (which can be very expensive).

No, you can generate with them using diffusion.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: