Are there any intrinsic dis/advantages of bidirectional models over causal model...

patelajay285 · 2024-11-14T18:02:34 1731607354

The downside of training only bidirectionally is that you don't get a generative model. However, you can train on a mixture of causal and bidirectional objectives, as some LLM pre-training has done. As far as I am aware, that has no downsides. It is less common simply because standard practice has been to train causal-only, and there isn't enough funding or attention to experiment on every axis of pre-training (which can be very expensive).
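
To make the causal vs. bidirectional distinction concrete, here is a minimal sketch of the attention masks involved. The function names are made up for illustration, and the prefix-LM mask is only one assumed way of combining the two regimes in a single model, not necessarily the mixture the pre-training runs mentioned above used.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def prefix_lm_mask(n, prefix_len):
    """Assumed hybrid: full attention inside the first prefix_len tokens,
    causal attention for everything after the prefix."""
    return [[j <= i or j < prefix_len for j in range(n)] for i in range(n)]


def show(mask):
    """Render a mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print(show(prefix_lm_mask(5, 2)))
```

With `prefix_len=0` the hybrid mask reduces to the causal one, and with `prefix_len=n` it reduces to the fully bidirectional one, so a single model can in principle see both regimes during training.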

namibj · 2024-11-14T23:23:53 1731626633

No, you can still generate with bidirectional models, using diffusion.
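
Roughly, the idea is to start from a fully masked sequence and unmask it over several steps. Below is a toy sketch loosely in the style of masked (absorbing-state) discrete diffusion decoding. `toy_predict` is a hypothetical stand-in for a bidirectional model, and its confidence values are invented for illustration. A real sampler would draw from the model's predicted distribution and follow a proper noise schedule.

```python
MASK = "_"


def generate(predict, length, steps):
    """Iteratively unmask a sequence.

    predict(tokens) must return {position: (token, confidence)} for every
    masked position. Each step commits the most confident predictions.
    """
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = predict(tokens)
        # Ceil division: spread the remaining positions over the remaining steps,
        # so everything is unmasked by the final step.
        k = -(-len(masked) // (steps - step))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens):
    """Hypothetical model stub: always 'predicts' the string below,
    with made-up confidences that decrease with position."""
    target = "hello"
    return {i: (target[i], 1.0 / (1 + i)) for i, t in enumerate(tokens) if t == MASK}


if __name__ == "__main__":
    print("".join(generate(toy_predict, length=5, steps=3)))
```

The point is only that nothing in this loop requires left-to-right factorization: the predictor sees the whole partially filled sequence at every step.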