Are there any intrinsic dis/advantages of bidirectional models over causal models for in-context learning? It seems that unidirectional model just have been explored and worked on more.
When you train bidirectionally only, you don't get a generative model, that would be the downside. However, you can train on a mixture of causal and bidirectional objectives as some LLM pre-training has done. As far as I am aware, there are no downsides of that, but it is not more common simply because the standard practice has been to train causal only and there just isn't enough funding/attention to go into experimenting on every axis of pre-training (which can be very expensive).