Not really. An LSTM, for example, would require a recursive element where you update the hidden state and then pass it through the same layer again as you produce the output sequence. In fact, the pseudocode shows very nicely how much simpler transformers are. And an MLP is already a component of the transformer architecture.
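Roughly, the recursive element I mean looks like this (a minimal PyTorch sketch with made-up dimensions, not the paper's pseudocode):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: producing an output sequence with an LSTM means
# repeatedly feeding the result back through the same cell while carrying
# the hidden state along.
d_model, max_steps = 32, 5
cell = nn.LSTMCell(d_model, d_model)

y = torch.zeros(1, d_model)        # placeholder start input
h = c = torch.zeros(1, d_model)    # hidden and cell state
outputs = []
for _ in range(max_steps):         # the loop a transformer block avoids
    h, c = cell(y, (h, c))         # update the hidden state
    y = h                          # pass it through the same layer again
    outputs.append(y)
```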
No? You could perfectly well plug in an RNN or a bidirectional RNN as the layer. This is the pseudocode for applying multiple layers; it does not really matter what those layers are: transformer, RNN, convolution, dilated convolution, etc. The recurrence happens within a layer, not between layers.
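To make that concrete, here is a minimal sketch (PyTorch, with hypothetical layer choices and sizes) of the same between-layer loop driving a transformer block, a GRU, and a convolution interchangeably:

```python
import torch
import torch.nn as nn

# Assumed shapes: (seq_len, batch, d_model), the default for these modules.
seq_len, batch, d_model = 16, 2, 64
x = torch.randn(seq_len, batch, d_model)

transformer_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4)
rnn_layer = nn.GRU(d_model, d_model)  # recurrent over time *within* the layer
conv_layer = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)

def apply_conv(h):
    # Conv1d expects (batch, channels, time); permute in and out.
    return conv_layer(h.permute(1, 2, 0)).permute(2, 0, 1)

layers = [
    lambda h: transformer_layer(h),
    lambda h: rnn_layer(h)[0],  # GRU returns (output, hidden); keep output
    apply_conv,
]

h = x
for layer in layers:   # the between-layer loop is identical regardless of layer type
    h = layer(h)
print(h.shape)         # torch.Size([16, 2, 64])
```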