Not really. An LSTM, for example, would require a recursive element where you update the hidden state and then pass it through the same layer again as you produce the output sequence. In fact, the pseudocode shows very nicely how much simpler transformers are. And an MLP is already a component of the transformer architecture.
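Roughly, the recursive element I mean looks like this (a minimal PyTorch sketch with made-up dimensions, not the paper's pseudocode):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: producing an output sequence with an LSTM means
# repeatedly feeding the result back through the same cell while carrying
# the hidden state along.
d_model, max_steps = 32, 5
cell = nn.LSTMCell(d_model, d_model)

y = torch.zeros(1, d_model)        # placeholder start input
h = c = torch.zeros(1, d_model)    # hidden and cell state
outputs = []
for _ in range(max_steps):         # the loop a transformer block avoids
    h, c = cell(y, (h, c))         # update the hidden state
    y = h                          # pass it through the same layer again
    outputs.append(y)
```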
No? You could perfectly well plug in an RNN or a bidirectional RNN as the layer. This is the pseudocode for applying multiple layers; it does not really matter what those layers are: transformer, RNN, convolution, dilated convolution, etc. The recurrence happens within a layer, not between layers.
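To make that concrete, here is a minimal sketch (PyTorch, with hypothetical layer choices and sizes) of the same between-layer loop driving a transformer block, a GRU, and a convolution interchangeably:

```python
import torch
import torch.nn as nn

# Assumed shapes: (seq_len, batch, d_model), the default for these modules.
seq_len, batch, d_model = 16, 2, 64
x = torch.randn(seq_len, batch, d_model)

transformer_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4)
rnn_layer = nn.GRU(d_model, d_model)  # recurrent over time *within* the layer
conv_layer = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)

def apply_conv(h):
    # Conv1d expects (batch, channels, time); permute in and out.
    return conv_layer(h.permute(1, 2, 0)).permute(2, 0, 1)

layers = [
    lambda h: transformer_layer(h),
    lambda h: rnn_layer(h)[0],  # GRU returns (output, hidden); keep output
    apply_conv,
]

h = x
for layer in layers:   # the between-layer loop is identical regardless of layer type
    h = layer(h)
print(h.shape)         # torch.Size([16, 2, 64])
```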