
In an RNN, hidden states must be computed sequentially; with the attention mechanism in transformers, we break free of that sequential requirement. Transformers are therefore far more amenable to parallelism and make much better use of GPUs, both along the context axis and across the batch.
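
As a rough illustration (a minimal NumPy sketch with made-up weight names, not any particular library's API): the RNN needs an explicit loop over time because each state feeds the next, while self-attention computes all positions in a single batched matrix product.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 8, 16                      # sequence length, hidden size
    x = rng.standard_normal((T, d))   # toy input sequence

    # RNN: each hidden state depends on the previous one, so the
    # time loop cannot be parallelized across positions.
    W_h = rng.standard_normal((d, d)) * 0.1
    W_x = rng.standard_normal((d, d)) * 0.1
    h = np.zeros(d)
    rnn_states = []
    for t in range(T):                # inherently sequential
        h = np.tanh(h @ W_h + x[t] @ W_x)
        rnn_states.append(h)

    # Self-attention: every position attends to every other position
    # in one matrix product, so all T positions are computed at once.
    W_q = rng.standard_normal((d, d)) * 0.1
    W_k = rng.standard_normal((d, d)) * 0.1
    W_v = rng.standard_normal((d, d)) * 0.1
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d)     # (T, T): all pairs in parallel
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attn_out = weights @ V            # (T, d), no loop over time

The (T, T) score matrix is exactly the kind of large dense matmul GPUs are built for, which is where the parallelism win comes from.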

