
LLM inference is inherently a sequential problem. You can't speed it up by doing more in parallel. You can't generate the 101st token before you've generated the 100th.


Technically, I guess you can use speculative execution to speed it up, and in that way take a guess at what the 100th token will be and start on the 101st token at the same time? Though it probably has its own unforeseen challenges.

Everything is predictable with enough guesses.


People are pretty cagey about what they use in production, but yes, speculative sampling can offer massive speedups in inference.
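A minimal sketch of the idea, with hypothetical toy "models" standing in for the real draft and target networks: a cheap draft model proposes several tokens ahead, the expensive target model checks them, and the longest agreeing prefix is kept. The speedup comes from the fact that in a real system the verification step is one batched forward pass over all drafted positions.

```python
# Toy sketch of speculative decoding. draft_model and target_model are
# hypothetical stand-ins: deterministic functions mapping a context to
# the next token, not real LLMs.

def draft_model(context):
    # Cheap model: guesses the next token as (last + 1) mod 10.
    return (context[-1] + 1) % 10

def target_model(context):
    # Expensive model: same rule, except it emits 0 after a 7,
    # so the two models occasionally disagree.
    last = context[-1]
    return 0 if last == 7 else (last + 1) % 10

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft k tokens autoregressively with the cheap model.
        drafts, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            drafts.append(t)
            ctx.append(t)
        # 2. Verify with the target model. In a real system this is ONE
        #    batched forward pass over all k positions, which is where
        #    the parallelism (and the speedup) comes from.
        accepted, ctx = [], list(out)
        for t in drafts:
            want = target_model(ctx)
            if want == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # First disagreement: keep the target's token and stop,
                # discarding the rest of the draft.
                accepted.append(want)
                break
        out.extend(accepted)
    return out[len(prompt):][:n_tokens]

print(speculative_decode([5], 6))  # → [6, 7, 0, 1, 2, 3]
```

When the draft model agrees often, most iterations accept several tokens per expensive verification step; in the worst case every draft is rejected and you fall back to one target-model token per step, no slower than plain autoregressive decoding.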


They’re using several hundred cards here. Clearly there is ‘something’ that can be done in parallel.



