Multiplication in a positional (e.g. decimal) representation is inefficient: the schoolbook method is O(n^2), because it is really a convolution of the two digit sequences. FFT multiplication works by converting the numbers into a representation where multiplication is pointwise, so it is O(n) and embarrassingly parallel, and then converting the result back.
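If I'm not mistaken, addition is also trivial in this representation (the transform is linear, so addition is pointwise too), and since no other operations are required, the entire computation can be done in FFT space, with carries resolved only after the final inverse transform.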
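To make the idea concrete, here is a minimal sketch in Python that computes a*b + c*d entirely in FFT space and only transforms back once. The function names (fft, to_spectrum, from_spectrum, mul_add) and the fixed transform size of 16 are my own choices for illustration, and it uses a plain floating-point FFT, so the final rounding is only safe while the numbers are small. A serious implementation would need error bounds or an exact number-theoretic transform.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    With invert=True this computes the unnormalized inverse transform.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def to_spectrum(x, size):
    """Decimal digits of x (least significant first), zero-padded, transformed."""
    digits = [int(d) for d in reversed(str(x))]
    digits += [0] * (size - len(digits))
    return fft(digits)


def from_spectrum(spec):
    """Inverse transform, round to integer coefficients, then resolve carries."""
    n = len(spec)
    coeffs = [round((c / n).real) for c in fft(spec, invert=True)]
    # Summing coeff * 10**i performs all the deferred carries in one step.
    return sum(c * 10**i for i, c in enumerate(coeffs))


def mul_add(a, b, c, d, size=16):
    """Compute a*b + c*d; size must be a power of two >= digits of the result."""
    A, B, C, D = (to_spectrum(v, size) for v in (a, b, c, d))
    # Both the multiplications and the addition are pointwise in FFT space.
    spec = [pa * pb + pc * pd for pa, pb, pc, pd in zip(A, B, C, D)]
    return from_spectrum(spec)


print(mul_add(1234, 5678, 91, 42) == 1234 * 5678 + 91 * 42)
```

Each of the 16 spectrum slots is processed independently in mul_add, which is the part that could be spread across machines.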
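This definitely works in a residue number system, where reconstruction via the Chinese remainder theorem (CRT) takes the place of the inverse FFT. (The two are algebraically related: the FFT can be seen as the CRT applied to polynomials modulo x^n - 1.)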
In short, you can build a massively parallel cluster of computers that each compute part of the result without interaction, and then combine the results at the end with a single huge inverse FFT pass.
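The residue number system version is easy to sketch as well. The moduli below (four Mersenne primes) and the helper names (to_rns, rns_mul, rns_add, from_rns) are assumptions of mine for illustration, not anything standard. Each residue channel could live on a separate machine, and the only step that needs all of them is the final CRT reconstruction. This needs Python 3.8+ for math.prod and the modular inverse form of pow.

```python
from math import prod

# Pairwise coprime moduli (hypothetical choice: four Mersenne primes).
# Results are exact as long as they stay below prod(MODULI), about 1.2e24.
MODULI = (2**13 - 1, 2**17 - 1, 2**19 - 1, 2**31 - 1)


def to_rns(x):
    """Split x into one residue per modulus."""
    return tuple(x % m for m in MODULI)


def rns_mul(u, v):
    """Channel-wise multiplication; no channel talks to any other."""
    return tuple((a * b) % m for a, b, m in zip(u, v, MODULI))


def rns_add(u, v):
    """Channel-wise addition; no carries between channels."""
    return tuple((a + b) % m for a, b, m in zip(u, v, MODULI))


def from_rns(residues):
    """Recombine the residues with the Chinese remainder theorem."""
    M = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the inverse mod m
    return total % M


x = rns_add(rns_mul(to_rns(123456789), to_rns(987654321)), to_rns(42))
print(from_rns(x) == 123456789 * 987654321 + 42)
```

Note that the range limit plays the same role as the transform size in the FFT version: you have to pick enough moduli up front so the final result fits.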