The double-precision number probably best represents the generational improvement.
The 20x 32-bit floating-point improvement is probably achieved by comparing full FP32 calculations on the previous generation against TF32 calculations on Ampere. That is not an apples-to-apples comparison, since the TF32 result is less precise.
That said, the lower precision probably isn't terribly important for deep learning, given the success of BF16.
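A rough sketch of why the precision drops: FP32 carries a 23-bit mantissa, TF32 keeps only 10, and BF16 keeps 7. The snippet below (plain Python, using simple truncation rather than the hardware's actual rounding, so it's only illustrative) shows how the representable value of 1/3 degrades as mantissa bits are thrown away:

    import struct

    def truncate_mantissa(x: float, keep_bits: int) -> float:
        """Crude truncation of a float32 mantissa to `keep_bits` bits."""
        # Reinterpret the float as its 32-bit IEEE-754 pattern.
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        # FP32 has a 23-bit mantissa; zero out the low (23 - keep_bits) bits.
        mask = ~((1 << (23 - keep_bits)) - 1) & 0xFFFFFFFF
        return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

    x = 1 / 3
    print(struct.unpack("<f", struct.pack("<f", x))[0])  # FP32 (23 bits): ~0.33333334
    print(truncate_mantissa(x, 10))                      # TF32-like (10 bits): ~0.33325195
    print(truncate_mantissa(x, 7))                       # BF16-like (7 bits):  ~0.33203125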
Actually, it looks like the double-precision number for general GPU usage only went up about 25% (Volta did 7.8 TFLOPS, the A100 does 9.7). To get the 2.5x number, you need to run FP64 through the TensorCores, which gets you 19.5 TFLOPS.
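A quick check of those ratios (the 9.7 TFLOPS figure is the A100's non-TensorCore FP64 rate from NVIDIA's published specs):

    volta_fp64 = 7.8       # V100 FP64 TFLOPS
    ampere_fp64 = 9.7      # A100 FP64 TFLOPS on the regular CUDA cores
    ampere_fp64_tc = 19.5  # A100 FP64 TFLOPS via TensorCores

    print(ampere_fp64 / volta_fp64)     # ~1.24x, i.e. the ~25% uplift
    print(ampere_fp64_tc / volta_fp64)  # 2.5x, the headline number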
Considering how big the die is (826 mm² on TSMC 7nm) and how many transistors it has, they really must have beefed up the TensorCores much more than the general compute units.