The double-precision number probably best represents the generational improvement.
The 20x 32-bit floating-point improvement is probably achieved by comparing full FP32 calculations on the previous generation against TF32 calculations on Ampere. That is not an apples-to-apples comparison, since the TF32 result is less precise.
That said, the lower precision probably isn't terribly important for deep learning, given the success of BF16.
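A rough sketch of why the precision drops: FP32 carries a 23-bit mantissa, TF32 keeps only 10, and BF16 keeps 7. The snippet below (plain Python, using simple truncation rather than the hardware's actual rounding, so it's only illustrative) shows how the representable value of 1/3 degrades as mantissa bits are thrown away:

    import struct

    def truncate_mantissa(x: float, keep_bits: int) -> float:
        """Crude truncation of a float32 mantissa to `keep_bits` bits."""
        # Reinterpret the float as its 32-bit IEEE-754 pattern.
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        # FP32 has a 23-bit mantissa; zero out the low (23 - keep_bits) bits.
        mask = ~((1 << (23 - keep_bits)) - 1) & 0xFFFFFFFF
        return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

    x = 1 / 3
    print(struct.unpack("<f", struct.pack("<f", x))[0])  # FP32 (23 bits): ~0.33333334
    print(truncate_mantissa(x, 10))                      # TF32-like (10 bits): ~0.33325195
    print(truncate_mantissa(x, 7))                       # BF16-like (7 bits):  ~0.33203125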
Actually, it looks like the double-precision number for general GPU usage only went up about 25% (Volta did 7.8 TFLOPS, the A100 does 9.7). To get the 2.5x number, you need to run FP64 through the TensorCores, which gets you 19.5 TFLOPS.
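A quick check of those ratios (the 9.7 TFLOPS figure is the A100's non-TensorCore FP64 rate from NVIDIA's published specs):

    volta_fp64 = 7.8       # V100 FP64 TFLOPS
    ampere_fp64 = 9.7      # A100 FP64 TFLOPS on the regular CUDA cores
    ampere_fp64_tc = 19.5  # A100 FP64 TFLOPS via TensorCores

    print(ampere_fp64 / volta_fp64)     # ~1.24x, i.e. the ~25% uplift
    print(ampere_fp64_tc / volta_fp64)  # 2.5x, the headline number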
Considering how big the die is (826 mm² on TSMC 7nm) and how many transistors it has, they really must have beefed up the TensorCores much more than the general compute units.