
Oof. Double-precision is only 2.5x better, which is less impressive than 20x for float.

I still haven't found anything more cost-effective for double precision data processing (cost + dev time) than a rack full of used Xeons...



The double-precision number probably best represents the generational improvement.

The 20x 32-bit floating point improvement probably comes from comparing full FP32 calculations on the previous generation against TF32 calculations on Ampere. That would not be an apples-to-apples comparison, since the TF32 result is less precise.

That said, it is probably not terribly important for deep learning at least, given the success of BF16.


Actually, it looks like double-precision throughput for general GPU compute only went up 25% (Volta did 7.8 TFLOPS). To get the 2.5x number, you need to use FP64 in conjunction with the TensorCores, which gets you 19.5 TFLOPS.

Considering how big the die is (826mm^2 @ TSMC 7nm) and how many transistors there are, they really must have beefed up the TensorCores much more than the general compute units.


Wow. TF32 is only 19 bits. That's some dubious marketing.
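For context, TF32 keeps float32's 8-bit exponent but only a 10-bit mantissa, so with the sign bit that's 19 bits total. A quick way to see what the lost 13 mantissa bits cost (a toy sketch that truncates instead of rounding to nearest, unlike the real hardware):

    import numpy as np

    def truncate_to_tf32(x):
        # TF32 = 1 sign + 8 exponent + 10 mantissa bits; float32 has 23
        # mantissa bits, so zero out the low 13 of them.
        bits = np.asarray(x, dtype=np.float32).view(np.uint32)
        bits = bits & np.uint32(0xFFFFE000)
        return bits.view(np.float32)

    x = np.float32(np.pi)
    print(x)                    # 3.1415927  (~7 decimal digits)
    print(truncate_to_tf32(x))  # 3.140625   (~3 decimal digits)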


I'm curious: why exactly do you need double precision? Not dismissing, just wondering what kind of application needs it.


Physics simulations. There's a rule of thumb that to get an n-bit accurate result after a long chain of calculations, intermediate results should be stored with 2n bits. Often using the full dynamic range of a float is necessary because the magnitude of different physical phenomena varies so wildly.

I guess people do store intermediate results in floats in order to take advantage of GPU acceleration. However, once you do that, you have to be careful in your programming and pay a lot more attention to underflow, overflow, and numerical accuracy. People who write scientific software usually aren't experts in numerical analysis either. Even if the code you write reliably works with floats, the library you use might not. It's just a huge pain to make sure everything is accurate.
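A toy illustration of how error builds up over a long chain of operations when the running value is kept in float32 rather than float64 (not a physics code, just naive summation; the exact digits will vary with the random seed):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(10_000_000)          # ten million values in [0, 1)

    reference = np.sum(x)               # float64, pairwise summation
    # np.cumsum forces naive left-to-right accumulation in the array's dtype
    naive32 = np.cumsum(x.astype(np.float32))[-1]
    naive64 = np.cumsum(x)[-1]

    print(f"reference        : {reference:.6f}")
    print(f"naive float32 sum: {naive32:.6f}  (rel. error {abs(naive32 - reference)/reference:.1e})")
    print(f"naive float64 sum: {naive64:.6f}  (rel. error {abs(naive64 - reference)/reference:.1e})")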


After moving from theoretical high energy physics to data science, I'm really happy I don't have to care about numerical precision in my computations.

The problem is that numerical errors when solving partial differential equations not only propagate but grow in magnitude as they do. If you are not careful, you will end up with a completely wrong answer at the end of a big computation.


I've always argued that if you are getting close to having to worry about underflow, overflow, etc., then you have an ill-conditioned problem, and just increasing the precision of your intermediate results won't help you a huge amount because you need more precision from your inputs. There are very few fields where you need more than the roughly 7 decimal digits afforded by floats. Maybe the only exceptions are in astrophysics.
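For reference, NumPy will tell you roughly how many reliable decimal digits each format gives:

    import numpy as np

    for dt in (np.float16, np.float32, np.float64):
        fi = np.finfo(dt)
        print(f"{dt.__name__:>7}: eps = {fi.eps:.2e}, ~{fi.precision} reliable decimal digits")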


When solving linear systems of equations (which arise pretty much everywhere), Krylov subspace methods are usually quite effective because the basis built for the Krylov subspace is orthonormal.

If your floating-point precision isn't high enough, you'll end up instead with a "subspace" that isn't spanned by orthogonal vectors, and the consequences are pretty drastic: needing many times the number of iterations to solve the system, or never finding a solution at all. This happens even if your system of equations has a good condition number; you just need to make the precision low enough.

This is why most people use double precision, and some people use quad precision. For many systems, e.g. if you are using CG as your solver, quad precision can cut the number of iterations by a large factor (2x-4x). These problems are still bandwidth bound, and quad precision doubles your memory bandwidth requirements, but if it reduces the number of iterations by 4x, you've still halved your time to solution.
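A rough, self-contained sketch of the effect (toy 1D Laplacian, plain unpreconditioned CG; NumPy has no native quad, so this only contrasts float32 with float64):

    import numpy as np

    def cg(A, b, dtype, tol=1e-6, maxiter=10000):
        # Plain conjugate gradient, with every intermediate kept in `dtype`.
        A = A.astype(dtype)
        b = b.astype(dtype)
        x = np.zeros_like(b)
        r = b - A @ x
        p = r.copy()
        rs = r @ r
        b_norm = np.linalg.norm(b)
        for k in range(1, maxiter + 1):
            Ap = A @ p
            alpha = rs / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) <= tol * b_norm:
                return x, k
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x, maxiter

    # SPD test problem: 1D Laplacian, condition number grows like n^2
    n = 500
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)

    for dtype in (np.float32, np.float64):
        x, iters = cg(A, b, dtype)
        rel_res = np.linalg.norm(b - A @ x.astype(np.float64)) / np.linalg.norm(b)
        print(f"{np.dtype(dtype).name}: {iters} iterations, relative residual {rel_res:.1e}")

In a run like this, float64 converges, while float32 typically stalls well above the requested tolerance no matter how many iterations you give it.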


Lattice QCD, especially near the physical point, has poorly-conditioned matrices that one wishes to solve Mx=b for. The state-of-the-art is that the [sparse] matrices can be as large as 4×3×(128^3 × 192) ~ 5e9 on a side. It's not so rare to find legitimately difficult problems in hard sciences.


Double precision has been the mainstay of scientific computing for decades, and no, it's not because all those scientists are dumb.


For pretty much any ODE solver, you want double. All of CFD (fluid sims) and FEM (mechanics) use double. It can also be a big help in graphics. Basically, if you're doing fine meshing, doubles will help you. There's a balance with your mesh size as well. Your life is a whole lot easier and you'll get better answers if you just use double. Ask literally any computational {physicist,scientist,engineer}. We've all tried without double. Trust me, I'd love to have lower memory constraints.


That argument would be wrong. Double can be a big help, for example when performing solid modelling operations on triangle meshes. Exact arithmetic would be best, but since it's often too slow, double is often good enough, while float isn't.


I thought the standard practice for this was fixed-point. It requires more planning and mental gymnastics, but it's usually much faster and gives you full control over the precision.

Maybe this has changed since my DSP days.
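For anyone who hasn't seen the idiom, a minimal Q15 sketch (1 sign bit, 15 fractional bits, the usual 16-bit DSP format); the helper names here are made up for illustration:

    import numpy as np

    Q = 15                    # Q15: values in [-1, 1) with 2**-15 resolution
    SCALE = 1 << Q

    def to_q15(x):
        # Quantize a float in [-1, 1) to a 16-bit fixed-point integer.
        return np.int16(np.clip(round(x * SCALE), -SCALE, SCALE - 1))

    def q15_mul(a, b):
        # Multiply in a wider type, then shift back down to Q15.
        return np.int16((np.int32(a) * np.int32(b)) >> Q)

    a, b = to_q15(0.5), to_q15(-0.25)
    print(q15_mul(a, b) / SCALE)   # -0.125, exact to within 2**-15

The planning mentioned above is mostly about tracking where the binary point sits after every multiply/accumulate so nothing overflows.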


I am confused. If you used to work in DSP then you know the standard practice involves MATLAB, where the default is double precision. This is also the default datatype in NumPy. Most engineers don't like working in fixed-point unless they have to for other reasons, like moving it onto an FPGA or something.
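Easy to check:

    import numpy as np

    print(np.array([1.0]).dtype)   # float64 -- the default for Python floats
    print(np.ones(3).dtype)        # float64 here too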


The person you're responding to probably worked with DSP chips, which are generally not floating-point. e.g. Motorola 56000, TigerSHARC, Blackfin.


Why not? Because of interop with other libs/tools? It's not _that_ hard, but I can see the problem if the whole workflow isn't like that.


You know a cool thing you can do to help fixed-point interop between operations in a complex system where you absolutely don't want to accidentally overflow anywhere? You can tack some bits onto the number to control the overall scale of the fixed-point value. Let's call it an exponent ;)


Agreed. Let's standardize it :))


Most of it was FPGA work, but in some cases DSP processors. Even when you have floating-point support, it's much slower than if you use fixed point. As for MATLAB, I modeled plenty of filters in it for fixed-point math.


Not OP, but like many other people I use linear algebra accelerators for scientific computing (in my case, simulating and controlling quantum-mechanical systems). We do need the precision if we want the solutions of our ODEs to converge.


Titan V100s are pretty good for double precision; they need almost the same operational intensity as Xeons to be fully utilized.


This thing is mainly targeted at AI workloads I assume, so double precision isn’t so interesting.



