Extending the tensor ops to FP64 is an interesting, if not surprising, design choice. Are there many applications likely to leverage this capability? Aside from HPL, of course.
I suspect this is specifically targeted at the HPC market, as the lack of FP64 has always been a hindrance to HPC deployment.
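To make the FP64 point concrete, here's a minimal sketch (my own illustration, not from the original comment) of why many HPC codes can't get by on FP32: with only ~7 decimal digits of precision, small updates to a large accumulator are simply rounded away, which is fatal for long-running iterative solvers.

```python
import numpy as np

# FP32 carries ~7 significant decimal digits, FP64 ~16.
# A tiny increment added to 1.0 is below half an FP32 ulp (~6e-8),
# so it rounds away entirely in single precision.
small = 1e-8
fp32 = np.float32(1.0) + np.float32(small)  # rounds back to exactly 1.0
fp64 = np.float64(1.0) + np.float64(small)  # increment is retained

print(fp32 == np.float32(1.0))  # True  -> update lost in FP32
print(fp64 == np.float64(1.0))  # False -> FP64 keeps it
```

This is the kind of accumulation behavior that makes FP64 non-negotiable for much of traditional HPC, and why FP64 tensor cores matter for that market beyond just HPL numbers.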
You have to remember that the HPC market is $35B today. HPE alone makes $3B a year from it, maybe more with the Cray acquisition.
So it's no surprise that NVIDIA wants to position themselves in that market.
Plus, you have to look at the long game with the Mellanox acquisition (a heavy player in HPC interconnects) and Cumulus. I wouldn't be surprised to see NVIDIA try to bypass Intel/AMD completely and offer a direct-to-interconnect device, rather than the hybrid CPU + GPU box.
Remember too that AMD and Intel won the contracts for Aurora, Frontier, and El Capitan (the three exascale machines for the DOE). I imagine Mellanox is a big part of winning the next contracts, especially seeing that a lot of these projects are I/O-bound, not compute-bound. If you can bring supercomputer-like abilities to datacenters or AI labs, that'd be a huge advantage. If you could easily split a huge model across 64 GPUs and train as if it were on a single node, that would change the space.
Lately they have also been working quite a bit with POWER; quite a few of the big HPC systems run on the POWER/Mellanox/NVIDIA combo. So that could also be something they look into more, instead of going with x86.