The "AI Training" bit seems like a bit of a marketing angle. The MI200 certainly has 16-bit matrix-multiplication ops associated with deep learning, but the MI200's stand-out benchmark results were in 64-bit double-precision floating-point code.
128GB of RAM is also pretty big, assuming it's the MI250X (though I'm not seeing any details on which MI200-series GPU is actually coming to Azure). Albeit it's a 2x64GB configuration across two chiplets, which IIRC looks like two different GPUs at the software level (kinda like NUMA, but with GPUs).
Still, this is one of AMD's biggest cloud wins for its GPU line, so that's exciting in and of itself.