In the data center, I think AMD is a lot more viable than most people think. MosaicML recently did a test and were able to swap MI250s with A100s basically seamlessly, within a single training run even, and ran into no issues: https://www.mosaicml.com/blog/amd-mi250
I think where most people have been getting into trouble is with trying to run with unsupported cards (e.g., *ALL* of AMD's consumer cards), or wanting to run on Windows. This is obviously a huge fail on AMD's part, since anyone who's tried to do anything with any of those consumer cards will just assume the data center cards are the same, but they're quite different. It doesn't help that I've never seen any CDNA2 card on sale/available in retail. How does AMD ever expect to get any adoption when no developers have hardware they can write code for? It's completely mental.
I got really excited until you said all of their consumer cards are out. That's even more infuriating - people have mammoth computing devices lying around and they can't make full use of them, because of drivers.
Not that drivers are simple to make, but still. It's like owning a Ferrari that works perfectly, but you can only drive north.
You can use the WebGPU backend on Tinygrad. It's working well in my test with an Nvidia 960 running inference (Unet 3D). I don't know how well WebGPU is supported on AMD GPUs.
If you have an officially supported card https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h... and are using PyTorch, then you're pretty much good to go. Also, HIPify works pretty well these days.
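For anyone wanting to sanity-check their own setup, here's a rough sketch of how you might tell which backend your PyTorch build is using. The helper name `describe_torch_backend` is made up for illustration. It relies on ROCm builds of PyTorch reusing the `torch.cuda` API and setting `torch.version.hip`, and it assumes device 0 is the card you care about.

```python
import importlib.util


def describe_torch_backend() -> str:
    """Report which GPU backend the installed PyTorch build targets, if any.

    Hypothetical helper for illustration; not part of PyTorch or ROCm.
    """
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    # ROCm builds set torch.version.hip to a version string;
    # CUDA builds leave it as None (and set torch.version.cuda instead).
    hip_version = getattr(torch.version, "hip", None)
    cuda_version = getattr(torch.version, "cuda", None)

    # On ROCm builds the torch.cuda namespace is backed by HIP,
    # so the same availability check works for both vendors.
    if not torch.cuda.is_available():
        return "PyTorch found, but no usable GPU device"

    # Assumption: device index 0 is the GPU of interest.
    name = torch.cuda.get_device_name(0)
    if hip_version:
        return f"ROCm/HIP {hip_version} on {name}"
    return f"CUDA {cuda_version} on {name}"


if __name__ == "__main__":
    print(describe_torch_backend())
```

If this reports no usable GPU device on an AMD machine, an unsupported card or OS is one likely cause, but it doesn't prove that; check the support matrix linked above.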