GPUs are immensely complex systems. Look at an API like Vulkan, plus its shading language, and tell me again it's simple. And that's a low-level interface.
Now add to that the enormous amount of software effort that goes into implementing efficient libraries like cuBLAS, cuDNN, etc. There's a reason other vendors have struggled to compete with Nvidia.
Part of Nvidia's advantage comes from building the hardware and software side by side. No one was seriously tackling GPGPU until Nvidia created CUDA, and if you look at the rest of the graphics stack, Nvidia is the one driving the big innovations.
GPUs are sufficiently specialized in both interface and problem domain that GPU-accelerated software is unlikely to appear without a large vendor driving development, and it would be tough for that vendor to fund application development if there is no lock-in on the chips.
Which leads to the real question: what business model would enable GPU/AI software development without hardware lock-in? Game development has found a viable business by charging game publishers.
Would you agree that your observations somewhat imply that a competitive free market is not a fit for all governable domains (and don't mistake governable for government there, we're still talking about the shepherding of innovation)?
Early tech investments are risky, but if your competition has tech 10 years more advanced than yours, there is probably no amount of money that would allow you to catch up, surpass them, and make enough profit to recover the investment, mainly because you can't buy time: your competitor won't stop innovating, they are making a profit and you aren't, etc.
So to me the main realization here is that in tech, if one competitor ends up with tech that's 10 years more advanced than the competition, it is basically a divergence-type phenomenon. It isn't worth it for the competition to even invest in trying to catch up, and you end up with a monopoly.
This is a good callout: unlike manufacturing, the supply chain is almost universally vertically integrated for large software projects. While it's possible to make a kit car that at least some people would buy, most of the big tech companies have reached the point of requiring hundreds of engineers working for years to compete.
Caveat that the monopolies tend to decay over time for various reasons; the tech world is littered with companies that grew too confident in their monopoly.
The problem with vertically integrated technology is that if a huge advancement appears at the lowest level of the stack, one that requires a re-implementation of the whole stack, a new startup building things from scratch can overthrow a large competitor that would need to throw its stack away, or evolve it without breaking backward compatibility, etc.
Once you have put a lot of money into a product, it is very hard to start a new one from scratch and let the old one die.
I think you would need to take a fine-tooth comb to the definitions here. I could see a few different options emerging for non-Nvidia software, including:
- Cloud providers wishing to provide lower-CapEx solutions in exchange for increased OpEx and margin.
- Large Nvidia customers forming a foundation to shepherd open implementations of common technology components.
From a free market perspective, both forms of transaction would be viable and incentivized, but neither option necessarily leads to an open implementation.
I have been saying similar things about GPUs for a very long time.
The GPU hardware is (comparatively) simple.
It is the software that sets GPU vendors apart. For gaming, that is drivers. For compute, that is CUDA.
On a relative scale, getting a decent GPU design may have a difficulty of 1, getting decent drivers to work well on all existing software is 10, and getting the whole ecosystem around your drivers / CUDA + hardware is likely in the range of 50 to 100.
As far as I can tell, under Jensen's leadership, the chance of AMD or even Intel shaking up Nvidia's grip on this domain is practically zero in the foreseeable future.
And that is speaking as an AMD shareholder who really wants AMD to compete.
Having AMD or Intel themselves port everything that has been developed in CUDA, like was done for PyTorch, is not sustainable and will always lag behind.
In the long term it can only help to create a monopoly.
As HIP continues to implement more of CUDA, I think we'll see more developers doing it themselves when the barrier to porting is smaller. AMD has a lot of work to do, and I don't know whether they'll succeed or not, but IMO they have the right strategy.
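To make that concrete, here's a minimal sketch (a generic saxpy example I made up, not something from this thread): HIP's runtime API mirrors CUDA's nearly name-for-name, so porting is largely a mechanical cuda* -> hip* rename, which AMD's hipify tools can automate.

    // saxpy.cu -- illustrative only; the HIP equivalents are noted in comments
    #include <cstdio>
    #include <cuda_runtime.h>   // HIP: #include <hip/hip_runtime.h>

    // Kernel source is identical under HIP (__global__, threadIdx, etc. all exist)
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        // cudaMallocManaged -> hipMallocManaged
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        // Triple-chevron launch syntax also works under hipcc
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();   // -> hipDeviceSynchronize

        printf("y[0] = %f\n", y[0]);  // expect 4.0
        cudaFree(x);                  // -> hipFree
        cudaFree(y);
        return 0;
    }

The hard part, of course, isn't renaming calls like these; it's matching the performance and breadth of the surrounding libraries and tooling.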
But NumPy can be ported. So can PyTorch.
I don't think the lock-in is that big of an issue. GPUs do only simple things, but do them fast.