Probably even more closed than ever. They tend to become more and more restricti...

wegs · on May 14, 2020

Yeah. That's been my general problem with adopting NVidia for anything. They make good hardware, but there's a lot of lock-in, and not a lot of transparency. That introduces business risk.

I'm not in a position where I need GPGPU, but if there wasn't that risk, and generally there were mature, open standards, I'd definitely use it. The major breakpoint would be when libraries like Numpy do it natively, and better yet, when Python can fork out list comprehensions to a GPU. I think at that point, the flood gates will open up, and NVidia's marketshare will explode from specialized applications to everywhere.

Intel stumbled into it by accident, but got it right with x86. Define an open(ish) standard, and produce superior chips to that standard. Without AMD, Cyrix, Via, and the other knock-offs, there would be no Intel at this point.

Intel keeps getting it right with numerical libraries. They're open. They work well. They work on AMD. But because Intel is building them, Intel has that slight bit of advantage. If Intel's open libraries are even 5% better on Intel, that's a huge market edge.

fluffything · on May 14, 2020

> They make good hardware, but there's a lot of lock-in, and not a lot of transparency.

This sounds like you'd like NVIDIA to open-source all their software. I see this type of request a lot, but I don't see it happening.

NVIDIA's main competitive advantage over AMD and Intel is its software stack. AMD could release a 2x powerful GPGPU tomorrow for half the price and most current NVIDIA users wouldn't care because what good is that if you can't program it? AMD software offer is just poor, of course they open-source everything, they don't make any software worth buying.

ARM and Intel make great software (the Intel MKL, Intel SVML, ... libraries, icc, ifort, ... compiler), and it doesn't open-source any of that either for the same reasons as NVIDIA.

Intel and NVIDIA employ a lot of people to develop their software stacks. These people aren't probably very cheap. AMD strategy is to save a lot of money in software development, maybe hoping that the open-source communities or Intel and NVIDIA will do it for free.

I also see these requests that Intel and NVIDIA should open-source everything together with the explanation that "I need this because I want to buy AMD stuff". That, right there, is the reason why they don't do it.

You want to know why NVIDIA has 99% of the Cloud GPGPU hardware market and AMD 1%? If you think 10.000$ for a V100 is expensive, do the math on how much does an AMD MI50 costs: 5000$ for the hardware, and then a team of X >100k$ engineers (how much do you think AI GPGPU engineers cost?) working for N years just to play catch on the part of the software stack that NVIDIA gives you with a V100 for free. That goes into multiple million dollars more expensive really quickly.

adev_ · on May 14, 2020

> AMD could release a 2x powerful GPGPU tomorrow for half the price and most current NVIDIA users wouldn't care because what good is that if you can't program it?.

Correction: Nobody will be able to use the AMD hardware (outside of computer graphics) because everybody has been locked-in with CUDA on Nvidia. They can not even change even if they want to: it is pure madness to reprogram an entire GPGPU software stack every 2 years just to change your hardware provider.

And I think it will remain like that until NVidia get sued for anti-trust.

> ARM and Intel make great software [..] doesn't open-source any of that either for the same reasons as NVIDIA.

That's propaganda and it's wrong.

Intel and ARM contribute a lot to OSS. Most of the software they release nowadays is Open Source. This includes compiler support, drivers, libraries and entire dev environment: mkl-dnn, TBB, BLIS, ISPC, "One", mbedTLS.... ARM has even an entire foundation only to contribute to OSS (https://www.linaro.org/) .

Near to that, NVidia does close to nothing.

There is no justification to NVidia's attitude related to OSS. It reminds me the one of Microsoft at its darkest days.

The only excuse I can see to this attitude is greed.

I hope at least they do not contaminate Mellanox with their toxic policies. Mellanox was an example of successful Open Source contributor/company (up to now) with OFabric (https://www.openfabrics.org/). It would be dramatic if this disappear.

Polylactic_acid · on May 14, 2020

Amd doesn't even have software for GPGPU on some of their cards. I have an rx5700xt and I cant use it for anything but gaming because ROCm doesn't support navi cards, a whole year after its release.

csdreamer7 · on May 14, 2020

As a 5700 owner, I agree.

It gets even worse. There was recently a regression in the 5.4, 5.5, and 5.6 kernels that hit me hard for a week or so on Manjaro last month. System just decided to lock up or restart. Thought the graphics card had died when it happened once on Windows. Working fine now-these drivers have been out for 10 months now.

Even worse, AMD has locked down the releases of some of their 'GPUOpen' software.

https://www.phoronix.com/scan.php?page=news_item&px=Radeon-R...

https://www.phoronix.com/scan.php?page=news_item&px=GPUOpen-...

I did not expect the second one to be open source; just not on their GPUOpen website.

I did expect the first one to 'stay' open source. Not to be made proprietary on their 'GPUOpen' website.

I am definitely keeping an eye on Intel graphics now.

ksec · on May 15, 2020

I think at this point AMD wants anything Compute to concentrate on CDNA, and graphics remain on RDNA.

BeetleB · on May 15, 2020

>> ARM and Intel make great software [..] doesn't open-source any of that either for the same reasons as NVIDIA.

> That's propaganda and it's wrong.

Very convenient of you to have omitted what was in the square brackets:

> Intel MKL, Intel SVML, ... libraries, icc, ifort, ... compiler

Show me the open source MKL, Intel SVML, icc and ifort.

Some (all?) of it may be free, but it's not open source.

rodburns · on May 15, 2020

I don't necessarily have an opinion either way in this discussion but wanted to point out that Intel's latest MKL library does seem to be done as an open source project https://github.com/oneapi-src/oneMKL

amelius · on May 14, 2020

> Nobody will be able to use the AMD hardware (outside of computer graphics) because everybody has been locked-in with CUDA on Nvidia.

But numpy can be ported. So can pytorch.

I don't think the lock-in is that big of an issue. GPUs do only simple things, but do them fast.

david-gpu · on May 14, 2020

> GPUs do only simple things, but do them fast

GPUs are immensely complex systems. Look at an API like Vulcan, plus it's shading language, and tell me again it's simple. And that's a low-level interface.

Now add to that the enormous amount of software effort that goes into implementing efficient libraries like cuBLAS, cuDNN, etc. There's a reason other vendors have struggled to compete with NVidia.

Disclaimer: currently employed at NVidia.

lumost · on May 14, 2020

Part of Nvidia's advantage comes from building the hardware and software side by side. No one was seriously tackling GPGPU until Nvidia created Cuda, and if you look at the rest of the graphics stack Nvidia is the one driving the big innovations.

GPUs are sufficiently specialized in both interface and problem domain that GPU enhanced software is unlikely to appear without a large vendor driving development, and it would be tough for that vendor to fund application development if there is no lock in on the chips.

which leads to the real question. What business model would enable GPU/AI software development without hardware lock-in? Game development has found a viable business by charging game publishers.

diffrinse · on May 14, 2020

Would you agree that that your observations somewhat imply that a competitive free market is not a fit for all governable domains (and don't mistake governable for government there, we're still talking about shepherding of innovation)?

fluffything · on May 15, 2020

Early tech investments are risky, but if your competition has tech 10 years more advanced than yours, there is probably no amount of money that would allow you to catch up, surpass, and make enough profits to recover the investment, mainly because you can't buy time, and your competitor won't stop to innovate, they are making a profit and you aren't, etc.

So to me the main realization here is that in tech, if one competitor ends up with tech that's 10 years more advanced than the competition, it is basically a divergence-type of phenomenon. It isn't worth it for the competition to even invest in trying to catch up, and you end up with a monopoly.

lumost · on May 15, 2020

This is a good callout, unlike manufacturing the supply chain is almost universally vertically integrated for large software projects. While it's possible to make a kit car that at least some people would buy, most of the big tech companies have reached the point of requiring hundreds of engineers for years to compete.

Caveat that time has shown that the monopolies tend to decay over time for various reasons, the tech world is littered with companies that grew too confident in their monopoly.

- Cisco - Microsoft Windows - IBM

etc.

fluffything · on May 18, 2020

The problem with vertically integrated technology is that if a huge advancement appears at the lowest level of the stack that would require a whole re-implementation of the whole stack, a new startup building things from scratch can overthrown a large competitor that would need to "throw" their stack away, or evolve it without breaking backward compatibility, etc.

Once you have put a lot of money into a product, it is very hard to start a new one from scratch and let the old one die.

lumost · on May 15, 2020

I think you would need to take a fine tooth comb to the definitions here. I could see a few different options emerge for non-Nvidia software including

- Cloud providers wishing to provide lower CapEx solutions in exchange for increased OpeX and margin. - Large Nvidia customers forming a foundation to shepherd Open implementations of common technology components

From a free market perspective both forms of transaction would be viable and incentivized, but neither option necessarily leads to an open implementation.

ksec · on May 15, 2020

I have been stating similar thing on GPU for a very long time.

The GPU hardware is ( comparatively ) simple.

It is the software that sets GPU vendors apart. For Gaming, that is Drivers. For Compute that is CUDA.

On a relative scale, getting a decent GPU design may have a difficulty of 1, getting a decent Drivers to work well on all existing software is 10, getting the whole ecosystem system around your Drivers / CUDA + Hardware is likely in the range of 50 to 100.

As far as I can tell, under Jensen's leadership, the chance of AMD or even Intel to shake up Nvidia's grasp in this domain is partially zero in the foreseeable future.

That is speaking as an AMD shareholder and really wants AMD to compete.

adev_ · on May 14, 2020

> But numpy can be ported. So can pytorch.

Letting AMD or Intel port themselves everything that has been developed in CUDA like it was done for Pytorch is not substainable and will always lag behind.

It can only help to create a monopoly on the long term.

slavik81 · on May 15, 2020

As Hip continues to implement more of CUDA, I think we'll see more developers doing it themselves when the barrier to porting is smaller. AMD has a lot of work to do, and I don't know whether they'll succeed or not, but IMO they have the right strategy.

gnufx · on May 14, 2020

Intel don't release BLIS, though there is some Intel contribution. Substitute libxsmm, which originally beat MKL.

fluffything · on May 14, 2020

> Correction: Nobody will be able to use the AMD hardware (outside of computer graphics) because everybody has been locked-in with CUDA on Nvidia.

NVIDA open-sourced their CUDA implementation to the LLVM project 5 years ago, which is why clang can compile CUDA today, and why Intel and PGI have clang forks compiling CUDA to multi-threaded and vectorized x86-64 using OpenMP.

That you can't compile CUDA to AMD GPUs isn't NVIDIA's fault, it's AMD, for deciding to pursue OpenCL first, then HSA, and now HIP.

adev_ · on May 14, 2020

> Do you work for AMD

I do not. And I use NVidia hardware regularly for GPGPU. But I hate fanboyism.

> NVIDA open-source their CUDA implementation to the LLVM project 5 years ago

Correction: Google developped an internal CUDA implementation for their own need based on LLVM that Nvidia barely supported it for their own need afterwards.

Nothing is "stable" nor "branded" in this work.... Consequently, 99% of public Open Source CUDA-using software still compile ONLY with the CUDA proprietary toolchain ONLY on NVidia hardware. And this is not going to change anything soon.

> one from PGI, that compile CUDA to multi-threaded x86-64 code using OpenMP.

The PGI compiler is proprietary and now property of NVidia. It was previously properitary and independant but mainly used for its GPGPU capability through OpenACC. OpenACC backend targets directly the nvidiaptx (proprietary) format. Nothing related with CUDA.

> Intel being the main vendor pushing for a parallel STL in the C++ standard

That's wrong again.

Most of the work done for the parallel STL and by the C++ committee originate from work from HPX and the STELLAR Group (http://stellar-group.org/libraries/hpx/).

They are pretty smart people and deserve at least respect and parent-ship for what they have done.

More information from Hermut Kaiser (Very Nice Guy btw) here (https://www.youtube.com/watch?v=6Z3_qaFYF84).

They have been the precursor of the idea of parallel "algorithms" in the STL and the concept of "Execution policy" you have in C++17 comes from them.

To the defense of Intel (and up to my knowledge) they have provided the first OSS implementation for compilers for it.

TomVDB · on May 14, 2020

> But I hate fanboyism.

"The only excuse I can see to this attitude is greed" sounds pretty fanboyish to me. :-)

I've never understood why Microsoft, or Adobe, or Autodesk, or Synopsys, or Cadence or any other pure software company is allowed to charge as much as the market will bear for their products, often more per year than Nvidia's hardware, but when a company makes software that runs on dedicated hardware, it's called greed. I don't think it's an exaggeration when I say that, for many laptops with a Microsoft Office 365 license, you pay more over the lifetime of the laptop for the software license than for the hardware itself. And it's definitely true for most workstation software.

When you use Photoshop for your creative work, you lock your design IP to Adobe's Creative Suite. When you use CUDA to create your own compute IP, you lock yourself to Nvidia's hardware.

In both cases, you're going to pay an external party. In both cases, you decide that this money provides enough value to be worth paying for.

fluffything · on May 14, 2020

> Correction: Google developped an internal CUDA implementation for their own need based on LLVM that Nvidia barely supported it for their own need afterwards.

This is widely inaccurate.

While Google did developed a PTX backend for LLVM, the student that worked on that as part of a GSOC got later hired by NVIDIA, and ended up contributing the current NVPTX backend that clang uses today. The PTX backend that Google contributed was removed some time later.

> Nothing is "stable" nor "branded" in this work.

This is false. The NV part of the backend name (NVPTX) literally brands this backend as NVIDIAs PTX backend, in strong contrast with the other PTX backend that LLVM used to have (it actually had both for a while).

> OpenACC backend targets directly the nvidiaptx (proprietary) format.

This is false. Source: I've used the PGI compiler on some Fortran code, and you can mix OpenACC with CUDA Fortran just fine, and compile to x86-64 using OpenMP to just target x86 CPUs. No NVIDIA hardware involved.

> That's wrong again. > > Most of the work done for the parallel STL and by the C++ committee originate from work from HPX and the STELLAR Group

This is also widely inaccurate. The Parallel STL work actually originated with the GCC parallel STL, the Intel TBB, and NVIDIA Thrust libraries [0]. The author of Thrust was the Editor of the Parallelism TS, and is the chair of the Parallelism SG. The members of the STELLAR group that worked on HPX started collaborating more actively with ISO once they started working at NVIDIA after their PhDs. One of them chairs the C++ library evolution working group. The Concurrency working group is also chaired by NVIDIA (by the other nvidia author of the original parallelism TS.

AMD is nowhere to be found in this type of work.

[0] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n372...

adev_ · on May 14, 2020

> While Google did developed a PTX backend for LLVM, the student that worked on that as part of a GSOC got later hired by NVIDIA, and ended up contributing the current NVPTX backend that clang uses today.

You more or less reformalized what I said. It might become used one day behind a proprietary blob, rebranded blob of NVidia, but fact is that today, close to nobody use it for production in the wild and it is not even supported officially.

> This is false. The NV part of the backend name (NVPTX) literally brands this backend as NVIDIAs PTX backend.

It does not mean it's stable or used. I do not now a single major GPGPU software in existence that ever used it in an official distribution. Like I said.

> CUDA Fortran just fine

CUDA fortran, yes you said it, CUDA fortran. The rest is OpenACC.

> The Parallel STL work actually originated with the GCC parallel STL, the Intel TBB, and NVIDIA Thrust libraries

My apologies for that. I was ignoring this precedent work.

> AMD is nowhere to be found in this type of work.

I do not think I ever said anything about AMD.

fluffything · on May 15, 2020

> CUDA fortran, yes you said it, CUDA fortran. The rest is OpenACC.

You can also mix C, OpenACC, and CUDA C, and compile to x86-64. So I'm really not sure about what point you are trying to make here.

You were claiming that OpenACC and CUDA only runs on nvidia's hardware, yet I suppose you now agree that this isn't true I guess.

I do agree that PGI is still nvidia owned, but there are other compilers that do what PGI does.

adev_ · on May 15, 2020

> You were claiming that OpenACC and CUDA only runs on nvidia's hardware, yet I suppose you now agree that this isn't true I guess.

I do not think I ever said that OpenACC runs only on NVidia hardware. However CUDA I still affirm that CUDA runs only on NVidia hardware yes. For anything else, it is based on code converter in best case.

adev_ · on May 14, 2020

> That you can't compile CUDA to AMD GPUs isn't NVIDIA's fault, it's AMD, for deciding to pursue OpenCL first, then HSA, and now HIP.

Using a branded & under patent concurrent proprietary technology and copying its API for your own implementation is Maddness that will lead you for sure in front of a court.

It seems that even Google understood that the hard way (https://en.wikipedia.org/wiki/Google_v._Oracle_America)

fluffything · on May 14, 2020

How come? There is a CUDA C++ and CUDA C toolchains available under a MIT license, large part s of which are contributed by NVIDIA.

How can they sue you for using something that they give you with a license that says "we allow you to do whatever you want with it" ?

Teknoman117 · on May 14, 2020

the MIT license doesn't have an express patent grant. If Nvidia has a patent on some technology used by the open source code, they could sue you for patent infringement if you use it in a way that displeases them. What they can't do is sue you for copyright infringement.

monocasa · on May 14, 2020

Google v Oracle is still unsettled.

Most other legal precedent was that it was fine to clone an API.

adev_ · on May 14, 2020

> Most other legal precedent was that it was fine to clone an API.

CUDA is more than an API. It is a technology under copyright and very likely patented too. Even the API itself contains multiple reference to "CUDA" in function calls and variable name.

monocasa · on May 14, 2020

None of that protects it from being cloned under previous 9th circuit precedent except maybe patents, but I'm not aware of any patents that'd protect another against CUDA implementation.

mambru · on May 14, 2020

>Intel and PGI have clang forks compiling CUDA to multi-threaded and vectorized x86-64 using OpenMP.

Where are these forks?

pjmlp · on May 15, 2020

For Intel, https://software.intel.com/content/www/us/en/develop/tools/o...

fluffything · on May 15, 2020

For PGI, all pgi compilers can do this, just pick x86-64 as the target. There are also other forks online (just search for LLVM, CUDA, x86 as keywords), some university groups have their forks on github, where they compile CUDA to x.

fermienrico · on May 14, 2020

People who are into RISC-V and other side projects/open stacks obviously have not worked on mission critical problems.

When you have a Jet engine hoisted up for a test rig, and something fails in your DSP library, you don't hesitate to call Matlab engineering support to help on within next 30 mins. Try that with some python library. People give a lot of flak to Matlab for being closed source but there is a reason they exist. Not for building a stupid toy project, but for real things where big $$$ is on the line. Python is also used in production everywhere, but if your application is a niche one and using PyVISA library to connect to some DSP hardware that you git cloned is not very "production" ready. You need solid deps.

Don't get me wrong - open source software runs in prod all the time - PostgreSQL/Linux, etc. The smaller the application domain (specific DSP libraries or analysis stacks for wind turbines and such), the lower the availability of high quality open source software (and support).

My point is that reality hits you hard when it is anything where a lot of $$$ or people's time depend on it. Don't blame their engineers for using closed source tools.

pcwalton · on May 14, 2020

> People who are into RISC-V and other side projects/open stacks obviously have not worked on mission critical problems.

"People who are into RISC-V" nowadays includes folks like Chris Lattner, who has worked on more mission-critical problems than most everyone here.

pjmlp · on May 15, 2020

Yes, and not all of them were turned into gold. I don't have any hopes on Swift for Tensorflow.

not2b · on May 14, 2020

It would suffice for NVIDIA to open-source enough specifications and perhaps some subset of core software to enable others to build high quality open source (or even proprietary) software that targets NVIDIA's architecture. They can't hire every programmer in the world; if other programmers can build high-performance software that takes advantage of their platform, that increases the value of their hardware.

Your comparison to Intel isn't valid: most software that runs on Intel processors isn't built with icc, and customers have a choice: they can use icc, gcc, clang, or a number of other compilers. The NVIDIA world isn't equivalent.

pjmlp · on May 14, 2020

Anyone is free to target PTX and do their own compiler on top.

In fact, given that it is there since version 3, there are compilers available for almost all major programing languages, including managed ones.

While OpenCL is a C world, and almost no one cares about the C++ extensions and even less vendors care about SPIR-V.

Also the community doesn't seem to be bothered that for a long time, the only SYCL implementation was a commercial one from CodePlay, trying to extend their compilers outside the console market.

Reelin · on May 14, 2020

> the community doesn't seem to be bothered that for a long time, the only SYCL implementation was a commercial one

Bothered has nothing to do with it. Implementing low level toolchains generally seems to require both a gargantuan effort and an incredible depth of knowledge. If it didn't, I think tooling and languages in general would be significantly better across the board.

What am I supposed to do, implement a SYCL compiler on my own? Forget it - I'll just keep writing GLSL compute shaders or OpenCL kernels until someone with lots of resources is able to foot the initial bill for a fully functional and open source implementation.

pjmlp · on May 15, 2020

Which is why CUDA won, most researchers can't be bothered to keep writing C based shaders with printf debugging.

14113 · on May 15, 2020

This is wrong - triSYCL is roughly the same age as ComputeCpp, and hipSYCL is only slightly younger. There has been a lot of academic interest in SYCL, but as with any new technology (especially niche technologies) it's always going to take time to get people on board.

Also, from a quick look at your profile, you seem to have quite a lot of comments criticizing or commenting on CodePlay. Do you have some sort of relationship or animosity with them?

pjmlp · on May 15, 2020

I wish all the luck to CodePlay, the more success the better for them.

They are well appreciated among game developers, given their background.

My problem is how Khronos happens to sell their APIs, and let everyone alone to create their own patched SDKs and then act surprised that commercial APIs end up winning the hearts of the majority.

The situation has hardly changed since I did my thesis with OpenGL in late 90's, porting a particles visualization engine from NeXTSTEP to Windows.

Nothing that compares with CUDA, Metal, DirectX, LibGNMX, NVN tooling.

Hence my reference to CodePlay, as for very long time their SDK was the only productive way to use SYCL.

Khronos likes to oversell the eco-system, and usually the issues and disparities across OEMs tend to be "forgotten" on their marketing materials.

fluffything · on May 14, 2020

Rust has a PTX backend.

taurath · on May 14, 2020

This has literally been a back and forth argument since a 100 point post on slashdot was a groundbreaking event. I don't see it changing any time soon - honestly if anything on tech forums this argument frequently overshadows just how well NVIDIA is doing.

pjmlp · on May 14, 2020

It is just like game forums as well.

The culture here and on those forums couldn't be further apart.

w0utert · on May 15, 2020

>> NVIDIA's main competitive advantage over AMD and Intel is its software stack. AMD could release a 2x powerful GPGPU tomorrow for half the price and most current NVIDIA users wouldn't care because what good is that if you can't program it?

I always wonder why it is so hard for AMD to develop a true competitor to CUDA, but for AMD hardware? Not try to solve GPGPU programming through open standards like OpenCL, just copy the concept of CUDA wholesale. They could still build it on top of LLVM etc and release the whole thing as open-source, but have the freedom to not have to deal with design-by-committee frameworks like OpenCL, so they can ensure focus on GPU programming and nothing else, and only on those platforms where the majority of the demand is. There is not much wrong with OpenCL, it's just not nearly as good/capable/easy-to-use as CUDA if all you are interested in is GPGPU programming.

AMD is a big company with a lot of revenue, especially recently, so why would it be so hard to have a team working full-time on creating a direct CUDA knock-off ASAP?

QuixoticQuibit · on May 15, 2020

Two thoughts that come to mind:

1. AMD has struggled in the past and even today on being profitable with their GPUs. Makes it difficult to entice an army of knowledgeable devs without consistent cash flow. Granted, the tide is turning with their profitable CPU business and equity has shot up.

2. More importantly I think that, being the underdog, AMD has to have a cheaper, open solution to compete. Why would a customer choose to go with AMD’s nascent and proprietary stack over Nvidia’s well established and nearly ubiquitous proprietary stack?

To be clear, I don’t think the problems are insurmountable. AMD won a couple HPC deals recently which should afford them the opportunity to build up their software and invest in a competitive hardware solution.

shaklee3 · on May 15, 2020

To be fair, Nvidia has open sourced some key libraries lately. See cutlass and cufftdx.

lightcatcher · on May 14, 2020

> Intel keeps getting it right with numerical libraries. They're open. They work well. They work on AMD.

What Intel numerical libraries are you thinking of? When I think of Intel numerical libraries, the first that comes to mind is MKL. MKL is neither open-source nor does it work well on AMD without some fragile hacks [0].

[0] https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AM...

hyperbovine · on May 14, 2020

Well, OP didn't say MKL works well on AMD. But you can at least run it on a non-Intel CPU. Compare CUDA.

fluffything · on May 14, 2020

The nvidia pgi compiler compiles CUDA to multi-core x86-64. There are other third-party compilers for CUDA->x86-64 (one LLVM-based one from Intel).

There is a "library replacement" for CUDA from AMD called HIP, that you can use to map CUDA programs to ROCm. But... it doesn't work very well.

NVIDIA also open-sourced CUDA support for Clang and LLVM. So anybody can extend clang to map CUDA to any hardware supported by LLVM, including SPIRV. The only company that would benefit from doing this would be AMD, but AMD doesn't have many LLVM contributors.

Intel drives clang and LLVM development for x86_64, paying a lot of people to work on that.

xigency · on May 14, 2020

It sounds like people want nvidia to write drivers for AMD.

This criticism makes even less sense when any bystander could implement CUDA suppport on AMD by connecting open source software.

Reelin · on May 14, 2020

> any bystander

You aren't seriously implying than any bystander is capable of extending LLVM to map CUDA to SPIR-V? What percentage of present day gainfully employed software engineers do you suppose even has the background knowledge? How many hours do you suppose the work would require?

fluffything · on May 14, 2020

If LLVM has a SPIRV backend, probably very little. For a proof of concept, a bachelor CS thesis would probably do.

Clang already has a CUDA parser, and all the code to lower CUDA specific constructs to LLVM-IR, some of which are specific for the PTX backend. If you try to compile CUDA code for a different target, like SPIRV, you'll probably get some errors saying that some of the LLVM-IR instructions generated by clang are not available in that backend, and you'll need to generate the proper SPIRV calls in clang instead.

Its probably a great beginner task to get started with clang and LLVM. You don't need to worry about the C++ frontend side of things because that's already done, and can focus on understanding the LLVM-IR and how to emit it from clang when you already have a proper AST.

Jhsto · on May 15, 2020

FWIW, there already exists LLVM to SPIR-V compiler: https://github.com/KhronosGroup/SPIRV-LLVM-Translator

Alas, this supports SPIR-V to 1.1.

xigency · on May 21, 2020

Late response I know, but I would say anyone who needs that feature could learn to do it, at least if they are on Hacker News. Maybe bystander isn't the most accurate term, but certainly anyone with criticism could take the gauntlet.

LLVM is very well documented and so are these standards. The open source community is also huge and full of talented contributors and more are always welcome to join. I think there's a reason why Linux and GitHub exist.

So in short, if it's a question of motivation and it's something you need, then become motivated to make it happen. That's more likely to happen then convincing a company to invest in supporting a competitor.

Klinky · on May 14, 2020

CUDA appears to have come out well before even OpenCL. I don't see why there would be expectation that nVidia would design their framework to work on a competitors product.

andromeduck · on May 14, 2020

Cuda was also a response to ATI's own efforts at a proprietary effort which they eventually have up on.

TomVDB · on May 14, 2020

ATI came out with CTM, which was just an assembler. CUDA was released a month or so after that. It was a full C compiler and already had a pretty large set of examples and library functions.

I downloaded CUDA about the day it was released, and used it for real some months later when I bought a 8600 GT GPU.

To call CUDA a response to CTM is too much praise for Nvidia, because it suggests that their response included cobbling a compiler and SDK in just a month. :-)

gnufx · on May 14, 2020

Not on ARM, or POWER, you can't. Why you'd want to run it on AMD, I don't understand. I don't know what fraction of peak BLIS and OpenBLAS get, but it will be high.

fluffything · on May 14, 2020

> The major breakpoint would be when libraries like Numpy do it natively

That already happened [0]. NVIDIA has a 1:1 replacement for Numpy called CuPy that does this and is what powers their RAPIDS framework (which is a 1:1 replacement for Pandas that runs on GPUs).

Some people were complaining in [0] about CuPy reproducing numpy's bugs..

[0]: https://news.ycombinator.com/item?id=22830201

fxtentacle · on May 14, 2020

Pretty much everyone these days uses a library for driving the GPU calculations. And they tend to either support multiple hardware targets directly (TensorFlow) or have API-compatible replacements (CuPy/NumPy).

So the lock-in risk here is that you might have to run your stuff on CPU if future NVIDIA GPUs are too overpriced.

I mean they are super expensive. But there's nothing that comes close to their cuBLAS library in terms of performance. So unless AMD ponys up and hires GPU algorithm engineers, NVIDIA will win simply due to their superior driver software.

I once had to optimize a CPU matrix multiplication algorithm. 10 days of work for a 2x speedup. Now imagine doing that for every one of the thousands of functions in the Blas library...

andromeduck · on May 14, 2020

Yeah I think most people don't quite appreciate the difficulty and cost of optimizing for hardware and continually maintaining that through hardware cycles. In keeping things closed source Nvidia products have both the advantages of being easier to on-board due to simpler abstraction, and faster technical progress because there is less pushback from myriad parties when big inconvenient changes might need to happen at lower levels for hardware performance reasons kind of like if instead of x86 we instead settled on LLVM.

fxtentacle · on May 14, 2020

That is a very good metaphor :)

Actually, I wonder why we went with un-compilable Java bytecode and JIT instead of advancing projects like gcj.

gnufx · on May 14, 2020

I'm not surprised expecting to beat implementations of the basic Goto strategy for BLAS didn't turn out well. BLIS only needs a single, pared-down GEMM kernel for level3, and maybe one for TRSM. (It doesn't currently have GPU support, but I think there was an implementation mentioned in an old paper.)

paulmd · on May 14, 2020

NVIDIA has no ethical or moral responsibility to give their competitors the benefit of software they have paid to develop in-house. It is probably a safe bet that you yourself do not develop your projects under the Affero GPL, and so on some level you agree with this.

What you see as "ecosystem lock-in" is properly viewed as software that you pay a premium for as part of your purchase price, above and beyond the pricing of the competitor's hardware. NVIDIA costs more than AMD because they have to employ people to write all that software, and you are "buying" that software when you purchase NVIDIA's product.

Analogously - Amiga has no moral responsibility to let you run AmigaOS on anything except their hardware. This sort of "hardware exclusivity" used to be very common and widely accepted. Today, Apple has no moral responsibility to let you run OS X on anything except their hardware (the existence of underground hackintoshing is irrelevant here). The software is part of what you are buying when you buy the product.

wegs · on May 15, 2020

It's not a safe bet. I've build project under AGPL, and made plenty of money doing it. There are places where open is good business, and there are places where proprietary is good business, and there's everything in between. AGPL was nice since I could be open, which had huge market advantage, but release code which my competitors would /never/ take advantage of. It had, quite literally, zero downsides, and a lot of upsides.

There are projects where I do 100% proprietary too, and a mix. It's a business decision. It's not as stupid as proprietary=profit and open=charity. It's a business calculation in every case.

pjmlp · on May 14, 2020

Given that I hardly saw any clone vendors other than AMD, I really doubt that they had any influence on Intel's market share.

What worked out was IBM not being able to prevent PC clones, but given the wide adoption of laptops, tablets and phones that hardly matters nowadays.

wegs · on May 14, 2020

Intel was forced to license to AMD for government contracts. There's a super-complex story there I won't get into.

There were a few clone vendors aside from AMD. None were ever a serious threat, and AMD itself didn't become more than a bottom-feeder until after maybe 15 years. But their existence did drive a lot of adoption.

And yes, I did oversimplify. MS-DOS, IBM not being able to prevent clones, and so on, all really played together here as part of the same story.

ethbro · on May 14, 2020

That's why Intel realized that fab technology was the true differentiator.

The only way to outcompete in a sea of clones is to secure exclusive access to a valuable resource they can't.

Intel with fabs. Dell with lean supply chains. The surviving hard drive and memory companies with scale.

I think IBM and Sun show what happens when you try to fight a stand-up brawl in a commodity space.

xnyan · on May 14, 2020

>That's why Intel realized that fab technology was the true differentiator.

But now the situation is completely reversed. Intel has faced all kinds of problems, costs, and delays due ultimately to the fact that they made a bad choice on their chip architecture but were forced to make it work because they invested so much in the fab.

What TSMC is fabbing for nvidia is working out really well, and if it was not nvidia could walk away without being stuck with billions of dollars of fab facilities they have to own forever.

edit: reversed is the wrong choice of words. It IS all about the fab, but Intel could not/did not accept that maybe someone else had the key differentiator now.

andromeduck · on May 14, 2020

I think it's the other way around. The architecture was being limited by their fabs ability to yeild large chips and in the absence of any CPU perf pressure from AMD the natural push would lean more towards increacong graphics performance in order to push more pixels. As in I think Intel probably had the same yeild issues as everyone else ~10-32 nm but only Intel had the high margins small chip volume to make it profitable to ramp until Apple and TSMC happened.

jdsully · on May 14, 2020

The architecture is definitely far ahead of anyone else. When you look at Intel chips still being competitive despite manufacturing being a generation behind and with 1/6th the cache per core.

I'm an AMD shareholder and my biggest fear is Intel figuring out their manufacturing.

muxator · on May 14, 2020

Maybe not recently, but in the years that cemented Intel dominance, there were many clones on the market.

pjmlp · on May 15, 2020

Yeah, but never in an amount that was actually meaningful, only by a couple of non branded PC OEMs.

ElFitz · on May 14, 2020

I think the parent only means that if Intel somehow shut down, decided to radically pivot or to close everything, because it is somewhat open, you still had alternatives and neither your code, your product, or your company would face insurmontable hardship or die because of it.

ma2rten · on May 14, 2020

Jax is an implementation of numpy on GPU.

bigdict · on May 14, 2020

Have you heard about CuPy?

wegs · on May 14, 2020

Yes. I won't bet my business on a one-vendor solution with a medium-sized community which might disappear at some point.

If CuPy supported NVidia and AMD, and was folded into Numpy, I'd buy the biggest, beefiest GPU I could find overnight.

fluffything · on May 14, 2020

What technology would you bet your business on then?

Today, you can write numpy code, and that runs on pretty much all CPUs from all vendors, with different levels of quality.

A one line change allows you to run all numpy code you write on nvidia GPUs, which at least today, are probably the only GPUs you want to buy anyways.

In practice, you would probably be also running your whole software stack on CPUs, at least for debugging purposes. So if you change your mind about using nvidia hardware at some point, you can just revert that one line change and go back to exclusively targeting CPUs. Or who knows, maybe some other GPU vendors might provide their numpy implementation by then, and you can just go from CuPy to ROCmPy or similar.

Either way, if you are building a numpy stack today, I don't see what you lose today from using CuPy when running your products on hardware for which that's available.

wegs · on May 15, 2020

shrug I'll bet my business on waiting an extra 15 minutes for analytics code to run.

Seriously. There's little most businesses really needs that I couldn't do on a nice 486 running at 33MHz. Now, if a $5000 workstations gives even 5% improvement to employee productivity, that's an obvious business decision. That doesn't mean it's necessary for a business to work. So dropping $1000 on an NVidia graphics card, if things ran faster and there were no additional cost, would be a no-brainer.

There are additional costs, though.

And no, you can't just go back from faster to slower. Try running Ubuntu 20.04 on the 486 -- it won't go. Over time, code fills up resources available. If I could take a 2x performance hit, it'd be fine. But GPUs are orders-of-magnitude faster.

fluffything · on May 15, 2020

Please, show us how to train Alexa or BERT on a 486. That'll definitely win you the Turing and Gordon Bell prices, and probably the Peace Nobel price for all those power savings!

wegs · on May 15, 2020

Please show me a business (aside from Amazon, obviously) who needs Alexa.

Most businesses need a word processor, a spreadsheet, and some kind of database for managing employees and inventory. A 486 does that just fine.

Most businesses derive additional value from having more, but that's always an ROI calculation. ROI has two pieces: return, and investment. Basic business analytics (regressions, hard-coded rules, and similar) have high return on low investment. Successively complex models typically have exponentially-growing complexity in return for diminishing returns. At some point, there's a breakpoint, but that breakpoint varies for each business.

If the goal is to limit GPGPU to businesses whose core value-add is ML (the ones building things like Alexa), NVidia has done an amazing job. If the goal is to have GPGPU as common as x86, NVidia has failed.

fluffything · on May 18, 2020

> Please show me a business (aside from Amazon, obviously) who needs Alexa.

I'll bite.

Have you ever been getting a haircut, and the hair dresser had to stop to pick up the phone to make an appointment?

Have you ever go to actually pick up a pizza at a small pizzeria and noticed that from 4 employees, 3 are making pizzas, and one is 99% of their time on the phone?

Every single business that you've ever used in your life would be better off with an Alexa that can handle the 99% most common user interactions.

In fact, even small pizzerias and hair salons nowadays are using third-party online booking systems with chat bots. Larger companies are able to turn a 200 people call center into a 20 man operation by just using an Alexa to at least identify customers and resolve the most common questions.

bigdict · on May 15, 2020

CuPy has experimental support for Rocm.

wegs · on May 15, 2020

Awesome! I did not know that.

pjmlp · on May 14, 2020

The large majority of researchers and business getting into NVidia products doesn't seem to find it that relevant, rather what tools, GPU programming languages and hardware they are able to put their hands on.

wegs · on May 14, 2020

It's irrelevant to researchers. Research operates on rapid cycles: prototype, publish, move on.

It does impact businesses. It doesn't prevent adoption for e.g. deep learning, but I haven't seen e.g. GPU-based databases reach broad adoption, or many other places where MIMD/SIMD would reduce costs or improve performance. Using classical hardware is clearly cheaper than the business risk and engineering time of relying on a proprietary, closed hardware solution.

I'm at the edge, where my workloads don't require GPU, but could benefit from it. This sort of thing factors into decision-making. I dabble in GPU, but never beyond prototypes, for those reasons.

I think this is one of the reasons why these devices haven't reached wide-spread marketshare. Most computers sold have an integrated chipset. People buying NVidia GPUs are researchers (who don't care), deep learning applications (who don't have a choice), and gamers. There have been predictions for two decades that GPU-style SIMD and MIMD architectures would displace the centrality of the GPU.

Technically, it makes sense. If I type a list comprehension in Python, it would run at higher speed and lower power on a SIMD or MIMD platform.

I think the reason that hasn't happened is because x86 and x64 are open and widely-supported. NVidia is a walled garden, and is only practical for markets NVidia explicitly targets.

This story plays out over and over. Business people push for closed. Eventually, open comes along, and wipes it out. Sometimes, as with x86 or the iPhone, that leads to increased profits. Sometimes, as with Wikipedia, that kills businesses.

pjmlp · on May 14, 2020

NVidia is not to blame if the competition is stuck using C, printf debugging for computing shaders, cannot make their minds about which bytecode to support for heterogenous GPGPU programming.

The situation is so bad that OpenCL 1.2 got promoted to OpenCL 3.0 and SYSCL is now backend independent, while hip only works on Linux.

As for Python, guess who is on the forefront of GPU Programming with Python,

https://www.nvidia.cn/gtc/session-catalog/?search=python

41 results, including CUDA based JIT improvements.

Meanwhile, at IWOCL & SYCLcon 2020,

https://www.iwocl.org/iwocl-2020/conference-program

2 sessions, where it is mentioned that PyFR might need OpenCL 3.0 extensions going forward.

So if competition is not able to provide, most just get to buy NVidia.

adev_ · on May 14, 2020

> NVidia is not to blame if the competition is stuck using C, printf debugging for computing shaders, cannot make their minds about which bytecode to support for heterogenous GPGPU programming.

I don't bite this argument.

Nvidia made close to no effort to support OpenCL and promoted their own technology CUDA. Even in 2020, OpenCL support for Nvidia hardware is close to nonexistent.

When the main actor of the market does not support a technology, why the hell would you use it or even develop its ecosystem (SYSCL)

> The situation is so bad that OpenCL 1.2 got promoted to OpenCL 3.0 and SYSCL is now backend independent.

OpenCL 3.0, presented by an Nvidia official(https://khr.io/ocl3slidedeck), head of the working group on it. A company that made close to 0 effort to support OpenCL 2.0 revert the spec to 1.2. Astonishing right ?

pjmlp · on May 14, 2020

OpenCL isn't also supported on Android, where Google pushes Renderscript instead, their own C99 dialect, yet I don't see any uprising against Google.

If the 139 member companies (taking NVidia out) listed here aren't able to provide the same quality in hardware, programming language and eco-system improvements against those from NVidia, and vote for a NVidia employee as chairman, then they deserve what they get.

https://www.khronos.org/members/list

It is so easy to find a villain instead of acknowledging failure.

fulafel · on May 15, 2020

OpenCL used to be supported on Android (But not required on by Google). Currently, Vulkan _is_ required by Google[1] and is probably the future path to GPU compute.

[1] https://www.androidpolice.com/2019/05/07/vulkan-1-1-will-be-...

pjmlp · on May 15, 2020

OpenCL is only supported via hacks to install shared libraries into one own's device.

Vulkan even if optional until Android 10, it is supported by the SDK since version 7, which is something that OpenCL never had.

No serious Android developer would make their life even harder than it already is with the official APIs, by making use of an API that requires device owners to manually install libraries via ADB.

fulafel · on May 15, 2020

Hmm, is the OpenCL situation really that fringe on android? Eg this OpenCL info app[1] description says "Even though OpenCL™ isn't part of the Android platform, it's available on many recent devices. On Android it's usually used as a back-end for other frameworks like Renderscript. Some manufacturers are providing SDKs for developers to use OpenCL™ on Android. "

I addition to the mentioned PowerVR and Intel platforms there seems to be Android OpenCL also for Mali and Adreno GPUs.

[1] https://apkpure.com/opencl%E2%84%A2-info/com.xh.openclinfo

pjmlp · on May 16, 2020

Yes it is, it is not exposed via the SDK, regardless how Renderscript ends up being compiled to machine code.

You won't find any reference to OpenCL here,

https://source.android.com/

https://developer.android.com/

So while those SDKs do exist, they aren't for application developers, rather for OEMs themselves.

You as application developer have zero control about what GPUs the customers might be using, there is no way to control it on Android manifests, only to specify what APIs are expected, which again, don't have OpenCL as part of the list.

https://developer.android.com/guide/topics/manifest/manifest...

So if you as application developer want to use said SDKs, it is only for your own device, most likely rooted, don't expect to sell applications on the store using OpenCL.

fulafel · on May 16, 2020

Yep, you can't of course rely on OpenCL support, you have to provide a fallback. But are there any issues with including the SDK supplied OpenCL ICDs with the APK, for use with validated GPUs?

pjmlp · on May 16, 2020

Yes, that is not how Android works.

You are not allowed to ship drivers like that, and since version 7, there is kernel validation about stable NDK APIs.

So yeah just because there is an OpenCL SDK for Mali, doesn't mean that a random Android device with Mali will have the drivers or kernel support in place, because that isn't something that Android requires.

Google has collaborated with Adobe in porting their OpenCL shaders to Vulkan.

If they actually cared, they would have made OpenCL available instead.

fulafel · on May 17, 2020

I see, thanks for the explanation. Oh well, let's see where things go with Vulkan compute.

nl · on May 15, 2020

> I don't bite [buy] this argument.

I don't think NVidia's half-hearted support for OpenCL has anything to do with the argument at all.

It's a really good point that other tool vendors haven't stepped up and provided modern development tools for OpenCL (or any of AMD's various attempts), while NVidia has great dev tools.

wegs · on May 14, 2020

It's not about blame. It's about getting to an ecosystem where GPGPU is used for things beyond deep learning, bitcoin mining, video encoding, and similar niche applications to one where I can fluidly use MIMD to speed up my JavaScript and Python with first-class language constructs to support that.

If that happens:

1) We'll get back on some kind of curve where computer performance starts increasing again.

2) The GPU will become more important than the CPU, and the market will explode.

Until that happens, the GPU market will be for gamers, video editors, and machine learning nerds.

I don't much care who does that, or why it hasn't happened.

pjmlp · on May 15, 2020

It is easy to talk about the proprietary practices done by NVidia, yet none of the other GPGU device makers that are on Khronos weren't able to offer a better experience.

So Khronos has 140 members, about 10 of them producing hardware, and they can't provide a proper developer experience, with software that looks like EE toolchains of the 90's.

The market has already exploded, and CUDA has won.

wegs · on May 15, 2020

You're missing the point, and taking everything as an attack on NVidia. It's not an attack on NVidia, or have anything to do with Nvidia versus Khronos. It's clear you've got enough baggage there that I won't go there (not like I was trying to go there in the first place).

But the market hasn't exploded. Most computers have built-in Intel graphics, and most apps can't make use of GPU. NVidia won the battle with AMD, but lost the battle with Intel. GPUs are still for gamers, deep learning applications, bitcoin miners, video editors, and a few other niche applications.

Given that CPUs aren't increasing in speed, and GPGPU is failing to make in-roads, for most workloads, computers are only marginally faster than they were a decade or two ago. If GPGPU made inroads into general computing, we'd still be on a Moore's Law curve, but that hasn't happened.

That's the problem.

pjmlp · on May 17, 2020

Not at all, my complain is the poor service that Khronos keeps doing pushing their half backed supported APIs, expecting that OEMs pick up the tooling part.

What ends up happening is that OEMs, coming traditionally from the EE and embedded mindset almost never provide any tooling worthwhile using.

This is why platform APIs always end up winning the hearts of most developers that aren't into FOSS mindset.

Interesting that you mention Intel, they would rather have you using Open API or ispc, instead of pure Khronos APIs.

And Intel keeps failing at their GPU story anyway.

fluffything · on May 14, 2020

> but I haven't seen e.g. GPU-based databases reach broad adoption

NVIDIA announced today that Apache Spark 3.0 is built on RAPIDS and showed some benchmarks claiming same perf at 1/5 the cost.

Not your classical database application though.

threeseed · on May 15, 2020

No they have built a GPU accelerated XGBoost library.

Spark 3.0 is in preview stage already and there is no mention of RAPIDS anywhere in the code.

fulafel · on May 14, 2020

There's definitely a chicken-and-egg problem of the current user and developer base of GPGPU apps being a very tolerant bunch. "It's just a flesh wound" they say about a lot of things that are prohibitive to normal app developers. Maybe it's the natural order of things, or maybe the the "incompatible proprietary C++ dialects with different kinds of crashy drivers on each OS" approach will be suffiently unpalatable to some future generation of programmers.

aseipp · on May 14, 2020

GPU based databases haven't reached broad adoption because sending things over the PCIe link is a huge waste of time if you can avoid it. Working around this with custom design like NVLink/NVSwitch do is ridiculously expensive (and why a DGX costs a gajillion dollars), and there is simply not enough volume to subsidize it. They are largely analytics focused, because the parallel hardware can obviously map onto primitives like sequential scan and filters relatively easily. Futhermore, data sizes are not small. Thus the architectures tend to emphasize things like in-memory (VRAM) workloads that get scaled horizontally via RDMA (or RoCE, whatever people are doing these days), which is expensive and limited. Major businesses (i.e. people with money, who nvidia are targeting) already pay for proprietary databases, regularly, every day. That's not the barrier. All of the actual true secret sauce is in the hardware design, and you can't replicate that. You're always at Nvidia's mercy to design solutions to their customers needs. (And frankly, they've done that pretty well, I think.)

Sure, you can pay almost $10,000 per Tesla V100 (which aren't going to become magically cheaper, all of a sudden), and buy 8x of them. That's a 256GB working set, for the price of like, $70k USD. It might make sense for some things. For everyone else? Pay $30,000 for a single server, run something like ClickHouse, and you'll have a better overall TCO for a vast majority of workloads. It'll saturate every NVMe drive and all the RAM (terabytes) you can give it, and will scale out too. It's got nothing to do with openness and everything to do with system architecture. You can replicate all of this with whatever AMD has and it won't make a single bit of difference in the market.

I don't like the fact Nvidia keeps their software closed either (and in fact it was a motivating reason for replacing my old GTX in my headless server with a Radeon Pro card recently), but the problems you're talking about are not ones of openness.

> If I type a list comprehension in Python, it would run at higher speed and lower power on a SIMD or MIMD platform.

I think you vastly underestimate the complexity of these platforms and how to extract performance from them, if you think it's as simple as your list comprehension going faster now and you hang up your coat and you're done. Sure, when you're experimenting, that 5x raw wall clock time improvement is nice, and you don't think about whether or not you could have done it with comparable hardware under a different cost profile (5x faster is good, but 5x longer wall clock than the GPU but 15x lower power is a winner). But when you're paying millions of dollars for these systems, it's not a matter of "how to make this thing faster", it's "how do I utilize the resources I have, so 85% of this $300,000 machine isn't sitting idle". This thinking is what drives the design of the overall system, and that's much more complicated.

wegs · on May 14, 2020

I don't underestimate the complexity. But I do claim that the complexity can and should be hidden behind programming language constructs. I've worked both on the design of MIMD hardware, back when I was a graduate student, and on programming languages. These aren't easy problems, but they are solvable.

The reason for openness isn't abstract. I don't think NVidia will solve these problems alone. NVidia can make really good tools for a few specific domains, but generalizing to how we apply this to JavaScript, databases, or Python interpreters requires an open community approach. It requires a lot of people experimenting and dabbling.

It's kind of like Nokia and friends thinking they could solve the problem of building phone apps alone. When Apple launched the iPhone, and there was a community pushing things forward, we were in a whole new world of progress.

I would argue NVidia underestimates both the potential and the complexity if they think they can go it (relatively) alone, come up with the right programming constructs, and provide the right set of tools for programmers to consume.

pjmlp · on May 14, 2020

Except there is a community, a CUDA community and from GTC sessions, a very big one.

Ironically this walled garden as you put it, has produced more programming languages and tooling for GPGPU programming than the open conglomerate design by committee from Khronos has been able to achieve together against a single company, which kept pushing their C mantra until it was too late.

wegs · on May 14, 2020

Yes. To have openness work, you need to execute well too.

aseipp · on May 14, 2020

I'm not making a claim about the necessity of experimentation. I spent years (and working a paid job) doing programming language work, and also design hardware these days in my spare time, so I'm not against that. I'm specifically addressing the claim that "GPU databases haven't taken off because of lack of open source CUDA" or whatnot. Database tech is one of the most R&D heavy engineering subfields, almost all major innovations come from it. The points I made up are not coming from thin air, they're the result of people (engineers) doing a lot of experimentation and coming to similar conclusions for many years. You don't need open source designs to prove this, by the way, you only need to do basic napkin math about the characteristics of the system, and how data moves around, to come to similar conclusions. You need a correct (and I hate this word) synergy of hardware, software, and programming model to do it. A programming language, a new model, does not change the theoretical bandwidth of PCIe 3.0, or the fact you have a memory hierarchy to optimize for best performance. Just having one and none of the others, or having lopsided characteristics, isn't sufficient, and innovations across the stack are one of the major things people are reaching for, in order to differentiate themselves.

That said, I agree and would love to see less crappy programming models here. As a PL geek, I have numerous reasons why I think that's necessary. It really needs to be easier to compose sets of small languages, and design them -- one for designing streaming systems, one for latency sensitive ones. They need to model the memory hierarchy available to us (a huge thing most do not do, and vital to system performance.) I'd love this. But it doesn't undermine anything I said earlier about why things are the way they are, today. No amount of fancy programming languages is going to change the fact a $10,000 Supermicro server is more cost effective than $70,000 worth of V100s for 90% of OLAP workloads you'd want a database for. Engineers design accordingly.

There is also the problem of needing huge amounts of capital, where most of this work can only be done by exceedingly well funded groups with deep ties to hardware divisions in question. The future of hardware innovation comes from billion dollar companies, because only they can sustain it, not plucky engineers. Sure, for us, CUDA being open source would be awesome. But you don't really need open source drivers when you're working directly with the vendor on your requirements and you pay them millions for support and you just use Linux for everything. You just let them solve it and move on. The engineering world is designed this way (both by engineers, and by capitalists), because it is how we make money from it in a capitalist society!

> I would argue NVidia underestimates both the potential and the complexity if they think they can go it (relatively) alone, come up with the right programming constructs, and provide the right set of tools for programmers to consume.

Nope. Nvidia understands that they alone may not hit a global optimum or whatever in all these fields. I suspect given that they have entire divisions of highly skilled engineers dedicated to programming tools -- they understand it better than either of us. But what they also understand is that their software stack is a differentiator for them, because it actually works (the competitors don't) and it makes them money to keep it that way. You're confusing a technical problem with one of politics and vision -- a categorical mistake about their priorities and where they lie. I don't want to sound crass, but people saying "I would argue that I, the sole, lone gun engineer, understand their business and future and everything way better than they do" is typical of engineers, and it is almost always a categorical mistake to think so.

Nvidia fully understands that maybe some nebulous benefit might come to them by open sourcing things, maybe years down the line. They understand plucky researchers can do amazing things, sometimes. But they understand much better that keeping it closed makes them money and keeps them distinct from their competitors in the short term. If you think this is a contradiction, or seemingly short sighted: don't worry, because you are correct, it is. What is more "surprising" is recognizing that all of capitalist society is built on these sorts of contradictions. I'm afraid we're all going to have to get used to waiting for FOSS nvidia drivers/CUDA.

EDIT: I'll also say that if this changes from their "major open source announcement" they were going to do at GTC, I'll eat my hat. I'm not expecting much from Nvidia in terms of open source, but I'd happily be proven wrong. But broadly I think my general point stands, which is that thinking about it from the POV of "open source drivers are the limitation" isn't really the right way to think about it.

lmeyerov · on May 15, 2020

Yes and no.

I wouldn't do mid/low storage tiers in a GPU b/c indeed, drinking through a straw. When it's all I/O, even the insane GPU bandwidth still assumes enough compute to go with it. A couple of GPU vendors pitch themselves as GPU DBs, and that's tough positioning when the assumption is all the data lives in the DB. From what I can tell, that only works for < TB in practice, and ideally < 10GB with few concurrent users.

But if you're doing a lot of Spark/Impala/Druid style compute, where storage is probably separate anyways (parquets in HDFS/S3 -> ...) and there is increasingly math to go along with it (analytics, ML, neural nets, data viz, ...), different story. Now that stuff like regex is pretty easy with RAPIDS, instead of doing pandas -> spark or pandas -> rapids, I try to start with cudf to beginwith. (But definitely still not quite there.) We partner a bunch with BlazingSQL here, and they've always been chasing the out-of-core story here. A couple of the lesser-known GPU 'DB's do as well, such as FastData focusing explicitly on replacing spark/flint wrt both batch & streaming.

A few trends you may want to reexamine the #s on:

-- CPU perf/watt (~= perf/$) vs GPU perf/watt (~= perf/$), especially in cloud over last 10 years: GPU is steadily dropping while CPU isn't

-- CPU-era Spark and friends are increasingly bound by network I/O, while GPU boxes go for thicker. You can also do Spark on a thicker box, but at that point, might as well go shared GPU and keep it there (RAPIDS)

-- Nvidia & cloud providers have been pushing on direct-to-gpu and direct gpu<>gpu, including at commodity levels. Mellanox used to be a problem there, and now they control them. My guess is the bigger challenge in ~2yr will be rewriting RAPIDS for streaming & serverless & more transparent multi-GPU; the HW is hard but seems more predictable and much better staffed.

GPU isn't an end-all, but when a lot of CPU data libs are going data parallel / columnar, and Nvidia is improving more than Intel for perf/watt (= perf/$), the choice between multicore x SIMD vs GPU keeps tilting in Nvidia's favor.

wegs · on May 14, 2020

> There is also the problem of needing huge amounts of capital, where most of this work can only be done by exceedingly well funded groups with deep ties to hardware divisions in question. The future of hardware innovation comes from billion dollar companies, because only they can sustain it, not plucky engineers. Sure, for us, CUDA being open source would be awesome. But you don't really need open source drivers when you're working directly with the vendor on your requirements and you pay them millions for support and you just use Linux for everything.

I think the exact same argument could be made for mainframes and microcomputers before we standardized on x86. RISC architectures were cheaper and faster in the eighties and nineties than CISC, but x86 cleaned up because it was standard and had an ecosystem. NVidia is limiting its ecosystem to everyone who needs HPC, where the ecosystem should be everyone (no qualifier). All computers could benefit from a massively parallel MIMD co-processor.

> But what they also understand is that their software stack is a differentiator for them, because it actually works (the competitors don't) and it makes them money to keep it that way.

And I think Symbian made the same argument before being steamrolled by iOS and Android. And I've seen the same argument made by business folks at several businesses I've worked at.

By the way, "open" doesn't mean it's not okay to keep some pieces proprietary. NVidia can keep their differentiator by keeping key algorithms proprietary, while making the architecture open, and developing a common set of cross-platform APIs to target that architecture. For example, a cell phone maker can open source most of their OS, but keep pieces like the fancy ML integrated into their photography app (and other similar pieces) proprietary.

> Nvidia fully understands that maybe some nebulous benefit might come to them by open sourcing things, maybe years down the line.

I think you hit the nail on the head here. The benefits of open feel nebulous; it's a long-tail effect and difficult to quantify. It also takes time. On the other hand, the benefits of proprietary are short-term and easy to quantify. Wrong business decisions get made all the time. Indeed, bad business decisions sometimes get made where everyone can tell it's the wrong decision -- it's just org structures are set up to make those decisions. I think this isn't me claiming to be brilliant or smarter than NVidia so much as NVidia failing in the same exact way many organizations fail, by the design of the org structures.

> They understand plucky researchers can do amazing things, sometimes.

It's actually not just about amazing things. It's about a long tail of dumb stuff too. My phone has a few apps better than Google could build. It has dozens of apps Google chose not to build. Most of the stuff I want to do isn't big enough to ever show up on NVidia's radar, but there are a lot of people like me. Symbian didn't make a piano tuner app. It's not hard to make one. I have one, though.

Of course, there are brilliant pieces too. I have some VR/AR apps on my phone which Google would need to invest a lot of capital to make.

> EDIT: I'll also say that if this changes from their "major open source announcement" they were going to do at GTC, I'll eat my hat. I'm not expecting much from Nvidia in terms of open source, but I'd happily be proven wrong. But broadly I think my general point stands, which is that thinking about it from the POV of "open source drivers are the limitation" isn't really the right way to think about it.

I'm not holding my breath for NVidia to change. But I do hope at some point, we'll see a nice, open MIMD architecture which gives me that nice 10-100x speedup for parallel workloads. I actually couldn't care less about whether that speed-up is 50x or 100x (which is where NVidia's deep R&D advantage lies). That matters for bitcoin mining or deep learning. For the long tail I'm talking about, the baseline speedup is plenty good enough. The cleverness doesn't come from pushing extra CPU cycles out, but in APIs, ecosystem building, openness, standardization, etc. That stuff is a different kind of hard.

ethbro · on May 14, 2020

Hypothetically, from an ISA perspective, why couldn't Intel and AMD extend x86-64 more fully with SIMD / MIMD instructions? (as in, way more fully than MMX / SSE / AVX)

Naive question, because I literally don't know the link between CPU instruction stream and GPGPU instruction stream.

But it seems like there would be an opportunity to seize the higher (open) ground at the ISA level, and then force Nvidia to implement its own support for that standard.

With the point of being able to run identical code across CPU / CPU-with-embedded-GPU / CPU + GPU.

Understand we're talking about mind-boggling levels of complexity here, but it feels like the CPU shops ceded the role of graphical ISA to Nvidia & Microsoft (DirectX).

slaymaker1907 · on May 14, 2020

GPUs have gone far beyond just SIMD these days. To effectively program a GPU, you need to program it like a GPU, not a CPU. In particular, while most people are aware that GPUs don't like branching at a high level, branching can actually be fine as long as each block (small group of processors in the GPU) take the same branch. Block 1 taking the branch while block 2 not taking the branch will have little impact on performance. Additionally, the memory hierarchy is completely different for GPUs with blocks sharing cache and a huge number of registers per core while having very little memory for a typical stack.

Sure, treating a GPU as a SIMD blackbox may work for many problems as a suitable abstraction, but in doing so you also overlook many of its capabilities. x86-64 can emulate many of the SIMD aspects without too much trouble, but the aspects like huge number of processors with many registers is not something that is able to be reproduced without a large number of tradeoffs.

The only way that I see it as being possible to have a true CPU/GPU hybrid that is effective would be to basically have two separate chips for the GPU and the CPU, maybe multiple chips. I think the reason why such a product has not really taken off is because at that point there really isn't a point over using it versus a separate GPU and CPU. Maybe if hardware designers figured out how to greatly improve CPU to GPU communication in such a setup over having the motherboard in-between it might be worth it. CPU to GPU communication is a bottleneck for many applications.

m463 · on May 14, 2020

learning about SIMT helped my understanding of the differences.

pjmlp · on May 14, 2020

That is what Intel tried to do with Larabee and failed spectacularly.

fluffything · on May 14, 2020

Intel actually failed at least twice at this. More recently with Larabee and the Xeon Phi saga that got us AVX-512, and maybe Intel Xe saves it.

But in the late 80s and early 90s they actually had the i860 and its variants: https://en.wikipedia.org/wiki/Intel_i860

which contained a graphics processor! That wasn't a particular success but it landed us the MMX ISA on x86.

Surprisingly enough, Intel went then to make similar mistakes with Itanium, and then with Larabee. But both things landed us features on x86.

johncalvinyoung · on May 14, 2020

I wouldn't say they failed. I'd say they gave up on it before product maturity.

ethbro · on May 14, 2020

I'd say they aimed for the wrong market (graphics processing, where they were competing against very specialized and experienced competitors) and failed to partner.

Maybe Intel ~1998 could have solo-launched a new architecture, but the only way they'd get uptake now is something in cooperation with AMD.

And maybe the AMD partnership bridge is burnt from previous shenanigans, but it seems like both AMD and Intel would have incentive in more tightly coupling graphics compatibility to CPU ISA, vs Nvidia designing their own.

That said, in that hypothetical reality, Nvidia wouldn't have been able to innovate and execute nearly as fast as they have.

As one of my Comp-E professors once quipped, "If a structural engineer ever tells you programming close to processors is easy, ask them how they'd like their job if the physical properties of lumber changed every 2 years."

lliamander · on May 14, 2020

> I'd say they aimed for the wrong market (graphics processing, where they were competing against very specialized and experienced competitors)

Also, the game they were using to validate the performance of their hardware (Doom, I think?) ended up being so different from other software in how it used the GPU that their optimizations didn't really transfer.

jabl · on May 14, 2020

They pivoted it to HPC (the Xeon Phi product line), produced a couple of generations of products, but that didn't really pan out either so they cancelled it.

I suppose some of the "DNA" lives on in AVX-512..

bitL · on May 14, 2020

> why couldn't Intel and AMD extend x86-64 more fully with SIMD / MIMD instructions

I think there is that latency vs bandwidth trade-off where CPU likes lower latency and GPU higher bandwidth, but you can't achieve the same with a single chip.

ethbro · on May 14, 2020

I guess this is fundamentally a homogeneous vs heterogeneous ISA question. I.e. is your ISA intended to operate one chip, or multiple cooperative chips / complexes?

enos_feedler · on May 14, 2020

Does it make sense for a hardware ISA to express cooperation between chips? I would think HW ISA is meant to control it's local microarchitecture. I could see a virutal ISA or compiler IR built with a multi chip view.

throwaway2048 · on May 14, 2020

GPUs are all a single chip atm.

hodgesrm · on May 14, 2020

> Technically, it makes sense. If I type a list comprehension in Python, it would run at higher speed and lower power on a SIMD or MIMD platform.

That assumes the operations, data types, and data arrangement are such that they are vectorizable. In databases at least this kind of optimization is typically done by hand because there are so many constraints. You can't just turn on a flag in the code and make it happen automatically. [1]

[1] https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

solidasparagus · on May 14, 2020

> I haven't seen e.g. GPU-based databases reach broad adoption

Because it's very questionable whether GPU-based databases are generally valuable. GPUs accelerate compute, not all of the other things that databases do and often GPUs are not cost effective.

> I dabble in GPU, but never beyond prototypes, for those reasons

And I bet if you dabbled a little further you still wouldn't use it because it isn't cost-effective outside of intensive-compute applications.

> Technically, it makes sense. If I type a list comprehension in Python, it would run at higher speed and lower power on a SIMD or MIMD platform.

This is a very questionable claim. There are tradeoffs to these things (e.g. clock-speed, startup time, etc) and your list comprehensions are probably not compute heavy enough that the tradeoffs are worth it.

wegs · on May 14, 2020

You're making a lot of lousy assumptions. As a few points of reference:

* My list comprehensions run over gigabytes of data (but sometimes 3 orders of magnitude bigger or smaller). Stream processing of big data. It's not deep learning, but it's slow and potentially deeply parallel. It would move to MIMD trivially, and SIMD with just a little bit of work.

* There are programming languages which support models almost exactly like this for data processing. Sun Labs Fortress comes to mind as an early example. This would generalize to a lot of contexts -- much smaller than you're giving credit for.

* Most of the issues, like startup times, are implementation-specific, rather than fundamental, and could be mitigated for much smaller data too. You do need to wrap your head around changes in programming paradigms to make that work. There is some overhead for latency (you'll probably never do well moving a list with 10 items to a GPU), but most of those aren't where programs are performance-bound.

* Many database operations map very well to MIMD.

* You're making deep assumptions that you're talking to an idiot. That doesn't make you look smart or right, or lead to a constructive discussion.

nl · on May 15, 2020

If this is the workload you are looking at, then you really should look at CuPy.

I know you discard this because you don't like the NVidia dependency, but it's not much different to switching a BLAS library or using Intel's numerically optimised Numpy distribution. Your code remains the same, you just change the import and get magic speed.

If you still refuse to look at it, then perhaps consider cuBLAS[1], which you can switch out for any other BLAS library (eg [2]). It's one thing that AMD actually has bothered to do and they have version available for CPUs[3] and GPUs[4].

[1] https://developer.nvidia.com/cublas

[2] https://towardsdatascience.com/is-your-numpy-optimized-for-s...

[3] https://developer.amd.com/amd-aocl/blas-library/

[4] https://rocmdocs.amd.com/en/latest/ROCm_Tools/rocblas.html

solidasparagus · on May 14, 2020

You're right about my assumptions, I apologize. Without knowing specifically what you do, I can't say if GPUs would make sense.

But I don't agree that many workloads could be moved to GPUs cost effectively. It's very hard to feed work in fast enough to keep the GPU busy enough to be cost effective given the limited amount of GPU memory you have to work with.

wegs · on May 15, 2020

Well, there's a question of whether the GPU has to be fed fast enough. My CPU is nominally rated at around 150-200 gigaflops. My GPU is rated at about 5 teraflops. That's about a 30x speed difference (and there are obviously faster GPUs out there). That's enough to move me from compute-bound to IO-bound and make things a lot faster. Once I'm IO-bound, I'll obviously see no more performance increase, but I figure I'll get a good 5-10x before I get there.

Right now, code runs anywhere from a few seconds to overnight, depending on what I'm doing. I'll also mention I'm working on many projects, so that's not overhead I incur every day, just once in a while.

Moving to GPU would move that to running anywhere from more-or-less instantly to an hour, I figure, based on similar very back-of-the-envelope benchmarks and guesstimates. That's totally worth dropping $1000 on a new GPU, if that's all it took and things worked out-of-the-box. It'd pay for itself in a few programmer hours.

On the other hand, that's totally not worth weeks of programmer / dev-ops time for switching to a proprietary tool chain. An alternative there is to wait for my computation, or to optimize my code. Both of those seem cheaper than maintaining a GPGPU workflow, where GPGPU is right now. If GPGPU came batteries-included in Ubuntu+numpy, it'd be an entirely different story.

codemk8 · on May 14, 2020

Agree. The reason GPUs are not widely adopted in certain areas is not because of open sourced or not, but because it is not cost effective. GPUs are optimized for throughput, not latency.

tyingq · on May 14, 2020

I suspect some of it is driven by trying to keep the gaming/ai and desktop/server markets from overlapping. Market segmentation. If it were more open, that would be harder.

the_pwner224 · on May 14, 2020

I wanted to do GPU PCI passthrough in a VM (run Linux host, then for gaming run a Windows VM with the GPU passed through to get good performance). Nvidia disabled this for their consumer GPUs; the Nvidia drivers in the Windows VM will block this from working. It was a purely software thing; there was no reason for this aside from nvidia wanting companies to pay more for the Quadro/etc. GPUs.

In addition to that, there's always the proprietary blob running in my kernel.

So a few months ago I bought an AMD card for my new computer.

Sebb767 · on May 14, 2020

GPU passthrough is also doable pretty easily on NVidia nowadays. See here: https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM...

/r/VFIO on Reddit is also pretty helpful.

That being said, I fully support you buying and using AMD. But no need to throw out perfectly fine hardware in case you still have NVidia lying arround.

diffeomorphism · on May 15, 2020

> GPU passthrough is also doable pretty easily on NVidia nowadays.

By actively working against Nvidia who could break it again at any time if they wanted to:

> Starting with QEMU 2.5.0 and libvirt 1.3.3, the vendor_id for the hypervisor can be spoofed, which is enough to fool the Nvidia drivers into loading anyway.

If you already have Nvidia, fine, but to me this reads as a strong reason to not buy Nvidia if you can help it.

Sebb767 · on May 16, 2020

Oh, don't get me wrong, I surely don't want to encourage you to buy Nvidia.

To be fair here, AMD also has some gripes with VFIO: Namely, the reset bug on Navi (which I personally didn't experience, but read about quite a few times) and disabled vGPU support on their smaller cards, which is, as far as I know, only a software solution and not really something that would steal their business customers either.

Still, I'm rooting for AMD, if only for the fact that they're the reason it doesn't take six CPU generations any more to have a 50% performance bump.

partingshots · on May 15, 2020

Do you have a link for their open source announcement?