
I don't underestimate the complexity. But I do claim that the complexity can and should be hidden behind programming language constructs. I've worked both on the design of MIMD hardware, back when I was a graduate student, and on programming languages. These aren't easy problems, but they are solvable.

The reason for openness isn't abstract. I don't think NVidia will solve these problems alone. NVidia can make really good tools for a few specific domains, but generalizing this to JavaScript, databases, or Python interpreters requires an open community approach: a lot of people experimenting and dabbling.

It's kind of like Nokia and friends thinking they could solve the problem of building phone apps alone. Once Apple launched the iPhone and there was a community pushing things forward, we were in a whole new world of progress.

I would argue NVidia underestimates both the potential and the complexity if they think they can go it (relatively) alone, come up with the right programming constructs, and provide the right set of tools for programmers to consume.



Except there is a community, a CUDA community, and judging from GTC sessions, a very big one.

Ironically, this walled garden, as you put it, has produced more programming languages and tooling for GPGPU programming than the open, design-by-committee conglomerate at Khronos has managed to achieve against a single company -- a committee that kept pushing its C mantra until it was too late.


Yes. For openness to work, you need to execute well too.


I'm not making a claim about the necessity of experimentation. I spent years doing programming language work (some of it as a paid job), and these days I design hardware in my spare time, so I'm not against it. I'm specifically addressing the claim that "GPU databases haven't taken off because of a lack of open source CUDA" or whatnot. Database tech is one of the most R&D-heavy engineering subfields, and a great deal of major systems innovation comes out of it. The points I made aren't coming from thin air; they're the result of engineers doing a lot of experimentation and coming to similar conclusions over many years.

You don't need open source designs to prove this, by the way. Basic napkin math about the characteristics of the system, and how data moves around, gets you to the same conclusions. You need a correct (and I hate this word) synergy of hardware, software, and programming model. A new programming language, a new model, does not change the theoretical bandwidth of PCIe 3.0, or the fact that you have a memory hierarchy to optimize against for best performance. Having one of those pieces and none of the others, or having lopsided characteristics, isn't sufficient, and innovation across the stack is one of the major things people reach for in order to differentiate themselves.
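
To make the napkin math concrete, here's the sort of throwaway sketch I mean, in Python. The bandwidth figures are rough, round ballpark numbers I'm assuming for illustration, not measurements:

    # Rough time to scan 1 TB, depending on where the data lives.
    # Bandwidth figures are ballpark assumptions, not benchmarks.
    TABLE_BYTES = 1e12

    bandwidth_gb_s = {
        "PCIe 3.0 x16 (host -> GPU)": 16,    # practical ceiling per card
        "host DRAM (2-socket server)": 100,
        "V100 HBM2 (data already resident)": 900,
    }

    for path, gb_s in bandwidth_gb_s.items():
        seconds = TABLE_BYTES / (gb_s * 1e9)
        print(f"{path}: ~{seconds:.0f} s")

Unless the working set is already resident in GPU memory, the PCIe link dominates, and no programming model changes that.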

That said, I agree, and I would love to see less crappy programming models here. As a PL geek, I have numerous reasons to think that's necessary. It needs to be easier to design and compose sets of small languages -- one for streaming systems, one for latency-sensitive ones -- and they need to model the memory hierarchy available to us (something most do not do, and which is vital to system performance). I'd love that. But it doesn't undermine anything I said earlier about why things are the way they are today. No amount of fancy programming languages will change the fact that a $10,000 Supermicro server is more cost effective than $70,000 worth of V100s for 90% of the OLAP workloads you'd want a database for. Engineers design accordingly.
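
As a back-of-the-envelope illustration of that cost claim (the prices and bandwidths below are rough assumptions of mine, not quotes or benchmarks), consider a scan-bound workload where the GPUs have to be fed over PCIe because the data doesn't fit in HBM:

    # Scan throughput per dollar for a scan-bound OLAP workload.
    # Assumes the GPU path is PCIe-limited because the data exceeds HBM capacity.
    # Prices and bandwidths are rough assumptions for illustration only.
    cpu_box = {"price_usd": 10_000, "scan_gb_s": 100}     # DRAM-bound scans
    gpu_rig = {"price_usd": 70_000, "scan_gb_s": 8 * 16}  # ~8x V100, PCIe 3.0 x16 each

    for name, cfg in [("CPU box", cpu_box), ("V100 rig", gpu_rig)]:
        per_kusd = cfg["scan_gb_s"] / (cfg["price_usd"] / 1000)
        print(f"{name}: ~{per_kusd:.1f} GB/s of scan per $1k")

On those assumptions the CPU box delivers several times more scan throughput per dollar, which is exactly why engineers design the way they do.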

There is also the problem of needing huge amounts of capital, where most of this work can only be done by exceedingly well-funded groups with deep ties to the hardware divisions in question. The future of hardware innovation comes from billion-dollar companies, not plucky engineers, because only they can sustain it. Sure, for us, CUDA being open source would be awesome. But you don't really need open source drivers when you're working directly with the vendor on your requirements, you pay them millions for support, and you just use Linux for everything. You just let them solve it and move on. The engineering world is designed this way (both by engineers and by capitalists), because that is how we make money from it in a capitalist society!

> I would argue NVidia underestimates both the potential and the complexity if they think they can go it (relatively) alone, come up with the right programming constructs, and provide the right set of tools for programmers to consume.

Nope. Nvidia understands that they alone may not hit a global optimum in all these fields. I suspect that, given they have entire divisions of highly skilled engineers dedicated to programming tools, they understand it better than either of us. But what they also understand is that their software stack is a differentiator for them, because it actually works (the competitors don't) and it makes them money to keep it that way. You're confusing a technical problem with one of politics and vision -- a category mistake about where their priorities lie. I don't want to sound crass, but saying "I would argue that I, the lone-gun engineer, understand their business and future way better than they do" is typical of engineers, and it is almost always a mistake to think so.

Nvidia fully understands that some nebulous benefit might come to them from open sourcing things, maybe years down the line. They understand plucky researchers can do amazing things, sometimes. But they understand much better that keeping it closed makes them money and keeps them distinct from their competitors in the short term. If you think this is a contradiction, or seems short-sighted: don't worry, you are correct, it is. What is more "surprising" is recognizing that all of capitalist society is built on these sorts of contradictions. I'm afraid we're all going to have to get used to waiting for FOSS Nvidia drivers/CUDA.

EDIT: I'll also say that if this changes with the "major open source announcement" they were going to make at GTC, I'll eat my hat. I'm not expecting much from Nvidia in terms of open source, but I'd happily be proven wrong. Broadly, though, I think my general point stands: thinking about this from the POV of "open source drivers are the limitation" isn't really the right way to think about it.


Yes and no.

I wouldn't put mid/low storage tiers on a GPU because, indeed, that's drinking through a straw. When the workload is all I/O, even the insane GPU bandwidth only pays off if there's enough compute to go with it. A couple of vendors pitch themselves as GPU DBs, and that's tough positioning when the assumption is that all the data lives in the DB. From what I can tell, that only works for < 1 TB in practice, and ideally < 10 GB with few concurrent users.

But if you're doing a lot of Spark/Impala/Druid-style compute, where storage is probably separate anyway (parquets in HDFS/S3 -> ...) and there is increasingly math to go along with it (analytics, ML, neural nets, data viz, ...), it's a different story. Now that stuff like regex is pretty easy with RAPIDS, instead of doing pandas -> spark or pandas -> rapids, I try to start with cudf to begin with (though it's definitely still not quite there). We partner a bunch with BlazingSQL, and they've always been chasing the out-of-core story. A couple of the lesser-known GPU 'DB's do as well, such as FastData, which focuses explicitly on replacing Spark/Flink for both batch & streaming.
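
For what it's worth, here's a minimal sketch of what "start with cudf" looks like in practice. It assumes a CUDA-capable GPU with RAPIDS/cudf installed, and the file name, column names, and regex are made up for illustration:

    import cudf

    # Read a parquet file straight into GPU memory; cudf mirrors the pandas API.
    df = cudf.read_parquet("events.parquet")  # hypothetical dataset

    # Regex string matching runs on the GPU, with no round trip through host pandas.
    errors = df[df["message"].str.contains(r"timeout|refused")]

    # Groupby/aggregation stays device-side as well.
    summary = errors.groupby("service").agg({"latency_ms": "mean"})
    print(summary.head())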

A few trends you may want to reexamine the #s on:

-- CPU perf/watt (~= perf/$) vs GPU perf/watt (~= perf/$), especially in the cloud over the last 10 years: the GPU's cost per unit of performance keeps dropping steadily while the CPU's isn't

-- CPU-era Spark and friends are increasingly bound by network I/O, while GPU boxes go for thicker nodes. You can also run Spark on a thicker box, but at that point you might as well go shared GPU and keep the data there (RAPIDS)

-- Nvidia & cloud providers have been pushing on direct-to-GPU and direct GPU<>GPU transfers, including at commodity levels. Mellanox used to be a problem there, and now Nvidia owns them. My guess is the bigger challenge in ~2 years will be rewriting RAPIDS for streaming & serverless & more transparent multi-GPU; the HW is hard but seems more predictable and much better staffed.

GPU isn't a be-all and end-all, but when a lot of CPU data libraries are going data-parallel / columnar, and Nvidia is improving faster than Intel on perf/watt (~= perf/$), the choice between multicore x SIMD vs GPU keeps tilting in Nvidia's favor.


> There is also the problem of needing huge amounts of capital, where most of this work can only be done by exceedingly well-funded groups with deep ties to the hardware divisions in question. The future of hardware innovation comes from billion-dollar companies, not plucky engineers, because only they can sustain it. Sure, for us, CUDA being open source would be awesome. But you don't really need open source drivers when you're working directly with the vendor on your requirements, you pay them millions for support, and you just use Linux for everything.

I think the exact same argument could have been made for mainframes and microcomputers before we standardized on x86. RISC architectures were cheaper and faster than CISC in the eighties and nineties, but x86 cleaned up because it was standard and had an ecosystem. NVidia is limiting its ecosystem to people who need HPC, when the ecosystem should be everyone (no qualifier). All computers could benefit from a massively parallel MIMD co-processor.

> But what they also understand is that their software stack is a differentiator for them, because it actually works (the competitors don't) and it makes them money to keep it that way.

And I think Symbian made the same argument before being steamrolled by iOS and Android. I've seen the same argument made by business folks at several companies I've worked at.

By the way, "open" doesn't mean it's not okay to keep some pieces proprietary. NVidia can keep their differentiator by keeping key algorithms proprietary, while making the architecture open, and developing a common set of cross-platform APIs to target that architecture. For example, a cell phone maker can open source most of their OS, but keep pieces like the fancy ML integrated into their photography app (and other similar pieces) proprietary.

> Nvidia fully understands that some nebulous benefit might come to them from open sourcing things, maybe years down the line.

I think you hit the nail on the head here. The benefits of open feel nebulous: they're a long-tail effect, they're difficult to quantify, and they take time to show up. The benefits of staying proprietary, on the other hand, are short-term and easy to quantify. Wrong business decisions get made all the time. Indeed, bad business decisions sometimes get made even when everyone can tell it's the wrong call -- the org structure is simply set up to produce those decisions. This isn't me claiming to be brilliant or smarter than NVidia so much as NVidia failing in exactly the way many organizations fail, by the design of their org structure.

> They understand plucky researchers can do amazing things, sometimes.

It's actually not just about amazing things. It's about a long tail of dumb stuff too. My phone has a few apps better than what Google could build, and dozens of apps Google chose not to build. Most of the stuff I want to do isn't big enough to ever show up on NVidia's radar, but there are a lot of people like me. Symbian didn't make a piano tuner app; it's not hard to make one, and I have one now.

Of course, there are brilliant pieces too. I have some VR/AR apps on my phone which Google would need to invest a lot of capital to make.

> EDIT: I'll also say that if this changes with the "major open source announcement" they were going to make at GTC, I'll eat my hat. I'm not expecting much from Nvidia in terms of open source, but I'd happily be proven wrong. Broadly, though, I think my general point stands: thinking about this from the POV of "open source drivers are the limitation" isn't really the right way to think about it.

I'm not holding my breath for NVidia to change. But I do hope that at some point we'll see a nice, open MIMD architecture that gives me that 10-100x speedup for parallel workloads. I honestly couldn't care less whether the speedup is 50x or 100x (which is where NVidia's deep R&D advantage lies); that matters for bitcoin mining or deep learning, but for the long tail I'm talking about, the baseline speedup is plenty good enough. The cleverness doesn't come from squeezing out extra cycles, but from APIs, ecosystem building, openness, standardization, and so on. That stuff is a different kind of hard.



