
Pretty much everyone these days uses a library to drive the GPU calculations, and those libraries tend to either support multiple hardware targets directly (TensorFlow) or have API-compatible replacements (CuPy as a drop-in for NumPy).
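
For the CuPy/NumPy point, here's a minimal sketch (my own, not from this thread) of the usual drop-in pattern: import whichever backend is available under one name and write the rest of the code against the shared API. The normalize() function is just a made-up example.

    # Pick CuPy (GPU) when it's installed, otherwise fall back to NumPy (CPU).
    try:
        import cupy as xp
    except ImportError:
        import numpy as xp

    def normalize(rows):
        # Row-normalize a matrix; identical code runs on either backend.
        a = xp.asarray(rows, dtype=xp.float32)
        return a / xp.linalg.norm(a, axis=1, keepdims=True)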

So the lock-in risk here is that you might have to run your stuff on CPU if future NVIDIA GPUs are too overpriced.

I mean, they are super expensive. But there's nothing that comes close to their cuBLAS library in terms of performance. So unless AMD ponies up and hires GPU algorithm engineers, NVIDIA will win simply due to their superior driver software.

I once had to optimize a CPU matrix multiplication algorithm. 10 days of work for a 2x speedup. Now imagine doing that for every one of the thousands of functions in the BLAS library...
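
To give a sense of the gap that kind of hand-tuning is chasing, here's a toy, non-authoritative comparison of a textbook triple-loop matmul against NumPy's matmul, which dispatches to a tuned BLAS (OpenBLAS, MKL, or similar depending on the install). Python's interpreter overhead exaggerates the ratio relative to a naive C loop, but the shape of the problem is the same: closing that gap is where the days of blocking, packing, and SIMD work go.

    import time
    import numpy as np

    def naive_matmul(a, b):
        # Textbook three-loop matrix multiply, no blocking or vectorization.
        n, k = a.shape
        k2, m = b.shape
        assert k == k2
        c = np.zeros((n, m), dtype=a.dtype)
        for i in range(n):
            for j in range(m):
                s = 0.0
                for p in range(k):
                    s += a[i, p] * b[p, j]
                c[i, j] = s
        return c

    n = 128
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter(); naive_matmul(a, b); t1 = time.perf_counter()
    t2 = time.perf_counter(); a @ b; t3 = time.perf_counter()
    print(f"naive loops: {t1 - t0:.3f}s   BLAS-backed a @ b: {t3 - t2:.6f}s")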



Yeah, I think most people don't quite appreciate the difficulty and cost of optimizing for hardware and continually maintaining that through hardware cycles. By keeping things closed source, Nvidia's products get both advantages: they're easier to on-board because the abstraction is simpler, and technical progress is faster because there's less pushback from myriad parties when big, inconvenient changes need to happen at lower levels for hardware performance reasons. Kind of like if, instead of x86, we had settled on LLVM.


That is a very good metaphor :)

Actually, I wonder why we went with Java bytecode plus a JIT instead of advancing ahead-of-time compilation projects like gcj.


I'm not surprised that expecting to beat implementations of the basic Goto strategy for BLAS didn't turn out well. BLIS only needs a single, pared-down GEMM kernel for level-3 BLAS, and maybe one for TRSM. (It doesn't currently have GPU support, but I think an implementation was mentioned in an old paper.)
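
For context, the Goto/BLIS structure referred to here is a fixed set of cache-blocking and packing loops wrapped around one small, hand-written microkernel, which is why a single kernel per architecture is enough. A rough sketch in NumPy (my own, not from BLIS; the block sizes are placeholders and a @ b stands in for the packed microkernel):

    import numpy as np

    MC, KC, NC = 64, 64, 128   # placeholder block sizes; real ones are tuned per CPU

    def blocked_gemm(a, b):
        n, k = a.shape
        _, m = b.shape
        c = np.zeros((n, m))
        for jc in range(0, m, NC):            # partition columns of B and C
            for pc in range(0, k, KC):        # partition the shared k dimension
                for ic in range(0, n, MC):    # partition rows of A and C
                    a_blk = a[ic:ic+MC, pc:pc+KC]   # stands in for the packed A block
                    b_blk = b[pc:pc+KC, jc:jc+NC]   # stands in for the packed B block
                    # All the flops happen in this one spot; in a real implementation
                    # this is the hand-optimized per-architecture microkernel.
                    c[ic:ic+MC, jc:jc+NC] += a_blk @ b_blk
        return c

    a, b = np.random.rand(200, 150), np.random.rand(150, 300)
    assert np.allclose(blocked_gemm(a, b), a @ b)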



