
Pretty much everyone these days uses a library to drive the GPU calculations, and those libraries tend to either support multiple hardware targets directly (TensorFlow) or have API-compatible replacements (CuPy as a drop-in for NumPy).
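
For the CuPy/NumPy point, here's a minimal sketch (my own, not from this thread) of the usual drop-in pattern: import whichever backend is available under one name and write the rest of the code against the shared API. The normalize() function is just a made-up example.

    # Pick CuPy (GPU) when it's installed, otherwise fall back to NumPy (CPU).
    try:
        import cupy as xp
    except ImportError:
        import numpy as xp

    def normalize(rows):
        # Row-normalize a matrix; identical code runs on either backend.
        a = xp.asarray(rows, dtype=xp.float32)
        return a / xp.linalg.norm(a, axis=1, keepdims=True)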

So the lock-in risk here is that you might have to run your stuff on CPU if future NVIDIA GPUs are too overpriced.

I mean, they are super expensive. But there's nothing that comes close to their cuBLAS library in terms of performance. So unless AMD ponies up and hires GPU algorithm engineers, NVIDIA will win simply due to their superior driver software.

I once had to optimize a CPU matrix multiplication algorithm. 10 days of work for a 2x speedup. Now imagine doing that for every one of the thousands of functions in the BLAS library...
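
To give a sense of the gap that kind of hand-tuning is chasing, here's a toy, non-authoritative comparison of a textbook triple-loop matmul against NumPy's matmul, which dispatches to a tuned BLAS (OpenBLAS, MKL, or similar depending on the install). Python's interpreter overhead exaggerates the ratio relative to a naive C loop, but the shape of the problem is the same: closing that gap is where the days of blocking, packing, and SIMD work go.

    import time
    import numpy as np

    def naive_matmul(a, b):
        # Textbook three-loop matrix multiply, no blocking or vectorization.
        n, k = a.shape
        k2, m = b.shape
        assert k == k2
        c = np.zeros((n, m), dtype=a.dtype)
        for i in range(n):
            for j in range(m):
                s = 0.0
                for p in range(k):
                    s += a[i, p] * b[p, j]
                c[i, j] = s
        return c

    n = 128
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter(); naive_matmul(a, b); t1 = time.perf_counter()
    t2 = time.perf_counter(); a @ b; t3 = time.perf_counter()
    print(f"naive loops: {t1 - t0:.3f}s   BLAS-backed a @ b: {t3 - t2:.6f}s")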



Yeah, I think most people don't quite appreciate the difficulty and cost of optimizing for hardware and continually maintaining that through hardware cycles. By keeping things closed source, Nvidia's products get both advantages: they're easier to on-board because the abstraction is simpler, and technical progress is faster because there's less pushback from myriad parties when big, inconvenient changes need to happen at lower levels for hardware performance reasons. Kind of like if, instead of x86, we had settled on LLVM.


That is a very good metaphor :)

Actually, I wonder why we went with Java bytecode plus a JIT instead of advancing ahead-of-time compilation projects like gcj.


I'm not surprised that expecting to beat implementations of the basic Goto strategy for BLAS didn't turn out well. BLIS only needs a single, pared-down GEMM kernel for level-3 BLAS, and maybe one for TRSM. (It doesn't currently have GPU support, but I think an implementation was mentioned in an old paper.)
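
For context, the Goto/BLIS structure referred to here is a fixed set of cache-blocking and packing loops wrapped around one small, hand-written microkernel, which is why a single kernel per architecture is enough. A rough sketch in NumPy (my own, not from BLIS; the block sizes are placeholders and a @ b stands in for the packed microkernel):

    import numpy as np

    MC, KC, NC = 64, 64, 128   # placeholder block sizes; real ones are tuned per CPU

    def blocked_gemm(a, b):
        n, k = a.shape
        _, m = b.shape
        c = np.zeros((n, m))
        for jc in range(0, m, NC):            # partition columns of B and C
            for pc in range(0, k, KC):        # partition the shared k dimension
                for ic in range(0, n, MC):    # partition rows of A and C
                    a_blk = a[ic:ic+MC, pc:pc+KC]   # stands in for the packed A block
                    b_blk = b[pc:pc+KC, jc:jc+NC]   # stands in for the packed B block
                    # All the flops happen in this one spot; in a real implementation
                    # this is the hand-optimized per-architecture microkernel.
                    c[ic:ic+MC, jc:jc+NC] += a_blk @ b_blk
        return c

    a, b = np.random.rand(200, 150), np.random.rand(150, 300)
    assert np.allclose(blocked_gemm(a, b), a @ b)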



