Documentation is lagging reality a bit; we'll probably fix that around the next llvm release. Some information is at https://libc.llvm.org/gpu/using.html
That GPU libc is mostly intended to bring things like fopen to openmp or cuda, but it turns out GPUs are totally usable as bare-metal embedded targets. You can read/write "host" memory, and given that plus a thread running on the host you can implement a syscall equivalent (e.g. https://dl.acm.org/doi/10.1145/3458744.3473357), and once you have syscall the doors are wide open. I particularly like mmap from GPU kernels.
Is there a way to directly use these developments today to write a reasonable subset of C/C++ for simpler use cases, portably (across the three major desktop platforms, at least) and without dealing with cumbersome non-portable APIs like OpenGL, OpenCL, DirectX, Metal or CUDA? By simpler use cases I mean basically doing some compute and showing the results on screen by just manipulating pixels in a buffer, like you would in a fragment/pixel shader. This doesn't require anything close to full libc functionality (let alone anything like the STL), but it would greatly improve the ergonomics for a lot of developers.
I'll describe what we've got, but fair warning that I don't know how the write-pixels-to-the-screen stuff works on GPUs. There are some instructions with weird names that I assume make sense in that context. Presumably one allocates memory and writes to it in some fashion.
LLVM libc is picking up capability over time, implemented similarly to the non-GPU architectures. The same tests run on x64 or the GPU, printing to stdout as they go. Hopefully standing up libc++ on top will work smoothly. It's encouraging that I sometimes struggle to remember whether a given test is currently running on the host or the GPU.
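To give a flavour of what such a test looks like, here's an illustrative smoke test (my own, not one of the actual LLVM libc test cases) where nothing in the code cares which target it was compiled for:

```cpp
#include <cstdio>
#include <cstring>

// A libc smoke test in the spirit described above: it exercises snprintf and
// reports on stdout, and compiles identically for x64 or, in principle, a
// GPU target with LLVM libc underneath.
int format_check() {
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%d + %d = %d", 2, 2, 4);
    if (n != 9 || strcmp(buf, "2 + 2 = 4") != 0) {
        puts("FAIL: snprintf");
        return 1;
    }
    puts("PASS: snprintf");
    return 0;
}
```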
The data structure that libc uses to have x64 call a function on amdgpu, or to have amdgpu call a function on x64, is mostly a blob of shared memory plus careful atomic operations. It was originally general purpose and lived in a prototype-ish GitHub repo; it's currently specialised to libc. It should end up in the under-debate llvm/offload project, which will make it easily reusable again.
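A minimal sketch of that shared-memory handshake, with a std::thread standing in for the GPU side. The names, layout and three-state protocol here are mine, not the libc implementation's:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// One mailbox in shared memory, owned alternately by client and server:
//   IDLE     -> client fills opcode/arg, publishes REQUEST
//   REQUEST  -> server services it, publishes RESPONSE
//   RESPONSE -> client reads result, returns the mailbox to IDLE
struct Mailbox {
    std::atomic<uint32_t> state{0};  // 0=IDLE, 1=REQUEST, 2=RESPONSE
    uint64_t opcode = 0;             // which "syscall" is wanted
    uint64_t arg = 0;
    uint64_t result = 0;
};

// Client side (the "GPU"): post a request, spin until serviced.
// The release store publishes opcode/arg; the acquire load makes the
// server's write to result visible before we read it.
uint64_t client_call(Mailbox& m, uint64_t op, uint64_t arg) {
    m.opcode = op;
    m.arg = arg;
    m.state.store(1, std::memory_order_release);
    while (m.state.load(std::memory_order_acquire) != 2) {}
    uint64_t r = m.result;
    m.state.store(0, std::memory_order_release);  // mailbox back to IDLE
    return r;
}

// Server side (host thread): spin for one request, service it, publish
// the result. Opcode 1 is a stand-in "syscall" that just adds one; a real
// server would dispatch to open/mmap/etc. here.
void server_step(Mailbox& m) {
    while (m.state.load(std::memory_order_acquire) != 1) {}
    m.result = (m.opcode == 1) ? m.arg + 1 : 0;
    m.state.store(2, std::memory_order_release);
}
```

The real thing has many such mailboxes (so many GPU lanes can have calls in flight) and careful attention to which memory is visible to both devices, but the acquire/release ownership dance is the core of it.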
This isn't quite decoupled from vendor stuff: the GPU driver needs to be running in the kernel somewhere. On nvptx we make a couple of calls into libcuda to launch main(); on amdgpu it's a couple of calls into libhsa. I did have an opencl loader implementation as well, but that has probably rotted; intel seems to be on that stack but isn't in llvm upstream.
A few GPU projects have noticed that implementing a cuda layer and a spirv layer and a hsa or hip layer and whatever others is quite annoying. Possibly all GPU projects have noticed that. We may get an llvm/offload library that successfully abstracts over those which would let people allocate memory, launch kernels, use arbitrary libc stuff and so forth running against that library.
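Purely to illustrate the shape such an abstraction might take (every name below is invented, and the eventual llvm/offload API will certainly differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// An invented sketch of a vendor-neutral offload interface: allocate memory,
// move data, launch kernels, without naming cuda/hsa/spirv anywhere.
struct Device {
    virtual ~Device() = default;
    virtual void* alloc(size_t bytes) = 0;
    virtual void release(void* p) = 0;
    virtual void memcpy_to(void* dst, const void* src, size_t n) = 0;
    virtual void launch(void (*kernel)(void*), void* args, uint32_t threads) = 0;
};

// A trivial backend that "launches" kernels on the host, useful for testing
// the interface shape; cuda and hsa backends would implement the same
// contract by calling into libcuda or libhsa.
struct HostDevice final : Device {
    void* alloc(size_t bytes) override { return std::malloc(bytes); }
    void release(void* p) override { std::free(p); }
    void memcpy_to(void* dst, const void* src, size_t n) override {
        std::memcpy(dst, src, n);
    }
    void launch(void (*kernel)(void*), void* args, uint32_t threads) override {
        for (uint32_t t = 0; t < threads; ++t) kernel(args);  // serial stand-in
    }
};
```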
That's all from the compute perspective. It's possible I should look up what sending numbers over HDMI actually involves. I believe the GPU is happy interleaving compute and graphics kernels, and I suspect they're very similar things in the implementation.
I’m cautiously optimistic for SYCL. The absurd level of abstraction is a bit alarming, but single source performance portability would be a godsend for library authors.
This is one area where I imagine C++ wannabe replacements like Rust having a very hard time taking over.
It took almost 20 years to move from GPU assembly (DX 9 timeframe) through shading languages to regular C, C++, Fortran and Python JITs.
There are some efforts with Java, .NET, Julia, Haskell, Chapel and Futhark, but they still trail the big four.
Currently, in terms of ecosystem, tooling and libraries, as far as I am aware, Rust trails those, and it isn't yet a presence at HPC/graphics conferences (Eurographics, SIGGRAPH).
This is one area where I imagine C++ wannabe replacements like Rust having a very hard time taking over.
I 100% agree. Although I have a keen interest in Rust, I can’t see it offering any unique value to the GPGPU or HPC space. Meanwhile C++ is gaining all sorts of support for HPC: the parallel STL algorithms, mdspan, std::simd, std::blas, executors (eventually), etc. Not to mention all of the development work happening outside of the ISO standard, e.g. CUDA/ROCm(HIP)/OpenACC/OpenCL/OpenMP/SYCL/Kokkos/RAJA and who knows what else.
C++ is going to be sitting tight in compute for a long time to come.
HPC researchers already employ techniques to detect memory corruption, hardware flaws, floating point errors, and so on. Maybe Rust could meaningfully reduce memory errors, but if that comes at the cost of bounds checking (or any other meaningful runtime overhead) they will have absolutely zero interest.
If you’re willing to deal with 5 layers of C++ TMP, then a library like Kokkos will let you abstract over those APIs, or at least some of them. Eventually, if or when SYCL is upstreamed into the llvm-project, it’ll be possible to do it with clang directly.