Since IL is itself an abstract stack-based bytecode, it can be compiled to a corresponding IR, which can then target the relevant back-end (CUDA, OpenCL, CPU, etc.). This is what ILGPU does.
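
To make the "stack bytecode to IR" step a bit more concrete, here is a toy sketch in Python. It is not ILGPU's actual pipeline: the opcode names (`ldarg`, `ldc`, `add`, `mul`, `ret`) are only loosely modelled on IL, and the `Op` class, `lower_to_ir` function and the textual IR format are all made up for illustration. The idea it shows is the general one: walk the bytecode once, keep a compile-time stack of virtual register names instead of runtime values, and emit one three-address instruction per operation.

```python
from dataclasses import dataclass


# Hypothetical toy opcode, loosely modelled on IL's ldarg/ldc/add/mul/ret.
@dataclass(frozen=True)
class Op:
    name: str
    arg: object = None


def lower_to_ir(bytecode):
    """Lower stack bytecode to three-address IR by simulating the
    evaluation stack with virtual register names instead of values."""
    stack = []    # compile-time stack of register names
    ir = []       # emitted IR, one instruction per string
    counter = 0   # next free temporary number

    def fresh():
        nonlocal counter
        name = f"%t{counter}"
        counter += 1
        return name

    for op in bytecode:
        if op.name == "ldarg":
            # Arguments already live in a named register; just push the name.
            stack.append(f"%arg{op.arg}")
        elif op.name == "ldc":
            dst = fresh()
            ir.append(f"{dst} = const {op.arg}")
            stack.append(dst)
        elif op.name in ("add", "mul"):
            rhs = stack.pop()   # top of stack is the right operand
            lhs = stack.pop()
            dst = fresh()
            ir.append(f"{dst} = {op.name} {lhs}, {rhs}")
            stack.append(dst)
        elif op.name == "ret":
            ir.append(f"ret {stack.pop()}")
        else:
            raise ValueError(f"unsupported opcode: {op.name}")
    return ir


# Bytecode for: (arg0 + 2) * arg1
program = [Op("ldarg", 0), Op("ldc", 2), Op("add"),
           Op("ldarg", 1), Op("mul"), Op("ret")]
ir_lines = lower_to_ir(program)
print("\n".join(ir_lines))
```

Once the code is in this register-style form, a back-end only has to map each IR instruction to its own target (PTX, OpenCL C, native CPU code), which is roughly why one front-end can serve several accelerators.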

Because all of the code lives in the single repository linked in the post and is fairly easy to read, you can skim through it and draw your own conclusions if this interests you.

It is also very easy to start using: just `dotnet add package ILGPU` on most configurations. (ADHD puts a higher mental strain on activities involving complex configuration, so I try to stick to tools that have minimal ceremony.)

C# (and F# by extension) generally lets you write system-ish code, with references to locals and the same primitives as C, which means you're likely not sacrificing performance in this particular scenario by using a higher-level language. After all, you're using ILGPU's APIs first and foremost.

As to why use it at all: you are likely to move faster with it than with C++, especially if it's not your full-time job, and all the escape hatches for extracting 99.9% efficiency are still on the table. That is, if the performance of the kernel emitted by ILGPU is an issue in the first place; see below for an alternative, and cheap FFI and easy C/C++ integration are still there as well.

It also lets you do things like write PTX assembly: https://github.com/m4rs-mt/ILGPU/blob/master/Samples/InlineP...