Keep in mind that Apple's AMX is pretty much undocumented and not guaranteed to be stable, so integrating reverse-engineered work might not be the best allocation of resources. Particularly now that M4 supports ARM SME, which is the "official" extension. SME is not yet offered via hardware intrinsics in .NET (pretty much no hardware on the market supports it as of now); the closest thing is SVE2, coming in .NET 9.
I did look once at the back-end implementations, and either contributing a Metal back-end or adapting the Vulkan back-end to run on top of MoltenVK seemed much more realistic.
With that said, OpenCL already works as is, so you are not completely unsupported.
No disagreement on allocation. I just don't see why anyone would use something like this in the CUDA/CPU ecosystem where lots of established alternatives exist that do not require one to use C# or F#, neither of which is common in scientific computing.
OTOH the field is almost bare on the Apple side, in spite of there being over a billion devices out there with relatively low hardware fragmentation and almost uniform ISA throughout the entire product lineup. Hence the suggestion.
C# and F# are very niche, that is true. I think the basics of the languages are way nicer and saner than Python's and C++'s.
You can just give it a try and see if you like it or not.
In general, I think you are right: Python has completely won for high-level libraries, while C++ has completely won for implementation, and the threshold for making people move is way too high. It is difficult to deliver a 5-10x improvement in experience over Python, and the C++ crowd would never even look at C#, let alone think it has something to offer them, because the mythology says that the only true way is C/C++ for this kind of code and C# is just weird Java (especially now with ggml being on the radar of many).
(fun thought experiment: imagine the average reaction to the statement "you can write high level code that compiles to Metal Performance Shaders or targets Apple AMX but it's C#", not dissimilar to the reaction when people hear that C# is the prime choice for portable SIMD code)