Which is so silly since ML models should be the most portable thing in the world. A model is just a series of math operations, not a bundle of OS- or hardware-specific API calls. We should be at the stage where each ML model can be boiled down to a simple executable with zero dependencies.
Agree 100%, and I spend a fair amount of time wondering why this hasn't happened. I built piet-gpu-hal because I couldn't find any abstraction layer over compute shaders that supports precompiled shaders. A motivated person absolutely could write shaders to do all the operations needed by Stable Diffusion, and ship a binary in the megabyte range (obviously not counting the models themselves). That would support Metal, Vulkan, and D3D12. The only thing holding this back is the will to build it.
This is the part that TensorFlow is really good at, while just about everything else lags behind. The TF SavedModel format bundles the graph plus the weights, and is super easy to just load up and run. (Also, TFLite for mobile...)
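As a minimal sketch of that "graph plus weights" workflow (the module class and path here are illustrative, not anything specific from this thread):

```python
import tensorflow as tf

# A toy "model": a module whose computation graph can be traced and saved.
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return x * 2.0

# SavedModel writes the traced graph *and* any weights into one directory.
tf.saved_model.save(Doubler(), "/tmp/doubler")

# Loading needs no Python class definition at all: the restored object is
# callable straight from the serialized graph.
loaded = tf.saved_model.load("/tmp/doubler")
result = loaded(tf.constant([1.0, 2.0])).numpy()
print(result)  # -> [2. 4.]
```

The same directory can be served by TF Serving or converted with the TFLite converter, which is what makes it feel close to the "self-contained artifact" ideal described above.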
But one of the tricky parts with Stable Diffusion is that people are trying to get it to run on lighter hardware, which is a different engineering problem: simple APIs typically won't expose the kind of internals people want to mess around with.
That's the world of running machine learning models for you. Why would anything ever work the first time, right? Or even the 10th time...