UpCloud, Vultr, Scaleway, Linode, DigitalOcean, etc. all include managed k8s now. I think the bar has moved up and it's pretty much a standard feature of even the bare-bones clouds.
It's not easy to move a community to a new platform, and very few people are using Lemmy currently. The thing about Reddit, despite its flaws, is that it gives you access to many large communities with a single account.
The fact that this runs on commodity hardware makes ggml extremely impressive and puts the tech in the hands of everyone. I recently reported my experience running a 7B model with llama.cpp on a 15-year-old Core 2 Quad [1] - when that machine came out it was a completely different world, and I certainly never imagined what AI would look like today. That was around when the first iPhone was released and everyone began talking about how smartphones would become the next big thing. We saw what happened 15 years later...
Today, with the new k-quants, users are reporting that 30B models run with 2-bit quantization on machines with 16GB of RAM or VRAM [2]. That puts these models within reach of millions of consumers, and the optimizations will only improve from here.
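As a rough sanity check on the 16GB claim, here's a back-of-the-envelope sketch. The effective bits-per-weight figures are my own assumptions for the ggml formats (k-quants carry some overhead for per-block scales and mins, so e.g. Q2_K lands closer to ~2.6 bits than 2), not exact numbers from the llama.cpp source:

    # Approximate weight memory for a 30B-parameter model at
    # different quantization levels. Bits/weight values are assumed,
    # including per-block scale/min overhead.
    PARAMS = 30e9

    formats = {
        "fp16": 16.0,  # unquantized half precision
        "q8_0": 8.5,   # 8-bit + per-block scale
        "q4_k": 4.5,   # 4-bit k-quant
        "q2_k": 2.6,   # 2-bit k-quant
    }

    for name, bits in formats.items():
        gib = PARAMS * bits / 8 / 2**30
        print(f"{name}: ~{gib:.1f} GiB of weights")

At ~2.6 bits/weight the weights alone come to roughly 9 GiB, which leaves headroom for the KV cache and scratch buffers on a 16GB machine, whereas 4-bit is already pushing the limit.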
I wonder if on-die RAM is less susceptible to memory errors?
I suspect that it is. Feels like less can go wrong. You have physically shorter interconnects, and the RAM is perhaps more of a known quantity relative to $SOME_RANDOM_MANUFACTURERS_DIMMS. But that is only a guess.
That said, I don't know if it's actually true - it's not necessarily any more resistant to random cosmic rays or whatever.