runsc / gVisor is also interesting because the runsc runtime can be run from within Docker/Docker Desktop.
gVisor has performance problems, though. Their own data shows roughly 1/3rd the throughput of the default Docker runtime (runc) for concurrent network calls--if that's an issue for your use case.
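The Docker setup is light, going by gVisor's docs: register runsc as a runtime in daemon.json (the path below assumes the runsc binary was installed to /usr/local/bin), restart the daemon, and opt in per container:

    {
      "runtimes": {
        "runsc": { "path": "/usr/local/bin/runsc" }
      }
    }

    docker run --rm --runtime=runsc hello-world

Containers you don't pass --runtime=runsc to keep using runc, so you can sandbox only the workloads that need it.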
In Ollama, how do you set up a larger context, and how do you figure out what settings to use? I've yet to find a good guide, and I'm not quite sure how to work out what those settings should be for each model.
There's context length, but how does that relate to input length and output length? Should I just make the numbers match--32k is 32k? Any pointers?
Ollama breaks for me: if I manually set the context higher, the next API call from the client resets it back.
And Ollama keeps unloading the model from memory every 4 minutes.
LM Studio with MLX on Mac is performing perfectly, and I can keep the model in RAM indefinitely.
Ollama's keep-alive is broken, since a new REST API call resets it. I'm surprised it's this glitchy with longer-running calls and custom context lengths.
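For what it's worth, the workaround I'd try--going by Ollama's documented knobs, so treat this as a sketch rather than a confirmed fix--is to bake the context length into a derived model via a Modelfile and set the keep-alive default server-side, so a bare request can't silently reset either (qwq:32b is just an example model here):

    # Modelfile: bake a 32k context window into a derived model
    FROM qwq:32b
    PARAMETER num_ctx 32768

    # create the derived model, then serve with keep-alive pinned
    ollama create qwq-32k -f Modelfile
    OLLAMA_KEEP_ALIVE=-1 ollama serve   # -1 keeps loaded models in memory indefinitely

The catch: a client that sends its own num_ctx or keep_alive in the request body still overrides these defaults, which sounds like exactly the reset you're hitting.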
I would add an LLM like QwQ-32B to the mix--it has a ton of compressed knowledge embedded in it.
I would also store it in a steel, Oscar-the-Grouch-style trash can as a cheap Faraday cage, which gets you protection from solar flares and EMP blasts.
LLMs are a bad deal when you look at how much power that inference takes. A device that could barely run one instance of QwQ-32B at glacial speeds could serve multiple concurrent users of Kiwix.
But--since you won't think to ask Hacker News every single thing you'll need to know beforehand, I think you still want the LLM to answer questions and help you bootstrap.
Learning things from scratch is really hard, too; a copy of Wikipedia gets you absolutely nowhere if you don't know what to search for.
Having something you can plainly ask how to start, something that will point you in the right direction and explain the base concepts, is worth a lot more: it turns raw data into genuine information. Yes, it can be wrong sometimes, but so can human teachers, and you can always verify, which is a good skill to practice in general.
See the Wired article on the rewriting of German history, and the George Galloway article. The enshittification has not only begun, it's rising in force.
Very cool concept! There's a lot of potential in shortening the try-debug-fix cycle for LLMs.
On a related note, here's a Ruby gem I wrote that captures variable state at the moment an exception is raised. It gives you non-interactive, text-based debugging for exceptions.
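A minimal sketch of the general technique, if you rolled it by hand with Ruby's TracePoint API (the gem's internals may well differ; risky/order_id below are made-up names for illustration):

    # Snapshot local variables at the moment an exception is raised.
    trace = TracePoint.new(:raise) do |tp|
      b = tp.binding
      locals = b.local_variables.map { |n| [n, b.local_variable_get(n).inspect] }.to_h
      warn "#{tp.raised_exception.class}: #{tp.raised_exception.message}"
      warn "locals at raise site: #{locals}"
    end

    def risky(order_id)
      status = "charging"
      raise "payment failed for order #{order_id}"
    end

    trace.enable do
      begin
        risky(42)
      rescue RuntimeError
        # order_id=42 and status="charging" were already printed at raise time
      end
    end

Enabling the trace only around the code under test keeps the overhead contained, since the :raise hook fires for every exception, including ones that get rescued.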
I like it too. Memorable is good! Why not just put Hydraulic in front of the name of every other product? Hydraulic Deploy. Hydraulic Build. Etc. Seems scalable.
The same can be said of any bureaucracy's function. It isn't your fault that you made an abhorrently stupid decision; you were just following the directive. Not to say it isn't a problem, just that it isn't new.