
I'm running it on a ROG Flow Z13 (Strix Halo, 128GB) and getting 50 tok/s on the 20B model and 12 tok/s on the 120B model. I'd say it's pretty usable.

Excellent! I have a Framework Desktop with 128GB on preorder—really looking forward to getting it.

Great to see more local AI tools supporting MCP! I recently added MCP support to recurse.chat as well. When running locally (llama.cpp and Ollama), tool calling still needs to catch up with the well-known providers in terms of accuracy and parallel tool calls, but it's starting to get pretty usable.


Hey! We're building Cactus (https://github.com/cactus-compute), effectively Ollama for smartphones.

I'd love to learn more about your MCP implementation. Wanna chat?


It's a protocol that doesn't dictate how you call the tool. You can use an in-memory transport without needing to spin up a server. Your tool can just be a function, but with the flexibility of serving it to other clients.


Are there any examples of that? All the documentation I saw seemed to be about building an MCP server, with very little about connecting an existing inference infrastructure to local functions.


For TypeScript you can refer to https://github.com/modelcontextprotocol/typescript-sdk/blob/...

There isn't much documentation available right now, but you can ask a coding agent, e.g. Claude Code, to generate an example.
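If it helps, here's a minimal sketch using the SDK's in-memory transport (the "add" tool and the client/server names are made up for illustration):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
    import { z } from "zod";

    // A "server" whose tool is just a local function -- no process, no port.
    const server = new McpServer({ name: "local-tools", version: "1.0.0" });
    server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) => ({
      content: [{ type: "text", text: String(a + b) }],
    }));

    // Wire client and server together over a linked in-memory transport pair.
    const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
    const client = new Client({ name: "local-host", version: "1.0.0" });
    await Promise.all([
      server.connect(serverTransport),
      client.connect(clientTransport),
    ]);

    // The host app can now hand these tools to its local model's tool-calling loop.
    console.log(await client.listTools());
    console.log(await client.callTool({ name: "add", arguments: { a: 2, b: 3 } }));

The same McpServer instance can later be served over stdio or HTTP to other clients without touching the tool code.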


recurse.chat + M2 Max Mac


I recently discovered ToolHive, which is pretty handy too: https://github.com/stacklok/toolhive


If you're on a Mac, give https://recurse.chat/ a try. It's as simple as downloading a model and starting to chat. Just added support for the new multimodal capabilities in llama.cpp.


Actually, this is a good way to find product ideas. I gave Grok a query to find posts, similar to this one, about what people want. It then ran multiple searches on X, including embedding search, and suggested that people want things like Tamagotchi and ICQ back.


I feel like these are all great examples of things people think they want. Making a post about it is one thing; actually buying or using the product is another. I think the majority of nostalgic people would quickly remember why they don't actually want it in their adult life.


I see this a lot in vintage computing. What we want is the feelings we had back then, the context, the growing possibilities, youth, the 90s, whatever. What we get is a semi-working physical object that we can never quite fix enough to relive those experiences. But we keep acquiring and fixing and tinkering anyway hoping this time will be different while our hearts become torn between past and present.


Yeah, this is not even faster horses. It's horses that can count, like Clever Hans.


It seems this may not be necessary, since llama.cpp already integrates Jinja via a C++ implementation (minja).



The fact that there's no alternative implementation of SQLite also seems to have played a part in preventing the standardization of WebSQL.

https://www.w3.org/TR/webdatabase/

"The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path."


I was completely unaware of that! How old is that document? I should reach out.


That effort died about 14 years ago.


damn =(


an opportunity for all of us to celebrate the astonishing power of necromancy


Indeed! This sort of thing is a problem. It's the same with Internet protocols: you need at least two independent implementations to reach Standard status.

