Hacker News

Reasonable speeds are possible if you pay someone else to run it. Right now both NovitaAI and Parasail are running it, both available through OpenRouter and both promising not to store any data. I'm sure the other big model hosts will follow if there's demand.

I may not be able to reasonably run it myself, but at least I can choose whom I trust to run it and have inference pricing determined by a competitive market. According to their benchmarks, the model is roughly in the same class as Claude 4 Sonnet, yet it already costs less than one third of Sonnet's inference pricing.
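As a sketch of what "choosing who runs it" looks like in practice: OpenRouter's chat-completions API accepts a provider-routing block that pins requests to specific hosts. The model slug below is a placeholder (the thread doesn't name the exact slug), and the provider names are assumptions about how these hosts are identified, so treat this as an illustrative payload rather than a copy-paste recipe.

```python
import json

# Hypothetical OpenRouter request payload. The model slug is a
# placeholder -- substitute the real open-weight model's slug.
payload = {
    "model": "open-model/placeholder",
    "messages": [{"role": "user", "content": "Hello"}],
    # Provider-routing block: prefer these hosts, in order, and
    # don't silently fall back to other providers.
    "provider": {
        "order": ["novita", "parasail"],
        "allow_fallbacks": False,
    },
}

# This would be POSTed to https://openrouter.ai/api/v1/chat/completions
# with an "Authorization: Bearer <OPENROUTER_API_KEY>" header.
print(json.dumps(payload, indent=2))
```

With `allow_fallbacks` set to `False`, a request fails rather than being routed to a host you didn't opt into, which is the point if the data-retention promise is why you picked those providers.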



I’m actually finding Claude 4 Sonnet’s thinking model to be too slow to meet my needs. It literally takes several minutes per query on Cursor.

So running it locally is the exact opposite of what I’m looking for.

Rather, I’m willing to pay more to have it run on a faster-than-normal cloud inference machine.

Anthropic is already too slow.

Since this model is open source, maybe someone could offer it at a “premium” pay-per-use price, where inference is run a lot faster, with more resources thrown at it.


Anthropic isn't slow. I'm running Claude Max and it's pretty fast. The problem is that Cursor slowed down its responses to optimize costs. At any rate, plenty of people are reporting this.


> It literally takes several minutes per query on Cursor.

There's your issue. Use Claude Code or the API directly and compare the speeds. Cursor is slowing down requests to contain costs.



