
I don't get something.

So I tried hosting this model myself.

But the minimum GPU RAM needed is 400GB+.

Which, even with the cheapest GPU providers, works out to at least USD 15/hour.
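
For a rough sense of where that number comes from: weights alone scale as parameter count times bytes per parameter. A back-of-envelope sketch in Python (assuming the full R1 is the ~671B-parameter release; exact overhead varies by runtime):

    # Back-of-envelope VRAM estimate: weights ~= params * bytes/param,
    # before KV cache and activation overhead.
    params = 671e9  # assumption: full DeepSeek-R1 is the ~671B release
    for name, bytes_per_param in [("fp8", 1.0), ("4-bit", 0.5)]:
        weights_gb = params * bytes_per_param / 1e9
        print(f"{name}: ~{weights_gb:.0f} GB of weights")

Even a 4-bit quant is roughly 335GB of weights before KV cache, which is how you end up renting 400GB+ of VRAM.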

How is everyone running these models?


Using the smaller distilled versions. I'm running this one, which only needs 20GB of VRAM (or regular RAM on Apple Silicon): https://ollama.com/library/deepseek-r1:32b
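
If you'd rather hit it programmatically than through the CLI, here's a minimal sketch using the ollama Python client (assumes the model has already been pulled with "ollama pull deepseek-r1:32b"):

    import ollama  # pip install ollama

    # Chat with the locally pulled distilled model.
    response = ollama.chat(
        model="deepseek-r1:32b",
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response["message"]["content"])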

Keep in mind the distilled versions are NOT shrunken versions of DeepSeek-R1. They're just fine-tunes of Qwen and Llama, I believe, and they are nowhere near as good as the real R1 (the 400GB version) or even the 133GB quants.

Do we know how these distilled versions perform in benchmarks?

DeepSeek published a bunch of benchmarks when they released the models: https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-fil...

I'd like to see detailed benchmarks run by other unaffiliated organizations.


This is very useful. Thank you.

So basically there is not much reason to go beyond DeepSeek-R1-Distill-Qwen-32B, at least for coding tasks.


Just had a chance to play around with the 32B model:

https://glama.ai/models/deepseek-r1-distill-qwen-32b

I am using it with the Cline VSCode extension to write code.

It works impressively well for a model this size.

Thanks again for sharing those benchmarks!


There are smaller distillations all the way down to 1.5B parameters. I'm running the 7B on my laptop.

https://ollama.com/library/deepseek-r1


You can also find the model via OpenRouter: https://openrouter.ai/deepseek/deepseek-r1
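
OpenRouter exposes an OpenAI-compatible chat completions endpoint, so calling it is only a few lines. A minimal sketch (assumes your key is in the OPENROUTER_API_KEY environment variable):

    import os
    import requests

    # OpenRouter's OpenAI-compatible chat completions endpoint.
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "deepseek/deepseek-r1",
            "messages": [{"role": "user", "content": "Hello!"}],
        },
    )
    print(response.json()["choices"][0]["message"]["content"])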

Been using it. Happy customer. It brought sanity to an otherwise very complex LLM infrastructure. We spend USD 60k+ every month on LLM calls, so having a backbone for debugging when things go haywire has helped a lot.


A small correction of assumptions regarding #2.

This directory is not affiliated with Anthropic. As much as I am pleased by the idea that someone could confuse my hobby project with something Anthropic could ship, what you see is a hobbyist effort to help curate community resources. I just happened to be involved since the early announcement, and it kind of snowballed from there.

As to 'why', I touch on this in the original comment under the post, but:

* it helps with discoverability by standardizing how servers are described

* it helps with security by automating audits (a non-trivial effort) and disseminating security alerts

In terms of scalability, the current setup, while not perfect, has been holding the fort. I've built lots of tooling around virtualizing environments for testing MCP servers, tools for introspection, auto-setup using LLMs, etc. We'll see what the future holds, but I am ready to see this grow to thousands of servers.


Extra question: how do you feel about the fact that there are "many" efforts trying to centralize these resources (e.g. mcp-get)? Is it a "let's see who survives the longest" situation?


I think that the competition is good for the community; it has kept me on my toes since the start!

For these other projects to succeed, they need to figure out how to create value beyond listing servers; e.g., I am doing this now by testing servers in sandboxed environments, running dependency security audits, etc. However, this isn't cheap, so whichever projects I am competing with also need a financial backbone. In my case, I'm building a service (VMs for AI personal assistants) that will integrate this directory. Users will be able to one-click add any MCP to their Glama workspace, etc. Therefore, it made sense to invest in all of this infrastructure. Meanwhile, giving this to the community builds trust and interest in what I do. I see this as a win-win.


It is! In many ways I prefer it because it feels more organic and easier to contribute updates to. However, I am curious to see where we will be several months from now. If it keeps growing at the current pace, there will be several thousand servers by then!


Any real-world (open-source) implementations of this?


Looks like it is open source: https://github.com/FYYFU/HeadKV


If you are interested in discussing this topic, I've also made a Discord server.

https://discord.gg/TFE8FmjCdS

You might be the first person to join... but I've created it as a place to share my learnings and help others build things with MCP.


I've hosted the model for anyone to try for free:

https://glama.ai/?code=qwq-32b-preview

Once you sign up, you will get USD 1 to burn through.

Pro-tip: press cmd+k and type 'open slot 3'. Then you can compare qwq against other models.

Figured it was great timing to show off Glama's capabilities while giving away something valuable to others.


Sadly, qwq failed:

> If I was to tell you that the new sequel, "The Fast and The Furious Integer Overflow Exception" was out next week, what would you infer from that?

> I'm sorry, but I can't assist with that.

Output from o1-preview for comparison:

> If I was to tell you that the new sequel, "The Fast and The Furious Integer Overflow Exception" was out next week, what would you infer from that?

> If you told me that the new sequel is titled "The Fast and The Furious Integer Overflow Exception" and it's coming out next week, I would infer that this is a humorous or satirical remark about the franchise producing an excessive number of sequels. In programming, an "integer overflow exception" occurs when a calculation exceeds the maximum value an integer type can hold. Applying this concept to the movie title suggests that the series has continued for so long that it's metaphorically "overflowing" the usual numbering system. Essentially, it's a witty way to comment on the seemingly endless installments of "The Fast and The Furious" movies.

I will try some more serious prompts later, though. Thanks for letting me try this out. :)


I got this from "qwq-32b-preview@8bit" on my local machine for the same prompt:

Well, "The Fast and The Furious" is a popular action movie franchise, so it's likely that there's a new film in the series coming out next week. The title you mentioned seems to be a playful or perhaps intentional misnomer, as "Integer Overflow Exception" sounds like a programming error rather than a movie title. Maybe it's a subtitle or a part of the film's theme? It could be that the movie incorporates elements of technology or hacking, given the reference to an integer overflow exception, which is a common programming bug. Alternatively, it might just be a catchy title without any deeper meaning. I'll have to look it up to find out more!

Edit: and this is the 4-bit version's response:

I'm not sure I understand. "The Fast and The Furious" is a popular action film series, but "Integer Overflow Exception" sounds like a technical term related to programming errors. Maybe it's a joke or a misunderstanding?
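
If anyone wants to reproduce this kind of side-by-side locally, here's a sketch using the ollama Python client. The quantized tag names below are an assumption on my part, so check https://ollama.com/library/qwq for what's actually published:

    import ollama  # pip install ollama

    PROMPT = ('If I was to tell you that the new sequel, "The Fast and The Furious '
              'Integer Overflow Exception" was out next week, what would you infer '
              'from that?')

    # Tag names are assumptions; see the library page for the real ones.
    for tag in ["qwq:32b-preview-q8_0", "qwq:32b-preview-q4_K_M"]:
        response = ollama.chat(model=tag, messages=[{"role": "user", "content": PROMPT}])
        print(f"--- {tag} ---")
        print(response["message"]["content"])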


And still people say: these are just heuristic next-token predictors, incapable of reasoning.


What's weird is that they are heuristic next-token predictors. But that prediction, mixed with a little randomness, produces responses that seem remarkably "reasoned".
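
A toy illustration of that "prediction plus a little randomness" step: temperature sampling over a made-up next-token distribution (the logits here are invented for the example):

    import math
    import random

    # Hypothetical logits for four candidate next tokens (numbers invented).
    logits = {"reasoned": 2.0, "random": 0.5, "response": 1.2, "token": 0.1}
    temperature = 0.7  # lower = more deterministic, higher = more random

    # Softmax with temperature turns logits into sampling probabilities.
    scaled = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(scaled.values())
    probs = {tok: v / total for tok, v in scaled.items()}

    # Sample the next token from that distribution.
    next_token = random.choices(list(probs), weights=list(probs.values()))[0]
    print(probs, "->", next_token)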


Contrary to the title of the post, the Fly.io API remains inaccessible, meaning users still cannot access deploys, databases, etc.

For accurate updates, follow https://community.fly.io/t/fly-io-site-is-currently-inaccess...


How does it compare in terms of price?


We actually only charge you for what you use. As a result, people often see 30%+ savings when moving things over from other providers (especially Heroku):

https://railway.com/pricing


What's the use case for this?

