
To be fair, nothing comes close to Qwen2.5 atm


This is something that's obvious to anyone playing with local LLMs, but it doesn't seem to be all that well known even among tech enthusiasts.

Qwen is really ahead of the pack right now when it comes to weight-available models.


How does it compare to Claude?


nothing compares to claude, not even gpt-4.


Claude is the best at coding but the limits are the problem. You only get like a handful of messages.


With free or with paid too?


You can pay for Claude API access (not normal Claude Pro) and wire in something like Cline via your API key, but it gets expensive fast in my experience.
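If you go the API route, calling it directly is just the standard Anthropic SDK. A minimal sketch in Python, assuming the `anthropic` package is installed and an ANTHROPIC_API_KEY is set in the environment; the model id below is only an example:

    import anthropic

    # Reads ANTHROPIC_API_KEY from the environment (the same key you'd give Cline).
    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # example model id; check the current catalog
        max_tokens=1024,
        messages=[{"role": "user", "content": "Refactor this function to be iterative: ..."}],
    )
    print(response.content[0].text)

Pay-per-token pricing is what makes it add up quickly compared to the flat Pro subscription.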


which size are you using?

I don't see why you would use it over claude and 4o-mini with cursor unless you are working on a top secret repo


The company I work for, and actually most Swiss IT contractors, have harsh rules: for more than half of our projects we aren't allowed to use GitHub Copilot or paste anything into an external LLM API.

For that reason I built a vLLM-based local GPU machine for our dev squads as a trial. It currently runs a single 4070 Ti Super with 16GB VRAM, and we're upgrading to 4x 4070 Ti Super to support 70B models.
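For anyone curious what the serving side looks like, a minimal sketch using vLLM's offline Python API; the model choice (an AWQ quant) and the context length are examples, not necessarily what this setup actually runs:

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen2.5-Coder-7B-Instruct-AWQ",  # a quantized build fits 16GB more comfortably
        tensor_parallel_size=1,       # bump to 4 once the extra 4070 Ti Supers are in
        max_model_len=16384,          # trade context length against KV-cache VRAM
        gpu_memory_utilization=0.90,
    )
    params = SamplingParams(temperature=0.2, max_tokens=512)
    out = llm.generate(["Write a Python function that parses an ISO 8601 timestamp."], params)
    print(out[0].outputs[0].text)

In practice vLLM's OpenAI-compatible server mode is more useful so editor plugins can point at it, but the resource knobs are the same.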

The difficulties we face IMHO:

- Cursor doesn't support WSL Devcontainers

- Small tab-complete models are more important, and there's a lot less happening in that space

- There's a huge gap between 7-14B and 120B models; not a lot of 70B models are available

In reality, in the 7-14B range nothing beats Qwen2.5 for interactive coding, with something around 2B for tab-completion.
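For the tab-completion part, the small Qwen2.5-Coder models support fill-in-the-middle, so a completion request is basically prefix + suffix stitched together with the FIM tokens. A rough sketch against a local OpenAI-compatible server (URL, port and model name are placeholders; the token names follow Qwen2.5-Coder's documented FIM format, double-check against the tokenizer config):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    prefix = "def read_config(path):\n    with open(path) as f:\n        "
    suffix = "\n    return cfg\n"

    # Qwen2.5-Coder FIM: give it the code before and after the cursor, it fills the middle.
    prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

    completion = client.completions.create(
        model="Qwen/Qwen2.5-Coder-1.5B",  # placeholder; use whatever small model is served
        prompt=prompt,
        max_tokens=64,
        temperature=0.0,
    )
    print(completion.choices[0].text)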


> - Cursor doesn't support WSL Devcontainers

In case that works for you: devcontainers now work under Linux with Docker.


> I don't see why you would use it over claude and 4o-mini with cursor unless you are working on a top secret repo

Plenty of companies won't let you use those products with their internal code.


Question for those using it. Can the 7B really be used locally on a card with only 16GB VRAM? LLM Explorer says[1] it requires 15.4GB. That seems like cutting it close.

1. https://llm.extractum.io/model/Qwen%2FQwen2.5-7B,58qKLCI6ani...
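For a rough sanity check of that number (back-of-envelope only, ignoring KV cache and runtime overhead):

    params = 7e9
    fp16 = params * 2 / 2**30    # ~13 GiB -> the 15.4GB figure looks like fp16 weights plus overhead
    q8   = params * 1 / 2**30    # ~6.5 GiB
    q4   = params * 0.5 / 2**30  # ~3.3 GiB
    print(f"fp16 ~{fp16:.1f} GiB, q8 ~{q8:.1f} GiB, q4 ~{q4:.1f} GiB")

So the 15.4GB seems to be for the unquantized weights; quantized GGUF builds are far smaller.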


I am happily using qwen2.5-coder-7b-instruct-q3_k_m.gguf with a context size of 32768 on an RTX 3060 Mobile with 6GB VRAM using llama.cpp [2]. With 16GB VRAM, you could use qwen2.5-7b-instruct-q8_0.gguf [1], which is basically indistinguishable from the fp16 variant.

[1] https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF

[2] https://github.com/ggerganov/llama.cpp
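Roughly the same setup through the llama-cpp-python bindings instead of the raw llama.cpp CLI, in case that's easier to script; the model path is a placeholder for the downloaded GGUF:

    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-7b-instruct-q3_k_m.gguf",  # placeholder path
        n_ctx=32768,      # same context size as above
        n_gpu_layers=-1,  # offload everything; lower this if VRAM runs out
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain what this regex matches: ^\\d{4}-\\d{2}$"}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])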


I don't know how they're getting on top of Qwen via the HumanEval bench when the actual quality is very poor.


Not even deepseek coder 2.5?


Not according to the scores here https://github.com/QwenLM/Qwen2.5-Coder


Can't really take those seriously. Why would they rank a competitor better?



