You can pay for Claude API access (as opposed to the regular Claude Pro subscription) and wire in something like Cline via your API key, but it gets expensive fast in my experience.
The company I work for, and actually most Swiss IT contractors, have strict rules: on more than half of our projects we aren't allowed to use GitHub Copilot or paste anything into an external LLM API.
That's why I built a vLLM-based local GPU machine for our dev squads as a trial. It currently runs a single 4070 Ti Super with 16GB of VRAM, and we're upgrading to 4x 4070 Ti Super to support 70B models.
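For anyone curious, here's a minimal sketch of what the vLLM side looks like on a single card. The quantized model repo and the memory settings below are illustrative assumptions, not our exact config; in practice you'd run vLLM's OpenAI-compatible server for the squads rather than the offline API.

```python
# Minimal vLLM sketch: load a quantized 7B coder model on one 16GB card and
# run a single generation as a sanity check. Repo name, context cap and
# memory utilization are assumptions to tune for your own hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-Coder-7B-Instruct-AWQ",  # assumed quantized repo
    max_model_len=16384,           # cap context so the KV cache fits in VRAM
    gpu_memory_utilization=0.90,   # leave a little headroom for the driver
)

params = SamplingParams(temperature=0.2, max_tokens=256)
out = llm.generate(["Write a Python function that parses an ISO 8601 date."], params)
print(out[0].outputs[0].text)
```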
The difficulties we face IMHO:
- Cursor doesn't support WSL Devcontainers
- Small tab-completion models matter just as much as the big chat models, but there's far less development happening around them
- There's a huge gap between the 7-14B and 120B ranges; very few good 70B models are available
In practice, in the 7-14B range nothing beats Qwen2.5 for interactive coding, and something around 2B works best for tab-completion.
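For the tab-completion side, this is roughly what fill-in-the-middle (FIM) prompting looks like against any OpenAI-compatible endpoint (vLLM or llama.cpp's server). The endpoint URL, model name and stop token are placeholders, and FIM is usually done against the base coder models rather than the instruct variants.

```python
# Rough FIM sketch for editor-style tab-completion, assuming a small
# Qwen2.5-Coder base model is already served behind an OpenAI-compatible
# /v1/completions endpoint. All names below are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prefix = "def read_config(path):\n    with open(path) as f:\n        "
suffix = "\n    return config\n"

# Qwen2.5-Coder's FIM format: give it the code before and after the cursor,
# and it generates the missing middle.
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

resp = client.completions.create(
    model="Qwen/Qwen2.5-Coder-1.5B",  # a ~2B-class base model for completion
    prompt=prompt,
    max_tokens=64,
    temperature=0.0,
    stop=["<|endoftext|>"],
)
print(resp.choices[0].text)
```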
Question for those using it: can the 7B really be run locally on a card with only 16GB of VRAM? LLM Explorer says [1] it requires 15.4GB, which seems like cutting it close.
I am happily using qwen2.5-coder-7b-instruct-q3_k_m.gguf with a context size of 32768 on an RTX 3060 Mobile with 6GB VRAM using llama.cpp [2]. The 15.4GB figure is for the unquantized fp16 weights; the quantized GGUF variants are much smaller. With 16GB VRAM you could use qwen2.5-7b-instruct-q8_0.gguf, which is basically indistinguishable from the fp16 variant.
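To make that concrete, here's a minimal llama-cpp-python sketch of the same setup. The sizes in the comments are ballpark figures, and the offload setting is something you'd tune down if you run out of VRAM on a 6GB card.

```python
# Minimal llama-cpp-python sketch of the setup described above.
# Rough 7B weight sizes: ~15 GB at fp16, ~8 GB at Q8_0, ~4 GB at Q3_K_M,
# plus the KV cache, which grows with the context length.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-coder-7b-instruct-q3_k_m.gguf",
    n_ctx=32768,       # context size from the comment above
    n_gpu_layers=-1,   # offload all layers; reduce this if VRAM runs out
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a bash one-liner to count lines of Python code."}],
    max_tokens=128,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"])
```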