
Can you elaborate on which models you are using? I'm running an R1-distilled Qwen coder (32B, Q4), and while it gives useful answers, it's quite slow on my M1 Max. Slow enough that I keep reaching for cloud models.


Not at my machine currently, but I think I use the 14B Q4 model, which delivers very good answers. I run a 4060 with 16 GB of memory and performance is quite good. I picked the largest model that was recommended for that amount of VRAM, which I believe was the 14B one.
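As a rough rule of thumb for sizing a Q4 model to a card: about half a byte per weight plus some overhead for the KV cache and runtime buffers. The figures below are ballpark assumptions, not anything from the runtime itself:

    # Rough estimate of whether a Q4-quantized model fits in VRAM.
    # 0.5 bytes/parameter and 20% overhead are assumed ballpark values.
    def fits_in_vram(params_billions: float, vram_gb: float,
                     bytes_per_param: float = 0.5, overhead: float = 1.2) -> bool:
        needed_gb = params_billions * bytes_per_param * overhead
        return needed_gb <= vram_gb

    print(fits_in_vram(14, 16))  # True:  ~8.4 GB, fits a 16 GB card
    print(fits_in_vram(32, 16))  # False: ~19.2 GB, would spill into system RAM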

I do have some applications that process images, text, and PDF files, and I use smaller models for extracting embeddings; I don't think my system could handle it at decent speed otherwise.
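For reference, a minimal sketch of pulling embeddings from a small local model, assuming an Ollama server on localhost and nomic-embed-text as a placeholder model (neither is named above, adjust to whatever you actually run):

    # Query a local Ollama server for an embedding vector.
    import requests

    def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
        resp = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": model, "prompt": text},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["embedding"]

    vector = embed("extracted text from a PDF page")
    print(len(vector))  # embedding dimensionality, e.g. 768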

I do run LLMs on an M1 16 GB MacBook Air and performance is surprisingly good. Not for image synthesis, though, and a PC with a dedicated GPU is still significantly faster for LLM responses as well. Haven't tried to run DeepSeek on the MacBook yet.
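If you want to try it on Apple Silicon, a sketch using llama-cpp-python with the Metal backend looks roughly like this; the GGUF file name and settings are placeholders, since the comment doesn't say which runtime is used:

    # Load a quantized GGUF model and offload all layers to the GPU (Metal on macOS).
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-14b-instruct-q4_k_m.gguf",  # hypothetical file
        n_gpu_layers=-1,   # -1 = offload every layer
        n_ctx=4096,
    )

    out = llm("Write a Python function that reverses a string.", max_tokens=128)
    print(out["choices"][0]["text"])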


Interesting, I didn't like the output quality of the 14B models. Could be the quantization though; apparently some of the quants are a bit broken.



