
Can you elaborate on which models you are using? I'm running an R1-distilled Qwen coder (32B, Q4), and while it gives useful answers, it's quite slow on my M1 Max. Slow enough that I keep reaching for cloud models.


Not at my machine currently, but I think I use the 14B Q4 model, which delivers very good answers. I run a 4060 with 16 GB of memory and performance is quite good. I picked the largest model that was recommended for that amount of VRAM, which I believe was the 14B one.
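As a rough rule of thumb for sizing a Q4 model to a card: about half a byte per weight plus some overhead for the KV cache and runtime buffers. The figures below are ballpark assumptions, not anything from the runtime itself:

    # Rough estimate of whether a Q4-quantized model fits in VRAM.
    # 0.5 bytes/parameter and 20% overhead are assumed ballpark values.
    def fits_in_vram(params_billions: float, vram_gb: float,
                     bytes_per_param: float = 0.5, overhead: float = 1.2) -> bool:
        needed_gb = params_billions * bytes_per_param * overhead
        return needed_gb <= vram_gb

    print(fits_in_vram(14, 16))  # True:  ~8.4 GB, fits a 16 GB card
    print(fits_in_vram(32, 16))  # False: ~19.2 GB, would spill into system RAM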

I do have some applications that process images, text, and PDF files, and I use smaller models for extracting embeddings; I don't think my system could handle it at decent speed otherwise.
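For reference, a minimal sketch of pulling embeddings from a small local model, assuming an Ollama server on localhost and nomic-embed-text as a placeholder model (neither is named above, adjust to whatever you actually run):

    # Query a local Ollama server for an embedding vector.
    import requests

    def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
        resp = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": model, "prompt": text},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["embedding"]

    vector = embed("extracted text from a PDF page")
    print(len(vector))  # embedding dimensionality, e.g. 768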

I do run LLMs on an M1 16 GB MacBook Air and performance is surprisingly good. Not for image synthesis, though, and a PC with a dedicated GPU is still significantly faster for LLM responses as well. Haven't tried to run DeepSeek on the MacBook yet.
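If you want to try it on Apple Silicon, a sketch using llama-cpp-python with the Metal backend looks roughly like this; the GGUF file name and settings are placeholders, since the comment doesn't say which runtime is used:

    # Load a quantized GGUF model and offload all layers to the GPU (Metal on macOS).
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-14b-instruct-q4_k_m.gguf",  # hypothetical file
        n_gpu_layers=-1,   # -1 = offload every layer
        n_ctx=4096,
    )

    out = llm("Write a Python function that reverses a string.", max_tokens=128)
    print(out["choices"][0]["text"])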


Interesting, I didn't like the output quality of the 14B models. Could be the quantization though; apparently some of the quants are a bit broken.



