
This only uses llama.cpp, correct? So the output should be the same as if you were using llama.cpp directly. Am I the only one who doesn't get nearly the same quality of output from a quantized model on CPU as on GPU? Some models I've tried give astounding results when running on a GPU, but produce only "garbage" when running on a CPU. Even when not quantized down to 4-bit, llama.cpp just doesn't compare for me. Am I alone in this?
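For what it's worth, one way to narrow this down is to pin the seed and use greedy decoding, so the only variable is whether layers are offloaded to the GPU. A minimal sketch, assuming the llama-cpp-python bindings (the model path is a placeholder):

    # Compare CPU vs GPU output under identical sampling settings.
    from llama_cpp import Llama

    PROMPT = "Explain quantization in one sentence."

    for n_gpu_layers in (0, -1):  # 0 = pure CPU, -1 = offload all layers to GPU
        llm = Llama(
            model_path="model-q4_0.gguf",  # hypothetical quantized model file
            n_gpu_layers=n_gpu_layers,
            seed=42,  # fixed seed for reproducibility
        )
        out = llm(PROMPT, max_tokens=64, temperature=0.0)  # greedy decoding
        print(n_gpu_layers, out["choices"][0]["text"])

If the outputs still diverge with temperature 0 and a fixed seed, the difference presumably comes from the floating-point kernels themselves rather than from sampling, since CPU and GPU backends don't compute bit-identical logits.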

