I wonder if (when) there will be a GGUF model available for this 8B model. I wan...

bugglebeetle · 2025-01-20T15:05:47 1737385547

YC’s own incredible Unsloth team already has you covered:

https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B

DrPhish · 2025-01-20T15:02:21 1737385341

Making your own ggufs is trivial: https://rentry.org/tldrhowtoquant/edit

It's a bit harder when they've provided the safetensors in FP8 like for the DS3 series, but these smaller distilled models appear to be BF16, so the normal convert/quant pipeline should work fine.
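
For reference, a rough sketch of what that convert/quant pipeline looks like with llama.cpp, written as a small Python helper that only builds the two commands. The checkout location (`llama.cpp`), the `build/bin` binary path, and the output file names are assumptions for illustration; adjust them to your own setup.

```python
from pathlib import Path


def gguf_commands(hf_dir, out_dir, quant="Q4_K_M", llama_cpp="llama.cpp"):
    """Build the two commands: HF safetensors -> BF16 GGUF -> quantized GGUF.

    Assumes a llama.cpp checkout at `llama_cpp` with binaries built under
    build/bin (both paths are assumptions, not fixed by llama.cpp).
    """
    out = Path(out_dir)
    full = out / "model-bf16.gguf"           # hypothetical file name
    quantized = out / f"model-{quant}.gguf"  # hypothetical file name

    # Step 1: convert the Hugging Face checkpoint to an unquantized GGUF.
    convert = [
        "python", str(Path(llama_cpp) / "convert_hf_to_gguf.py"),
        str(hf_dir), "--outfile", str(full), "--outtype", "bf16",
    ]
    # Step 2: quantize the BF16 GGUF down to the requested type.
    quantize = [
        str(Path(llama_cpp) / "build" / "bin" / "llama-quantize"),
        str(full), str(quantized), quant,
    ]
    return [convert, quantize]
```

To actually run it, pass each command to `subprocess.run(cmd, check=True)` in order; the second step needs the file written by the first.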

bochoh · 2025-01-20T15:06:24 1737385584

Thanks for that! It seems that unsloth actually beat me to [it](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-...)!

Edit: Running DeepSeek-R1-Distill-Llama-8B-Q8_0 gives me about 3 t/s and destroys system performance on the base M4 mini. Trying the Q4_K_M model next.

tucnak · 2025-01-20T19:32:08 1737401528

Not trivial as far as imatrix is concerned: we've found it substantially improves Q4 performance on long Ukrainian contexts. I imagine it's similarly effective in various other settings.
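
A minimal sketch of the imatrix variant, assuming llama.cpp's `llama-imatrix` and `llama-quantize` binaries: first compute an importance matrix from calibration text in the target language, then feed it to the quantizer. The binary directory, the `imatrix.dat` name, and the calibration file are hypothetical placeholders; the choice of calibration data is the non-trivial part and is not covered here.

```python
def imatrix_commands(model_gguf, calibration_txt, out_gguf,
                     quant="Q4_K_M", bin_dir="llama.cpp/build/bin"):
    """Build commands for an imatrix-guided quantization.

    `bin_dir` and the imatrix file name are assumptions for illustration.
    """
    imatrix_file = "imatrix.dat"  # hypothetical output name

    # Step 1: run the unquantized model over the calibration text and
    # record which weights matter most for its activations.
    compute = [
        f"{bin_dir}/llama-imatrix",
        "-m", model_gguf, "-f", calibration_txt, "-o", imatrix_file,
    ]
    # Step 2: quantize, using the importance matrix to guide rounding.
    quantize = [
        f"{bin_dir}/llama-quantize",
        "--imatrix", imatrix_file, model_gguf, out_gguf, quant,
    ]
    return [compute, quantize]
```

As with any two-step pipeline, run the commands in order (for example with `subprocess.run(cmd, check=True)`), since the quantize step reads the file the first step writes.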