I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely in VRAM (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
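If you want to confirm the model really is resident on the card, a quick sketch (assuming ROCm's rocm-smi utility is installed alongside Ollama; flag names can vary between ROCm releases):

    # start an interactive session with the 32B model
    ollama run deepseek-r1:32b

    # in a second terminal, check VRAM usage on the card
    rocm-smi --showmeminfo vram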
The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly low-value overhead, taking roughly 1-4 minutes before the final Solution/Answer appears.
You can view the model performance within ollama using the command: /set verbose
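For example, inside the interactive session (a sketch; once verbose mode is on, ollama prints timing stats after each response):

    ollama run deepseek-r1:32b
    >>> /set verbose
    >>> why is the sky blue?
    # after the answer, ollama reports stats such as prompt eval count,
    # eval count and eval rate (tokens/s)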
I wrote a similar post about a week ago, but for an [unsupported] Radeon RX 5500 with 4 GiB of VRAM, using ollama on Fedora 41. It can only run llama3.2 or deepseek-r1:1.5b, but they're pretty usable if you're OK with a small model for personal use.
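If ROCm refuses to use an officially unsupported card, one commonly cited workaround is to override the reported GFX target before starting the server. This is only a sketch: the HSA_OVERRIDE_GFX_VERSION value below is an assumption for illustration, and the right value (if any works at all) depends on the specific GPU and ROCm build.

    # assumption: 10.1.0 roughly matches RDNA1 cards like the RX 5500
    HSA_OVERRIDE_GFX_VERSION=10.1.0 ollama serve

    # then, in another terminal, run a small model that fits in 4 GiB
    ollama run deepseek-r1:1.5b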
I didn't go into detail about how to set up Open WebUI, but there is documentation for that on the project's site.
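For reference, the quickstart is roughly a single Docker command (quoted from memory of the project's docs, so check their site for the current flags; the host port here is just an example):

    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:main
    # then browse to http://localhost:3000 and point it at the local ollama instance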
A coprocessor exposed via mmap/ioctl over some special device files, slightly different from the existing XDNA support because of a different management interface (the actual platform has been sold for some time as part of high-end FPGAs, but "RyzenAI" has a different integration interface on silicon).
Does that entire model fit in gpu memory? How's it run?
I tried running a model larger than VRAM and it loads some layers onto the GPU but offloads the rest to the CPU. It's faster than CPU alone for me, but not by a lot.
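You can see how the layers were split with ollama's process listing, and (if needed) nudge how many layers get offloaded per session. A small sketch, assuming a recent ollama build; the layer count is just an example to tune:

    ollama ps    # the PROCESSOR column shows the CPU/GPU split for loaded models

    ollama run deepseek-r1:32b
    >>> /set parameter num_gpu 30   # request ~30 layers on the GPU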
Nice, last time I tried out ROCm on Arch a few years ago it was a nightmare. Glad to see it's just one package install away these days, assuming you didn't do any setup beforehand.
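On current Arch it is roughly the following (a sketch; package names and repos may have changed, so check the wiki):

    sudo pacman -S ollama-rocm          # ollama built against ROCm
    sudo systemctl enable --now ollama
    ollama run deepseek-r1:14b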
[0] https://github.com/ollama/ollama