How to Run DeepSeek R1 Distilled Reasoning Models on RyzenAI and Radeon GPUs (guru3d.com)
82 points by waltercool 75 days ago | 22 comments



I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely in VRAM (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
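If you want to confirm the model actually fits in VRAM, one quick check (assuming the ROCm tools are installed) is:

    rocm-smi --showmeminfo vram

which reports total and used VRAM per GPU.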

The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly low-quality overhead, taking ~1-4 minutes before the actual solution/answer appears.

You can view the model's performance stats within ollama using the command: /set verbose
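For reference, in an interactive session that looks roughly like this (the numbers below are purely illustrative, and exact field names may vary by ollama version):

    $ ollama run deepseek-r1:32b
    >>> /set verbose
    Set 'verbose' mode.
    >>> why is the sky blue?
    ...
    total duration:  1m21.3s
    eval count:      2048 token(s)
    eval rate:       25.2 tokens/s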

[0] https://github.com/ollama/ollama


Yup, this is what deepseek does.

The good thing about 32B is that it is about as good as 70B on many benchmarks, according to the DeepSeek documentation:

https://huggingface.co/deepseek-ai/DeepSeek-R1#distilled-mod...


I've been running 32b as well.

But I cannot find it in LM Studio. What am I doing wrong, given that I can only find distilled models?


32b is a distilled model. Only the full 671b model is not.


I wrote a similar post about a week ago, but for an [unsupported] Radeon RX 5500 with 4GiB of VRAM, with ollama on Fedora 41. It can only run llama3.2 or deepseek-r1:1.5b, but they're pretty usable if you're OK with a small model and it's for personal use.

I didn't go into detail about how to set up Open WebUI, but there is documentation for that on the project's site.

https://blue42.net/linux-ollama-radeon-rx5500/post/


You have a typo in your ollama.service:

Environmetn="ROCR_VISIBLE_DEVICES=1"

The 't' and 'n' are transposed.
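i.e. the corrected line in the unit file (or a drop-in override) should read:

    [Service]
    Environment="ROCR_VISIBLE_DEVICES=1"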


Great blog


Has anyone seen an in-depth comparison between the Radeon 7900 XTX and the RTX 3090 specific to R1?



OK, but how many tokens/s would a 70B model run at on the AMD Ryzen AI Max+ 395?


As an aside, Linux 6.14 has/will have support for the Ryzen AI XDNA NPUs on AMD's mobile APUs.

Might not be appropriate for this model, but it could be for small models.


What is the programming interface for these drivers? E.g. how do I actually call the NPU? Any existing software?


Not a clue, it's new to me as well. This[1] might have some relevant info.

[1] https://github.com/amd/xdna-driver


Any idea how they will appear to the OS? As additional processors?


A coprocessor, available via mmap/ioctl over some special device files. It's slightly different from the existing XDNA support due to a different management interface (the actual platform has been sold for some time as part of high-end FPGAs, but "Ryzen AI" has a different integration interface on silicon).
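If you want to see how it shows up, something like this should work once the amdxdna driver is loaded (the device path is an assumption based on the DRM accel subsystem convention):

    lsmod | grep amdxdna
    ls -l /dev/accel/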


Vulkan backends have also been included in most all-in-one LLM runners for a while now; they can be useful for cards not supported by ROCm.
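For example, llama.cpp can be built with its Vulkan backend roughly like this (a sketch based on llama.cpp's build docs; flags may change between versions):

    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release
    ./build/bin/llama-cli -m model.gguf -ngl 99 -p "hello"

The -ngl flag controls how many layers are offloaded to the GPU.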


In my case, with a 6900 XT:

1. sudo pacman -S ollama-rocm

2. ollama serve

3. ollama run deepseek-r1:32b


Does that entire model fit in GPU memory? How does it run?

I tried running a model larger than VRAM, and it loads some layers onto the GPU but offloads the rest to the CPU. It's faster than CPU alone for me, but not by a lot.
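You can check how a loaded model is split between CPU and GPU (assuming a reasonably recent ollama) with:

    ollama ps

which reports either 100% GPU or a CPU/GPU percentage split in its PROCESSOR column.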


You're right; I actually noticed the GPU clocking up and down with 32b, while 14b clocks up fully and actually runs faster.


Nice. The last time I tried out ROCm on Arch, a few years ago, it was a nightmare. Glad to see it's just one package install away these days, assuming you don't need any other setup beforehand.


I think you do still have to have the ROCm drivers installed, but it's not very hard to do from AMD's website.
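For what it's worth, the ROCm runtime is also packaged in the official Arch repos, so something like this should work too (exact package name is an assumption on my part):

    sudo pacman -S rocm-hip-runtime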


Everything from the Arch repos; well, CachyOS and Arch :)



