> Is that not the case? it is - they're laughably slow and not even supported by...

GTP · 2025-02-12T08:19:21 1739348361

But Deepseek R1 doesn't use CUDA, so maybe for this specific case, it isn't a big deal?

almostgotcaught · 2025-02-12T08:51:05 1739350265

> it isn't a big deal?

friend, you shouldn't make comments like this unless you understand the definitions of the words. Deepseek wrote some parts of their kernels using PTX. newsflash: PTX support for features is in lockstep with CUDA support for the same features, i.e. if CUDA doesn't support a feature, you couldn't write PTX that uses it either.

therealfiona · 2025-02-12T18:50:50 1739386250

It is poor form to condemn someone for asking a question.

Thank you for providing the information to clear up ignorance though.

almostgotcaught · 2025-02-13T00:12:52 1739405572

this is a question:

> is deepseek's use of PTX instead of CUDA relevant here?

this is a conclusion/assumption thinly veiled as a question:

> Deepseek R1 doesn't use CUDA, so ... it isn't a big deal?

note, genuine questions don't already presuppose an answer.

GTP · 2025-02-18T15:34:36 1739892876

Asking if it is a big deal or not is definitely a question ;) Thank you for providing the information I was missing though.

numpad0 · 2025-02-12T17:13:00 1739380380

The PTX hack is for the backend runner and training infra; the public weights are often executed using existing backends, especially the R1-distill-* models.

almostgotcaught · 2025-02-14T14:48:41 1739544521

the two things (weights and kernels) have nothing to do with each other in the slightest. again, i wish people would take a beat before commenting out of their depth and consider whether their comment adds to the conversation or not.