friend, you shouldn't make comments like this unless you understand the definitions of the words. DeepSeek wrote some parts of their kernels using PTX. newsflash: PTX support for features is in lockstep with CUDA support for the same features, i.e. the fact that CUDA doesn't support it means you couldn't write the PTX to use those features either.
The PTX hack is for the backend runner and training infra; the public weights are often executed using existing backends, especially the R1-distill-* models.
the two things (weights and kernels) have nothing to do with each other in the slightest. again, i wish people would take a beat before commenting out of their depth and consider whether their comment adds to the conversation or not.
it is - they're laughably slow and not even supported by the latest CUDA
> NVIDIA Driver support for Kepler is removed beginning with R495. CUDA Toolkit development support for Kepler continues through CUDA 11.x.