What do you mean by "relatively universal"? This is CUDA only [0] with a promise... | Hacker News

eyegor 9 months ago | on: FlashAttention-3: Fast and Accurate Attention with...

What do you mean by "relatively universal"? This is CUDA-only [0], with a promise of a ROCm backend eventually. There's only one project I'm aware of that seriously tries to address the CUDA dependency problem in ML [1].

[0] https://github.com/HazyResearch/ThunderKittens?tab=readme-ov...

[1] https://github.com/vosen/ZLUDA

f_devd 9 months ago

If you read the article I linked, the authors show that it's built entirely around 16x16 matrices (or "tiles"), which are fairly standard across GPUs.
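To make the tile argument concrete, here is a minimal pure-Python sketch of a matrix multiply decomposed into 16x16 tiles. It is not ThunderKittens code and does not use its API; the names `tiled_matmul` and `naive_matmul` are hypothetical, and it assumes every dimension is a multiple of 16. The point is only that the whole computation reduces to repeated 16x16 tile products, which is the unit a tile-based GPU kernel would hand to the hardware's matrix units.

```python
TILE = 16  # tile edge length, matching the 16x16 tiles mentioned above


def tiled_matmul(a, b, tile=TILE):
    """Compute C = A @ B one tile x tile block at a time.

    a is n x k and b is k x m (lists of lists). All dimensions are assumed
    to be multiples of `tile`; real GPU kernels pad or mask ragged edges.
    """
    n, k, m = len(a), len(b), len(b[0])
    if n % tile or k % tile or m % tile:
        raise ValueError("dimensions must be multiples of the tile size")
    c = [[0] * m for _ in range(n)]
    # Each (ti, tj) pair is one output tile; in a GPU kernel this tile
    # would typically be owned by one warp and kept in registers.
    for ti in range(0, n, tile):
        for tj in range(0, m, tile):
            # Walk the shared dimension one tile at a time, accumulating.
            for tk in range(0, k, tile):
                # Multiply one tile of A by one tile of B into the C tile.
                for i in range(ti, ti + tile):
                    row_a, row_c = a[i], c[i]
                    for p in range(tk, tk + tile):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(tj, tj + tile):
                            row_c[j] += a_ip * row_b[j]
    return c


def naive_matmul(a, b):
    """Reference triple-loop matmul used to check the tiled version."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


if __name__ == "__main__":
    # 32x48 times 48x16: every dimension is a multiple of 16.
    a = [[(i * 7 + p) % 5 - 2 for p in range(48)] for i in range(32)]
    b = [[(p * 3 + j) % 4 - 1 for j in range(16)] for p in range(48)]
    assert tiled_matmul(a, b) == naive_matmul(a, b)
    print("tiled result matches naive result")
```

Because the inner work is always a fixed-size 16x16 block, swapping the innermost loops for a vendor's matrix-multiply instruction changes the backend without changing the tiling structure around it.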
