Hacker News new | past | comments | ask | show | jobs | submit login

What do you mean by "relatively universal"? This is Cuda only [0] with a promise of a rocm backend eventually. There's only one project I'm aware of that seriously tries to address the Cuda issue in ml [1].

[0] https://github.com/HazyResearch/ThunderKittens?tab=readme-ov...

[1] https://github.com/vosen/ZLUDA




If you read the article I linked they show that it's entirely based on 16x16 matrices (or "tiles") which is fairly standard across gpus.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: