
I feel the same.

For example, some stats from Whisper [0] (transcribing 30 seconds of audio) show the following for the medium model (see other models in the link):

---

  Device  Model   Precision      Layer      Time
  GPU     medium  fp32           Linear     1.7s
  CPU     medium  fp32           nn.Linear  60.7s
  CPU     medium  qint8 (quant)  nn.Linear  23.1s

---

So the same model runs 35.7x faster on GPU (60.7s / 1.7s), and is still 13.6x faster than the quantized "optimized" CPU model (23.1s / 1.7s).

I was expecting around an order of magnitude of improvement.

Then again, I do not know whether, in the case of this article, the entire model was on the GPU or only a fraction of it (22 layers) with the remainder on CPU, which might explain the result. Apparently that is the case, but I don't know much about this stuff.

[0] https://github.com/MiscellaneousStuff/openai-whisper-cpu
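For reference, the qint8 numbers in that repo come from PyTorch dynamic quantization of the nn.Linear layers. A minimal sketch of the idea (the model name and audio path are placeholders, and the repo's actual script may differ):

  import torch
  import whisper  # assumes the openai-whisper package is installed

  # Load the medium model on CPU (chosen to match the table above).
  model = whisper.load_model("medium").cpu()

  # Dynamic quantization: store nn.Linear weights as qint8 and
  # quantize activations on the fly at inference time.
  quantized = torch.quantization.quantize_dynamic(
      model, {torch.nn.Linear}, dtype=torch.qint8
  )

  result = quantized.transcribe("audio.wav")  # "audio.wav" is a stand-in path
  print(result["text"])

Dynamic quantization mainly targets nn.Linear (and some RNN) modules, which lines up with the ~2.6x CPU speedup in the table rather than anything GPU-like.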



Training and inference on GPUs significantly underutilize... the GPUs. So tuning and various tricks need to be applied to achieve dramatic performance gains. If I am not good at cooking, giving me a larger kitchen will not make me faster or better.


Your last paragraph is correct. Only about half the model was running on the GPU.
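For anyone curious what that kind of partial offload looks like, here is a minimal PyTorch sketch. The layer counts and split point are assumptions for illustration, not the article's actual setup; the point is that every forward pass pays a device-transfer cost at the split, and the CPU half bounds the overall speed:

  import torch
  import torch.nn as nn

  # Hypothetical two-stage model; the 11+11 split is an assumption.
  class TwoStage(nn.Module):
      def __init__(self):
          super().__init__()
          self.front = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(11)])
          self.back = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(11)])

      def forward(self, x):
          x = self.front(x)   # runs on whatever device front lives on
          x = x.to("cpu")     # hop back to CPU for the remaining layers
          return self.back(x)

  model = TwoStage()
  x = torch.randn(1, 1024)
  if torch.cuda.is_available():
      model.front.to("cuda")  # only half the layers live on the GPU
      x = x.to("cuda")
  with torch.no_grad():
      y = model(x)            # the CPU half dominates the runtime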



