
On the same basis, it would also help if you could provide a comparison between GPUs commonly used for ML: Tesla K80, P100, T4, V100, and A100. How has the architecture evolved to make the A100 significantly faster? Is it just the 80GB RAM, or is there more to it from an architecture standpoint?


> How has the architecture evolved to make the A100 significantly faster?

Oh, very much so: performance has improved by way more than an order of magnitude. For a deeper read, have a look at the "architecture white papers" for Kepler, Pascal, Volta/Turing, and Ampere:
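To get a feel for the gap, here's a back-of-envelope sketch of peak FP32 throughput (cores x clock x 2, counting a fused multiply-add as two FLOPs). The core counts and boost clocks are approximate published figures, so treat the exact numbers as assumptions:

```python
def peak_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS, counting one FMA as 2 FLOPs."""
    return cuda_cores * boost_clock_ghz * 2 / 1000

# Tesla K80: two GK210 dies, 2496 CUDA cores each at roughly 0.875 GHz boost.
k80 = 2 * peak_tflops(2496, 0.875)   # ~8.7 TFLOPS across both dies

# A100: 6912 CUDA cores at roughly 1.41 GHz boost.
a100 = peak_tflops(6912, 1.41)       # ~19.5 TFLOPS

print(f"K80  ~{k80:.1f} TFLOPS FP32")
print(f"A100 ~{a100:.1f} TFLOPS FP32")
```

Note that plain FP32 only accounts for a ~2x difference; the order-of-magnitude gap comes from the tensor cores (the A100's quoted TF32/FP16 tensor throughput is far higher than its plain FP32 rate) plus much faster memory.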

https://duckduckgo.com/?t=ffab&q=NVIDIA+architecture+white+p...

or check out the archive of NVIDIA's parallel4all blog ... hmm, that's weird, it seems like they've retired it. They used to have really good blog posts explaining what's new in each architecture.

You could also have a look here:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....

for the table of various numeric sizes and limits which change with different architectures. But that's not a very useful resource in and of itself.


You may find this[0] helpful (note -- download link to a .PDF). It's the GA100 whitepaper.

[0]: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...


As a starter, the T4 is heavily optimized for low power consumption on inference tasks. IIRC it doesn't even require additional power beyond what the PCIe bus can provide, but it's basically useless for training, unlike the others.


One day I'll get my hands on both an A40 and an A100, and maybe I'll get an answer to the question: does the 5120-bit memory bus help that much? The A100 has fewer CUDA cores and around 1/4 more tensor cores, yet it seems to be the preferred 'compute' and 'AI training' option all around. What gives?
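The bus width tells a big part of the story on paper. A rough sketch of theoretical peak bandwidth (bus width in bytes times per-pin data rate; the per-pin rates below are approximate assumptions for HBM2 on the 40GB A100 and GDDR6 on the A40):

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

# A100 40GB: 5120-bit HBM2 bus at ~2.43 Gbps per pin.
a100_bw = peak_bandwidth_gbs(5120, 2.43)   # ~1555 GB/s

# A40: 384-bit GDDR6 bus at ~14.5 Gbps per pin.
a40_bw = peak_bandwidth_gbs(384, 14.5)     # ~696 GB/s

print(f"A100 ~{a100_bw:.0f} GB/s, A40 ~{a40_bw:.0f} GB/s")
```

If these assumed rates are roughly right, the A100 has more than double the A40's bandwidth, which matters a lot for training workloads that are memory-bound rather than compute-bound.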



