My understanding was that GPU instruction-level parallelism is quite limited compared to CPUs (since multiple "threads" run on each hardware core), and I wasn't aware that GPUs had any meaningful out-of-order (OOO) execution.

If I'm wrong, I'd be happy to learn more.



This is arguable depending on your definition of ILP. CPUs try to extract parallelism from a single instruction stream in order to execute many instructions from that stream in parallel. This is very costly in silicon area per instruction executed. GPUs don't need to do this because the programs that run on them are "embarrassingly parallel": they have lots of available parallelism and explicitly tell the GPU where it is. So GPUs execute many more instructions in parallel than CPUs, but they don't usually do any work to try to find implicit parallelism.
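
As a rough illustration (not from the original comments), here is a minimal CUDA sketch of what "explicitly telling the GPU where the parallelism is" looks like. The kernel name and launch parameters are hypothetical; the point is only that each thread's work is identified by its index, so the hardware never has to discover parallelism on its own.

    // Hypothetical SAXPY kernel: one thread handles one element.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        // Each thread computes its own index from the grid/block layout.
        // The parallelism is explicit in the launch configuration, so no
        // OOO machinery is needed to find independent work.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Launch with one thread per element (d_x, d_y are device pointers):
    // saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);

The grid and block dimensions in the launch tell the GPU exactly how many independent pieces of work exist, which is why it can run so many instructions in parallel with comparatively simple per-core control logic.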



