
I'm aware of out-of-order execution, but my experience is that, at least with SSE/AVX, instruction ordering does matter... The loads do have a dependency on the address, for example. Anyway, some experimentation will help; even loop unrolling doesn't always behave the way you'd think.


> The loads do have a dependency on the address for example.

Sort of. They depend on the address, but there is nothing that prevents the address from being calculated ahead of time. So what happens is that both the loads and the addition get executed multiple iterations ahead of the multiplication. While we tend to think of one iteration completing before the next begins, from the processor's point of view it's just a continuous stream of instructions.


Loops (especially FP loops) are often limited by dependency chains, which restricts how much work OoO execution can overlap. Unrolling (and using multiple accumulators) helps create multiple independent chains that can be executed in parallel.
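To make the multiple-accumulator idea concrete, here is a minimal sketch using a plain float sum. The function names `sum_single` and `sum_unrolled4` and the unroll factor of 4 are my own choices for illustration, not anything from the loop being discussed. Note that the two versions add in a different order, so results can differ in the last bits for general inputs, which is also why compilers typically won't do this for you without something like `-ffast-math`.

```c
#include <stddef.h>

/* One accumulator: every add needs the result of the previous add,
   so the adds form a single serial dependency chain. */
float sum_single(const float *a, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Four accumulators (illustrative unroll factor): four independent
   chains, so adds from different chains can be in flight together. */
float sum_unrolled4(const float *a, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }

    /* Combine the partial sums, then handle the leftover elements. */
    float acc = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; i++)
        acc += a[i];
    return acc;
}
```

Whether the unrolled version is actually faster depends on the add latency and throughput of the target and on what the compiler does with it, so it's worth measuring rather than assuming.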

Edit: for this specific loop the only dependency is on the iteration variables, which is not an issue here, as the loop should be limited only by load/store bandwidth, assuming proper scheduling and induction variable elimination from the compiler.
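The loop in question isn't reproduced in this thread, so as a purely hypothetical stand-in, here is a sketch of the kind of loop that has no loop-carried dependency other than the index. The name `add_scale` and the load/add/multiply/store shape are assumptions on my part.

```c
#include <stddef.h>

/* Hypothetical stand-in loop: each iteration reads a[i] and b[i],
   adds, multiplies by k, and stores. No iteration uses a value
   produced by an earlier one; only the index i is carried. */
void add_scale(float *out, const float *a, const float *b,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] + b[i]) * k;
}
```

Since the only thing carried between iterations is `i`, the loads and adds for later iterations don't have to wait on the multiply of earlier ones, which fits the expectation that a loop like this ends up bound by load/store bandwidth rather than by a dependency chain.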



