
Modern CPUs also depend on coherent branching behavior, and conditions like this throw it off. I bet most of the 15X slowdown that the author is experiencing is not from the extra conditions, but from thrashing in the BPU (branch prediction unit)!

You might be underestimating the efficiency of branch prediction on modern CPUs. It's getting close to the point where if you (the human) can easily predict the pattern based on past history, the CPU will perfectly predict it as well. Manufacturers are usually a little cagey about the exact specifications, so one is often limited to empirical testing, but a simple pattern like the one in the example is going to be predicted perfectly after the first couple of iterations of training. Here's what one authority who's done lots of testing has to say about Haswell, the CPU the post is concerned with:

  The Haswell is able to predict very long repetitive jump  
  patterns with few or no mispredictions. I found no specific 
  limit to the length of jump patterns that could be 
  predicted. Loops are successfully predicted up to a count
  of 32 or a little more. Nested loops and branches inside 
  loops are predicted reasonably well.
http://www.agner.org/optimize/microarchitecture.pdf
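
To make the "predictable from past history" point concrete, here is a toy simulation. It is only a sketch under loud assumptions: the predictor (the last 8 outcomes indexing a table of 2-bit saturating counters) is a textbook-style model, not Haswell's actual design, which is undocumented, and the period-3 branch pattern is a hypothetical stand-in for the one in the post. The name `miss_indices` is made up for this example.

```python
import random

def miss_indices(outcomes, history_bits=8):
    """Return the step indices at which a toy predictor guesses wrong.

    Toy model (assumption): the last `history_bits` outcomes select one of
    2**history_bits two-bit saturating counters. Real hardware is far more
    elaborate and not publicly specified.
    """
    counters = [1] * (1 << history_bits)  # values 0..3; >= 2 predicts "taken"
    mask = (1 << history_bits) - 1
    history = 0  # the last `history_bits` outcomes, packed into an int
    misses = []
    for step, taken in enumerate(outcomes):
        if (counters[history] >= 2) != taken:
            misses.append(step)
        # Train the counter selected by the current history, then shift
        # the newest outcome into the history register.
        if taken:
            counters[history] = min(3, counters[history] + 1)
        else:
            counters[history] = max(0, counters[history] - 1)
        history = ((history << 1) | int(taken)) & mask
    return misses

n = 3000
patterned = [i % 3 == 0 for i in range(n)]  # hypothetical period-3 branch
rng = random.Random(0)
unpredictable = [rng.random() < 0.5 for _ in range(n)]  # coin-flip branch

print("patterned misses:", len(miss_indices(patterned)))
print("random misses:   ", len(miss_indices(unpredictable)))
```

In this model the patterned branch should stop mispredicting within the first few iterations, while the coin-flip branch is wrong roughly half the time. That says nothing about real cycle counts; it only illustrates why a short repeating pattern is an easy case for a history-based predictor.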



Looks like you guys are right, though I do have to execute the loop multiple times to find the same performance result. Weird.



