There's no reason to predict a branch if you're not going to execute speculatively.
I need to re-read the papers but I think the real problem isn't even speculative execution but allowing speculative cache changes.
The notion that "gadgets" didn't even need to return properly was both amusing and eye opening for me. It doesn't matter because the result will be flushed anyway! ;-)
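For concreteness, a minimal sketch of the kind of gadget being discussed — the Spectre variant 1 bounds-check pattern (the name `victim_function` follows the example in the original paper; the array sizes here are arbitrary). Architecturally the out-of-bounds path just returns 0, but under misprediction the CPU may speculatively load `array1[x]` and use it to index `array2`, leaving a secret-dependent cache footprint even though the result is flushed:

```c
#include <stdint.h>
#include <stddef.h>

uint8_t array1[16] = {1, 2, 3, 4};
uint8_t array2[256 * 512];   /* zero-initialized probe array */
size_t  array1_size = 16;

/* Classic Spectre v1 gadget: if the branch is mispredicted with an
 * out-of-bounds x, the dependent loads may still execute speculatively,
 * and the cache line of array2 touched encodes the value of array1[x]. */
uint8_t victim_function(size_t x) {
    if (x < array1_size) {
        return array2[array1[x] * 512];
    }
    return 0;
}
```

The point made above is visible here: nothing after the load needs to "return properly" for the leak to work — only the cache side effect of the speculative load matters.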
In an in-order CPU, you can still use a branch predictor to predict what to fetch and decode, so that you don't stall waiting for instruction fetch to finish after you resolve the branch.
In practice, advanced in-order designs contain more local reordering mechanisms, e.g. in the load/store unit, but they lack the unified global abstraction of a reorder buffer. The most successful timing attacks involve a mis-speculated load, so they wouldn't apply to these mechanisms, but it's not completely out of the question that they are also an effective side-channel.
> There's no reason to predict a branch if you're not going to execute speculatively.
Not quite. Branch prediction is typically used on non-speculative architectures in order to avoid pipeline bubbles. (You could argue that pipelining is a form of speculation)
Whether or not they're vulnerable has more to do with how their pipeline is structured. It's possible for an architecture to be vulnerable if a request to the load/store unit can be issued within the window between post-branch instruction fetch/exec and branch resolution. Eyeballing the pipeline diagram from the above docs, it looks like you can maybe get a request to the LSU off before the branch resolves. *dramatic music*
Pipelined processors would slow down considerably without branch prediction: every branch (= loop iteration) would stall the pipeline and instruction prefetch. 20-25% of instructions are branches, so this would mean a 5-10 clock cycle pause every 4 or 5 instructions.
(Simplest cores have only static branch prediction though)
Low end ARM parts keep track of branches to avoid purging cached pages that will likely be needed soon. That allows them to have decent performance when executing from flash.