There's no reason to predict a branch if you're not going to execute speculatively.
I need to re-read the papers but I think the real problem isn't even speculative execution but allowing speculative cache changes.
The notion that "gadgets" didn't even need to return properly was both amusing and eye opening for me. It doesn't matter because the result will be flushed anyway! ;-)
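For concreteness, a minimal sketch of the kind of gadget being discussed — the Spectre variant 1 bounds-check pattern (the name `victim_function` follows the example in the original paper; the array sizes here are arbitrary). Architecturally the out-of-bounds path just returns 0, but under misprediction the CPU may speculatively load `array1[x]` and use it to index `array2`, leaving a secret-dependent cache footprint even though the result is flushed:

```c
#include <stdint.h>
#include <stddef.h>

uint8_t array1[16] = {1, 2, 3, 4};
uint8_t array2[256 * 512];   /* zero-initialized probe array */
size_t  array1_size = 16;

/* Classic Spectre v1 gadget: if the branch is mispredicted with an
 * out-of-bounds x, the dependent loads may still execute speculatively,
 * and the cache line of array2 touched encodes the value of array1[x]. */
uint8_t victim_function(size_t x) {
    if (x < array1_size) {
        return array2[array1[x] * 512];
    }
    return 0;
}
```

The point made above is visible here: nothing after the load needs to "return properly" for the leak to work — only the cache side effect of the speculative load matters.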
In an in-order CPU, you can still use a branch predictor to predict what to fetch and decode, so that you don't stall waiting for instruction fetch to finish after you resolve the branch.
In practice, advanced in-order designs contain more local reordering mechanisms, e.g. in the load/store unit, but they lack the unified global abstraction of a reorder buffer. The most successful timing attacks involve a mis-speculated load, so they wouldn't apply to these mechanisms, but it's not completely out of the question that they are also an effective side-channel.
> There's no reason to predict a branch if you're not going to execute speculatively.
Not quite. Branch prediction is typically used on non-speculative architectures in order to avoid pipeline bubbles. (You could argue that pipelining is a form of speculation)
Whether or not they're vulnerable has more to do with how their pipeline is structured. It's possible for an architecture to be vulnerable if a request to the load/store unit can be issued within the window between post-branch instruction fetch/exec and branch resolution. Eyeballing the pipeline diagram from the above docs, it looks like you can maybe get a request to the LSU off before the branch resolves. *dramatic music*
Pipelined processors would slow down considerably without branch prediction: every branch (= loop iteration) would stall the pipeline and instruction prefetch. 20-25% of instructions are branches, so this would mean a 5-10 clock cycle pause every 4 or 5 instructions.
(Simplest cores have only static branch prediction though)
Low end ARM parts keep track of branches to avoid purging cached pages that will likely be needed soon. That allows them to have decent performance when executing from flash.