Here's the basic idea:

1. The first thing the AES encrypt function does is XOR user input against the secret key.
2. The result of this operation is used to index into a lookup table (the S-box).
3. The timing characteristics of step 2 vary depending on whether the indexed S-box entries are already in the cache.
By measuring the timings for a large number of random inputs and correlating them by index (e.g. sorting all messages into buckets based on the value of the first byte), we can recover the corresponding byte of the secret key.
Though it's not clear a priori what the timing characteristics should look like for a particular key byte, we can easily measure this on a machine we control that matches the description of the target platform.
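Here's a rough sketch in C of the bucket-and-average step. This is purely illustrative, not Bernstein's actual tooling: time_encryption is a placeholder oracle you'd replace with a real measurement (an rdtsc-wrapped AES call, a network round trip, ...), and SAMPLES/BYTE_POS are names I made up.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SAMPLES  200000
    #define BYTE_POS 0            /* which input byte we are profiling */

    /* Placeholder oracle: in a real experiment this would do one AES
     * encryption of `input` and return an elapsed-cycle count. */
    static uint64_t time_encryption(const uint8_t input[16])
    {
        (void)input;
        return (uint64_t)rand() % 100;   /* stand-in noise, not real data */
    }

    int main(void)
    {
        double   sum[256]   = {0};
        unsigned count[256] = {0};
        uint8_t  input[16];

        for (unsigned i = 0; i < SAMPLES; i++) {
            for (int j = 0; j < 16; j++)
                input[j] = (uint8_t)(rand() & 0xff);

            uint64_t t = time_encryption(input);

            /* Bucket by the value of the byte under attack. */
            sum[input[BYTE_POS]]   += (double)t;
            count[input[BYTE_POS]] += 1;
        }

        /* Print average timing per bucket.  Comparing this profile against
         * one collected with a known key on an identical machine reveals
         * key[BYTE_POS]: the table index is input ^ key, so the two profiles
         * are XOR-shifted relative to each other by (known_key ^ secret_key). */
        for (int b = 0; b < 256; b++)
            if (count[b])
                printf("%3d  %.2f\n", b, sum[b] / count[b]);

        return 0;
    }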
Unsurprisingly excellent description. How much control do we have over the cache lines populated by AES in a TLS scenario? We can't fully randomize the AES inputs, because they have to correspond to the TLS protocol.
I think it would be tough to attack TLS with this.
A couple important stumbling blocks off the top of my head:
We need to pull off the attack in the context of a single TLS session, because new AES keys are generated for each one. This means we can't choose any inputs to the AES function. If we try to manipulate or forge a message, the MAC will fail and then the jig is up. This might not be a big deal, since the AES ciphertexts will seem random anyway.
There will be a large amount of overhead in each request compared to Bernstein's original experiment. This will likely introduce a large amount of per-request jitter. With enough requests we can iron this out, but again: one session. So there may be an upper limit on the number of samples we can realistically take.
We may have a tough time comparing like to like. Certainly we can request the same static resource each time if we control malicious JavaScript, but there may be confounding factors. This might not be a big deal, since I think this just amounts to more jitter.
If "sleep" means "sleep until the next timer tick", how can it average out? Especially if the timer is started at the start of the encryption, and all encryptions (at least, for one block) take less time than the timer is set for. That means all times get set to exactly the timer time, and no information reaches the attacker.
"Random sleep" does not in general mean "sleep until the next timer tick". The best fix is making the function constant time, if you can achieve this with a sleep that makes the operation always take exactly one quantum then the sleep is really an implementation detail and quite far from "random sleep".
I've realized this (and am certainly not alone in that; it's rather obvious when one thinks about the nature of timing) -- but does anyone have some links on implementing it? Is there some (sequence of) x86_64 instructions that can be used to bound a procedure (in the Pascal sense of the word) to a quantum, regardless of things like when the procedure is called? (Assuming the procedure is short enough. Depending on branching behaviour, I suppose instruction fetch/decode, data fetch/write and the accompanying cache hits/misses can make it hard to a) select a worst-case run time in clock cycles to target (and if so, for which concrete CPUs) and b) make sure the CPU is actually busy for exactly that many cycles...) Is this even possible to approach in portable C99?
I suppose if one ignores the information leak due to possible changes in CPU load, one might devise a kind of evented "callback" model, where one waits to return the result of a procedure until an interrupt is triggered?
I don't expect a full answer, but if anyone has a link to some source code that isn't too complicated, I'd be very happy (either "real-world" or some good "example" code).
The most common way to avoid timing side-channel attacks is to write the procedure in C or ASM in such a way that there are no data-dependent differences in execution path. You've probably seen e.g. the memcmp that doesn't exit early. This attack is a little different in that it's not that different instructions are executed; it's that different memory access patterns take different amounts of time. For that, you can maybe change the implementation to not have any data-dependent array accesses, or maybe you can do things with prefetching to make the memory accesses constant time.
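For instance, a comparison that doesn't exit early and a masked table lookup might look roughly like this (ct_memcmp and ct_lookup are illustrative names, not from any particular library):

    #include <stddef.h>
    #include <stdint.h>

    /* Comparison with no data-dependent branches: accumulate differences
     * and only inspect the result at the end, instead of returning at the
     * first mismatching byte. */
    int ct_memcmp(const void *a, const void *b, size_t n)
    {
        const uint8_t *pa = a, *pb = b;
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= pa[i] ^ pb[i];
        return diff != 0;          /* 0 if equal, nonzero otherwise */
    }

    /* Table lookup with no data-dependent memory addresses: touch every
     * entry and use a mask to keep only the one we want. */
    uint8_t ct_lookup(const uint8_t table[256], uint8_t idx)
    {
        uint8_t result = 0;
        for (unsigned i = 0; i < 256; i++) {
            /* mask is 0xff when i == idx, 0x00 otherwise, without branching */
            uint8_t mask = (uint8_t)(((i ^ idx) - 1u) >> 8);
            result |= table[i] & mask;
        }
        return result;
    }

The masked lookup is obviously far slower than table[idx], since it reads all 256 entries every time, but the memory access pattern no longer depends on the secret index.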
An approach where you watch the clock will be inherently less portable and actually much harder. Not only will the timing calls be hardware or OS specific, but so will the worst-case time. Imagine having to deal with a chip going into low power mode during your computation. Also you probably don't want to count time that your thread wasn't scheduled to run, so now you're talking about integrating with the scheduler.
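To make that concrete, a clock-watching wrapper ends up looking something like this rough POSIX-only sketch (run_padded and the 50 ms budget are invented for illustration), and every piece of it -- the clock, the sleep call, the worst-case budget -- is platform-specific:

    #define _POSIX_C_SOURCE 200809L
    #include <time.h>

    /* Run the sensitive operation, then sleep until a fixed deadline so the
     * caller always observes (at least) QUANTUM_NS.  The budget must exceed
     * the true worst case, which is exactly the hardware/OS-specific number
     * that is hard to pin down. */
    #define QUANTUM_NS 50000000L   /* 50 ms; made-up budget for illustration */

    void run_padded(void (*sensitive_op)(void *ctx), void *ctx)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_MONOTONIC, &deadline);
        deadline.tv_nsec += QUANTUM_NS;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        sensitive_op(ctx);

        /* Sleep until the absolute deadline, retrying if interrupted.
         * Scheduling jitter, power states, etc. still leak some timing. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL))
            ;
    }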
You always want to remove secret data, not obfuscate it. Think of the difference between those reversible pixel mosaics and a flat black censor bar, for example.
Incidentally, many 'normal' blur functions implemented in software are somewhat reversible.