NIST may not have you in mind

blazingice · on Oct 21, 2012

This article focuses pretty heavily on the possibility of cache timing attacks against AES, and cites djb's original work along with Tromer/Osvik's publication in 2005.

Last week at CCSW, we published a paper[1] detailing our attempts to bring these attacks to bear against Chromium.

In short, we don't see AES cache timing attacks as possible on more recent processors, and especially so once you factor in the sheer size of modern architected code.

[1] http://cseweb.ucsd.edu/~kmowery/papers/aes-cache-timing.pdf

tptacek · on Oct 21, 2012

This is very cool, thanks for posting.

DJB's attacks were from a remote attacker's vantage point. But your paper also takes on Osvik and Tromer, who used "spy processes" to continuously probe local caches to create traces that could be analyzed for key information. I know your paper mentions branch prediction and says you don't have results for it, but what's your take on whether Aciicmez's BTB attack is going to remain viable?

I thought the BTB attack was the cleverest and most disquieting of the bunch, in that it suggested that we don't even know enough about the x86-64 microarchitecture to predict what side channel vulnerabilities we might have in software AES.

Regarding the paper itself: the most provocative claim it makes is that we're trending towards "complete mitigation" of cache side channel attacks. You give two reasons: AES-NI and multicore systems.

The AES-NI argument seems compelling but a little obvious, in the same sense as one could have argued that peripherals that offloaded AES would also blunt attacks against software AES. AES-NI blurs the line between software and hardware AES, but it's still a hardware implementation.

Another argumentative point that could be made here is, AES-NI mitigates cache-timing attacks against systems that use AES. It doesn't do much good if you can't use AES, since the most popular block ciphers that compete with AES are also implemented with table lookups.

I found the multicore argument a lot less compelling, since it relied in part on the notion that attackers wouldn't easily be able to predict the cache behavior of their target multicore systems. It seems to me that the most likely environment in which cache timing attacks are going to be a factor on the Internet is shared hosting environments, in which attackers with the sophistication to time AES are easily going to be able to get a bead on exactly what hardware and software they're aiming at. Most users of AES are also using off-the-shelf hardware and software.

blazingice · on Oct 21, 2012

Aciicmez's BTB attack looks at the branch predictor, and is potentially valid against any implementation which branches based on sensitive data. There's a whole class of these attacks which look at instruction paths, including a new one by Zhang et. al. against ElGamal at CCS this year, but they usually target asymmetric ciphers. In particular, since AES doesn't have key-dependent branching, these attacks don't apply.

I do agree with you that x86-64 is extremely complicated, and that new attacks might crop up due to some future optimization.

As for the paper:

Yeah, AES-NI is sort of the final hammer against AES cache timing attacks, since it doesn't use the cache at all, but I felt that a paper on AES cache timing would be remiss without mentioning it :)

There are two parts to the multicore argument: the first is that it complicates things massively, and the second is that it can be a complete mitigation if used properly.

First is the complication bit, and that's just saying that the attacker must understand almost everything about the multicore implementation, including multilevel cache behavior and (possibly non-deterministic?) replacement strategy. I'm willing to believe that, were this the only hurdle, a dedicated attacker could still succeed. I was looking at a single core machine, so I didn't have to deal with the complexity here.

For the complete mitigation, you need to rely on platform support for core pinning. If you're allowed to say "I want to do encryption now, give me my own core for 400ms", then, since the 4KiB T-tables fit into your core's L2, attacker threads on other cores just can't examine them during use. This complicates the VM hosting model and might be a decent DOS attack, but it does completely stop cache probing attacks.

Finally, as you said, my work can really only apply to AES on the x86 on the desktop. Change one of these variables (such as AES to ElGamal or RSA or Blowfish), and side channel attacks might still exist. Such is the problem with negative results :)

tptacek · on Oct 21, 2012

This was fun to read; thanks. It's interesting how side channel attacks can be both assisted and complicated by new hardware; usually, advances in hardware tend to favor attackers slightly more than defenders, but even just by pushing operations below attacker measurement thresholds --- without even trying, that is --- hardware makes some side channels very hard to exploit.

If you're an HN'er reading along at home, Aciicmez' BTB timing paper (you should just be able to Google that) is very very very cool. They not only realized that you could theoretically watch the caches used by the branch predictor to build a trace from which you could recover RSA keys, but also came up with a very simple way to profile those branch predictor caches; that is, they designed a "spy process" like Osvik and Tromer did for memory caches that targeted the BTB instead.

dfc · on Oct 21, 2012

I do not understand the jump from the NSA having a history of building systems from the chip up to reasoning by analogy that the same is true for NIST (The shared worldview link is 20 years old). I'm not disagreeing with the statement, I just do not see any support for the conclusion that NIST's is bad for the general public because unlike NIST's target customers we are not building custom chips.

Can anyone shed any light?

tptacek · on Oct 21, 2012

NIST doesn't build systems. It standardizes technology for the US. NIST standardizes far more than just crypto algorithms, but in the crypto cases, NSA reviews potential standards before the standard is published, for suitability to DoD. It is entirely reasonable to propose that NSA pushes NIST towards standardizing crypto that NSA is in a better position to use than industry is.

dfc · on Oct 21, 2012

I was hoping you would respond, thanks tptacek. In light of your comment, is it reasonable to assume that NSA is going to supply the custom chips to the rest of the federal government? Given federal procurement standards it seems that the majority of federal IT departments rely on industry to provide hardware. Is it still reasonable to propose that NSA pushes NIST in a direction that serves NSA's interest at the cost of weakening other governmental agencies? What is the implementation deadline for federal use of SHA-3? Is it unreasonable to assume that the standards committee expects SHA-3 hardware implementation similar to AES-NI?

On a related note AES is the NIST standard for protecting sensitive but unclassified information:

"Applicability. This standard may be used by Federal departments and agencies when an agency determines that sensitive (unclassified) information (as defined in P. L. 100-235) requires cryptographic protection.

Other FIPS-approved cryptographic algorithms may be used in addition to, or in lieu of, this standard. Federal agencies or departments that use cryptographic devices for protecting classified information can use those devices for protecting sensitive (unclassified) information in lieu of this standard." [1]

I have always assumed that this scope limitation within FIPS197 meant that NSA, DoD, Secret Army of Northen VA, etc had a different standard/requirements (NSA Suite A&B) for classified (and up) information. Is this the case? If so why would NSA have so much skin in the game if they were not restricted to FIPS197 requirements?

[1] http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

m0nastic · on Oct 21, 2012

I wouldn't expect the NSA to be involved with sourcing of hardware for the rest of the federal government (unless the continued fear around supply chain management completely takes over, which I suppose is possible, as it's probably the number one concern presently among a lot of agencies).

While they exert a disproportionate amount of guidance to other agencies, that is generally orthogonal to their primary motivations.

You can certainly make a case that NSA pushes NIST towards making guidelines for things that they favor, but I don't seriously believe that's at the expense of weakening other agencies (at least not as a goal).

As to classified and up information, the NSA can't get enough ECC, and many of the Suite B implementations of other standards are just a version of the standard that works (and is "certified") to use Elliptical Curve Cryptography (like TPM Suite B, which we work on).

dfc · on Oct 21, 2012

I apologize for not being clearer about NSA's motivations. I did not mean to imply that there was any malicious intent when it comes to weakening federal IT standards. (It seems that NSA would be aware of the negative side effects of their recommendations.)

Could NSA's hardware centric recommendations be motivated by an interest in leveraging economies of scale (due to the size of federal IT procurement) and purchasing COTS hardware that was optimized for AES?

m0nastic · on Oct 21, 2012

It's possible (although they're already procuring hardware at GSA-approved rates, so I'm not sure if there's the same economies of scale that you see in the commercial realm).

agl · on Oct 21, 2012

The link between NSA's past work and NIST's tendencies today is weak, certainly. It's at best a rough guess on my part although, with the NSA, a rough guess is as good as we ever get :)

But I am claiming that there's a hardware orientated bias in the major recommendations that I see NIST making in the crypto space. I'm taking a flying guess at the reason, and perhaps I should simply have skipped any speculation. But the seeming bias is still worth highlighting I think.

dguido · on Oct 21, 2012

I just checked and all of the computers and devices I own for work have AES hardware in them (Mac Mini, Macbook Air, iPhone). Maybe NIST thinks that, through standardization efforts, they can encourage more people to integrate such hardware over the long term?

The amount of hardware support that AES has already is pretty substantial: https://en.wikipedia.org/wiki/AES_instruction_set

I'd rather not suppose there's something insidious going on here, just that maybe NIST is taking a longer-term view than racing to put AES and SHA3 in everything yesterday.

pbsd · on Oct 21, 2012

Käsper and Schwabe's bitsliced AES [1] does not need very long streams to be fast. It processes 8 blocks simultaneously, not 128 (as a 'pure' bitsliced approach would), and therefore reaches peak performance at relatively small lengths, starting at 128 bytes.

[1] http://cryptojedi.org/papers/aesbs-20090616.pdf

agl · on Oct 21, 2012

Looks like I haven't been keeping up to date even with my own colleagues' work! Thanks, I'll update the post.

Tobu · on Oct 21, 2012

Hardware will evolve. CPU's design constraints — programs with low parallelism and not much awareness of the memory hierarchy — have caused a bottleneck. SHA-3 will end up as yet another specialty instruction, with the actual programming done by the hardware vendor. For people who don't want to be dependent on that, I imagine GPUs provide a faster and more flexible alternative.

planckscnst · on Oct 21, 2012

It seems strange that the author is complaining about AES speed. About a year ago, I benchmarked an IPsec setup between two cheap routers with an ARM9 processor that did not have any special crypto blocks in it. AES significanly outperformed the other algorithms I tried.

tptacek · on Oct 21, 2012

You were almost certainly benchmarking an implementation of AES that relies on table lookups for speed; those table lookups create a side channel vulnerability, which was much of the point of this article.

planckscnst · on Oct 21, 2012

Interesting; it was what's in the Linux kernel.

tptacek · on Oct 21, 2012

Pretty sure that's an S-box AES. The issue is that when implemented (a) in software (b) without reliance on large lookup tables --- ie, "securely" --- AES is significantly slower.

dchest · on Oct 21, 2012

http://bench.cr.yp.to/results-stream.html

tptacek · on Oct 21, 2012

That's a table of stream ciphers benchmarked against AES, a block cipher, in CTR mode.

dchest · on Oct 21, 2012

Yes. Apart from disk encryption, they are pretty much interchangeable (OP talked about IPSec).

tptacek · on Oct 21, 2012

AES is normally benchmarked against other block ciphers. It might not be fair to compare the performance of a native stream cipher to that of AES rigged up as a stream cipher with CTR.

floody-berry · on Oct 22, 2012

What makes the comparison unfair?

tptacek · on Oct 22, 2012

The other ciphers are natively stream ciphers, and AES is a block cipher adapted via CTR mode to be a stream cipher.

floody-berry · on Oct 22, 2012

I'm still missing what makes it unfair. Are other modes of AES potentially faster? Does AES provide something that stream ciphers don't? Are Salsa20/Chacha20 unable to be compared to anything because they're native stream ciphers that look like a block cipher in CTR mode under the hood?

tptacek · on Oct 24, 2012

1. Yes. 2. Yes. 3. No.

Nursie · on Oct 21, 2012

Intel have specific instructions for GCM that mitigate some of this stuff I'm sure. I know this doesn't translate to 'NIST are keeping software implementations in mind', but when these things are available on a few processors that does make the software guy's job easier.

el_cuadrado · on Oct 21, 2012

> try Salsa20 rather than AES

Salsa20 is a stream cipher, AES is a block cipher.

It is like saying 'Try GCC rather than Windows 8'.

tptacek · on Oct 21, 2012

That's more untrue that true. Most popular applications of AES could just as easily use CTR instead of CBC (I'm not particularly interested in litigating the conversion of applications that use ECB); many of those applications even have an option to do that. Most settings in which AES-CTR is acceptable admit to any other stream cipher.

Even most message-based crypto --- applications where the designers can predict the size of most encrypted messages --- are well served by CTR, or by a native stream cipher.

So, while there are fundamental differences between Salsa20 and AES, for bulk encryption purposes those differences come nowhere close to the difference between GCC and Windows 8. It's really more of a "GCC vs. LLVM" kind of difference.

dchest · on Oct 21, 2012

In addition to what others said, there's even fewer differences between them internally.

Salsa20 <stream cipher> is not a "traditional" stream cipher -- basically it's Salsa20 <hash function> (not collision resistant) in counter mode. The hash function itself is implemented as permutation with final addition of input words to make it irreversible. With this permutation you can build a block cipher, and by the way that's what SHA-3 finalist BLAKE does by introducing addition of key words during rounds (except it's built on a variant of Salsa20 called ChaCha20).

(I'm not saying that you should build your own block cipher from Salsa20, of course :)

The original sentence, though, is:

try Salsa20 rather than AES and Poly1305 rather than GCM.

AES-GCM is also not a "block cipher".

tptacek · on Oct 21, 2012

For those playing along at home, AES-GCM is AES in CTR mode plus a variant of Wegman-Carter MAC function to authenticate the data.

quotemstr · on Oct 21, 2012

CTR mode turns block ciphers into stream ciphers, and the best way to use AES is in CTR mode.

askimto · on Oct 21, 2012

That's a stretch.