Interesting position on testing: "I generally don't consider my own software adequately enough tested until its tests have turned up a bug in a compiler/toolchain. So far I've not encountered a compiler bug for libsecp256k1 [...] so I may have to settle for discovering that a ubiquitous system library couldn't square correctly."
(The bug was found via comparing the output of libsecp256k1 and OpenSSL on "specially-constructed random inputs.")
You might be interested in this [1]. It's a 31C3 talk about exactly this. Their idea is to write a runnable spec (one that focuses only on correctness, rather than speed and the like) that can then be used to test real-world implementations against.
Their idea is to do this for all the major interfaces we all depend on: TCP/IP, the x86 memory model, and so on.
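Roughly, the idea looks like this (my own toy sketch applied to the squaring case from the article, not code from the talk): a slow reference implementation that cares only about correctness is used as an oracle for a fast production implementation.

    import random

    P = 2**256 - 2**32 - 977  # the secp256k1 field prime

    def square_mod_p_reference(x):
        # Runnable "spec": obviously correct, no attempt at speed or
        # constant-time behaviour. Used only as an oracle.
        return (x * x) % P

    def check_against_reference(square_mod_p_fast, trials=100000):
        # Compare a production implementation against the reference on
        # random inputs; any mismatch is a bug in one of the two.
        for _ in range(trials):
            x = random.randrange(P)
            assert square_mod_p_fast(x) == square_mod_p_reference(x), hex(x)

The same shape applies to the bigger interfaces the talk has in mind; only the oracle gets more elaborate.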
Runnable/verifiable specs are a good idea. A colleague of mine did some great work compiling the H.265 & VP9 specifications to produce a test suite, in a similar manner to fuzzing:
Why would that matter in the case of testing for correctness? I thought these 'runnable spec' implementations were only to verify that the output of the 'real' implementation is correct, and thus would never even be used anywhere that could/would be attacked. Am I not understanding the purpose correctly?
When testing for correctness, why would you care about timing attacks? You would obviously have multiple tests covering correctness, performance, security, etc.
That's only because the author uses a pretty standard compiler. Try building your flawless library on ARM, Newlib, etc., and you're pretty much fighting your tools all the time. The first compiler/toolchain bug is a rite of passage, yes, but not a measure of code quality (although they do correlate).
It does say "a compiler/toolchain". Opus has both fixed-point and floating-point implementations. The fixed-point one was extensively tested on ARM too, I'm sure.
Sorry. I knew I was mangling it, but it was the only way to get within HN's title restrictions. Thought about leaving a pre-emptive comment to that effect, and I probably should have.
> The incorrectly squared numbers would be expected to be found randomly with probability around one in 2^128
1:2^128 is an incredibly low probability. The universe is only around 2^86 nanoseconds old. For all practical purposes, an event with a 1:2^128 probability is impossible.
They say that they found it with randomized testing (although one that explores "a class of rare corner cases") and dismiss the claim that this is a class of bugs that can only be found by analysis of the implementation.
I think a test that manages to find a bug like this cannot be called "random" (as in, throwing random inputs at a black box). Obviously I don't know the details, but I am sure their test incorporated a great deal of detailed knowledge of algorithms used in the computation.
> but I am sure their test incorporated a great deal of detailed knowledge of algorithms used in the computation
Actually, no. In this case it exploits the knowledge that "low frequency inputs tend to explore more program space". It was found using numbers with a low transition probability between zero and one, an approach also used in digital circuit verification.
It's true that I also know the code is full of hard-to-hit branches for carries, which does partially motivate that selection. But importantly, "we weren't even testing for that": the OpenSSL code in general uses an entirely different algorithm than the libsecp256k1 code, and I hadn't recently looked at any OpenSSL innards.
He isn't using black box testing; he's using white box testing (knowing Greg, either KLEE [1] or AFL [2]... probably both... and probably also black box testing). The test does incorporate a great deal of knowledge of the algorithms (in fact, complete knowledge), but the point is that it's the machine that does the analysis of the implementation, not a human.
Though I use both KLEE and AFL, this particular instance was found with black box randomized tests, using numbers with low transition probabilities, e.g. 00000111111111000111000000000000.
KLEE or AFL would be somewhat unlikely to find this particular problem because the error was an omitted branch, so they wouldn't know there was a case left to be satisfied.
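Roughly like this (a toy sketch, not the actual harness): each bit copies the previous one and only flips with small probability, so long runs of zeros and ones dominate.

    import random

    def low_transition_256(p_flip=0.05):
        # Generate a 256-bit value whose bit string flips state (0->1 or
        # 1->0) with probability p_flip at each position, producing long
        # runs like the example above. Such inputs hit carry/overflow
        # corner cases far more often than uniformly random ones.
        bit = random.getrandbits(1)
        value = 0
        for _ in range(256):
            if random.random() < p_flip:
                bit ^= 1
            value = (value << 1) | bit
        return value

    print(format(low_transition_256(), "0256b"))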
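For illustration only (a hypothetical toy, not the actual OpenSSL or libsecp256k1 code): when the bug is an omitted carry, there is no unexplored branch left in the code for coverage feedback to point at.

    MASK = (1 << 64) - 1  # one 64-bit limb

    def add_2limb_buggy(a_lo, a_hi, b_lo, b_hi):
        # Adds two 128-bit numbers stored as (lo, hi) limb pairs.
        lo = (a_lo + b_lo) & MASK
        hi = (a_hi + b_hi) & MASK  # bug: carry out of the low limb is never added
        return lo, hi

    def add_2limb_correct(a_lo, a_hi, b_lo, b_hi):
        lo = a_lo + b_lo
        hi = (a_hi + b_hi + (lo >> 64)) & MASK  # propagate the carry
        return lo & MASK, hi

Every path through add_2limb_buggy is trivially covered, so coverage-guided tools see nothing left to satisfy; only an input that actually produces a carry out of the low limb exposes the difference.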
Right, that's one of the weaknesses of tools that rely on coverage analysis... and it highlights the importance of using more than one approach to testing. No single tool can do everything.
While I understand where your argument is coming from, it is important to ask the question: how many times a day is this piece of code executed on all computers worldwide?
Surely the chance of one person running into this bug is extremely small, but what about all the people on earth combined?
Nevertheless, I think we can classify this method of testing as some sort of hybrid between brute-force testing and auditing; I don't think the authors are dismissing that claim either, but are merely stating that there are ways to construct very informed test cases without looking at the code.
Caveat that I might be completely off, but if, say, 10 bn people tried once a nanosecond since the inception of the universe, you'd still have only 10^10 * 2^86 ≈ 2^33 * 2^86 = 2^119 attempts, which still only covers about 2^-9 (i.e. 1/512th) of the total addressable 2^128 space, i.e. it's still fairly unlikely that you'd have hit that specific number.
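Quick sanity check of that arithmetic in Python (same assumptions: ten billion people, one try per nanosecond, over 2^86 nanoseconds):

    import math

    attempts = 10**10 * 2**86  # people * nanoseconds
    print(f"attempts ~= 2^{math.log2(attempts):.1f}")                         # about 2^119.2
    print(f"coverage ~= 2^{math.log2(attempts) - 128:.1f} of the 2^128 space") # about 2^-8.8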
Reasoning about large numbers is hard. 2^128 is considered enough security in any crypto scheme to be "impossible" to break. Consider that Bitcoin uses ECDSA, which gives 128 bits of security. This is not brute-forceable, as there isn't enough energy around to actually flip the bits that many times, let alone do the calculations.
Obligatory pedantry: an attacker who does a single test against 256-bit ECC has a probability of success of ~2^-256. The 128-bit security figure comes from a rho attack, which starts off unlikely to succeed (2^-256) and becomes all but certain to succeed at around 2^128 operations: i.e. it makes progress, unlike normal "guess and check" brute force.
I just mention it because this means that the sufficiency of ECC security doesn't by itself imply that 2^-128 is "sufficient" against random chance.
But, of course, 2^-128 is an unfathomably low probability, and it's generally sufficient against 'chance'. Though chance is usually the wrong way to think about attack. For example: if I create software which takes a 256-bit input and does "if (input == 8675309) exec_shell();", and expose it to the Internet, what is the probability of that input? ... probably 1. :)
Really? You do realise that, due to Bitcoin's popularity, the software itself is extremely well tested and built, whilst being maintained by some very talented people?