Interesting position on testing: "I generally don't consider my own software adequately enough tested until its tests have turned up a bug in a compiler/toolchain. So far I've not encountered a compiler bug for libsecp256k1 [...] so I may have to settle for discovering that a ubiquitous system library couldn't square correctly."
(The bug was found via comparing the output of libsecp256k1 and OpenSSL on "specially-constructed random inputs.")
You might be interested in this [1]. It's a 31C3 talk about exactly this. Their idea is to write a runnable spec (one that focuses only on correctness, rather than speed and the like) that can then be used to test real-world implementations against.
Their idea is to do this for all the major interfaces we all depend on: TCP/IP, the x86 memory model, and so on.
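Roughly, the idea looks like this (my own toy sketch applied to the squaring case from the article, not code from the talk): a slow reference implementation that cares only about correctness is used as an oracle for a fast production implementation.

    import random

    P = 2**256 - 2**32 - 977  # the secp256k1 field prime

    def square_mod_p_reference(x):
        # Runnable "spec": obviously correct, no attempt at speed or
        # constant-time behaviour. Used only as an oracle.
        return (x * x) % P

    def check_against_reference(square_mod_p_fast, trials=100000):
        # Compare a production implementation against the reference on
        # random inputs; any mismatch is a bug in one of the two.
        for _ in range(trials):
            x = random.randrange(P)
            assert square_mod_p_fast(x) == square_mod_p_reference(x), hex(x)

The same shape applies to the bigger interfaces the talk has in mind; only the oracle gets more elaborate.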
Runnable/verifiable specs are a good idea. A colleague of mine did some great work compiling the H.265 & VP9 specifications to produce a test suite, in a similar manner to fuzzing:
Why would that matter in the case of testing for correctness? I thought these 'runnable spec' implementations were only to verify that the output of the 'real' implementation is correct, and thus would never even be used anywhere that could/would be attacked. Am I not understanding the purpose correctly?
When testing for correctness, why would you care about timing attacks? You would obviously have multiple tests covering correctness, performance, security, etc.
That's only because the author uses a pretty standard compiler. Try building your flawless library on ARM, Newlib, etc., and you're pretty much fighting your tools all the time. The first compiler/toolchain bug is a rite of passage, yes, but not a measure of code quality (although they do correlate).
It does say "a compiler/toolchain". Opus has both fixed-point and floating-point implementations. The fixed-point one was extensively tested on ARM too, I'm sure.
Sorry. I knew I was mangling it, but it was the only way to get within HN's title restrictions. Thought about leaving a pre-emptive comment to that effect, and I probably should have.
> The incorrectly squared numbers would be expected to be found randomly with probability around one in 2^128
1:2^128 is an incredibly low probability. The universe is only around 2^86 nanoseconds old. For all practical purposes, an event with a 1:2^128 probability is impossible.
They say that they found it with randomized testing (although one that explores "a class of rare corner cases") and dismiss the claim that this is a class of bugs that can only be found by analysis of the implementation.
I think a test that manages to find a bug like this cannot be called "random" (as in, throwing random inputs at a black box). Obviously I don't know the details, but I am sure their test incorporated a great deal of detailed knowledge of algorithms used in the computation.
> but I am sure their test incorporated a great deal of detailed knowledge of algorithms used in the computation
Actually, no. In this case it exploits the knowledge that "low frequency inputs tend to explore more program space". It was found using numbers with a low transition probability between zero and one, an approach also used in digital circuit verification.
It's true that I also know the code is full of hard-to-hit branches for carries, which does partially motivate that selection. But importantly, "we weren't even testing for that": the OpenSSL code in general uses an entirely different algorithm than the libsecp256k1 code, and I hadn't recently looked at any OpenSSL innards.
He isn't using black box testing; he's using white box testing (knowing Greg, either KLEE [1] or AFL [2]... probably both... and probably also black box testing). The test does incorporate a great deal of knowledge of the algorithms (in fact, complete knowledge), but the point is that it's the machine that does the analysis of the implementation, not a human.
Though I use both KLEE and AFL, this particular instance was found with black box randomized tests, using numbers with low transition probabilities, e.g. 00000111111111000111000000000000.
KLEE or AFL would be somewhat unlikely to find this particular problem because the error was an omitted branch, so they wouldn't know there was a case left to be satisfied.
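Roughly like this (a toy sketch, not the actual harness): each bit copies the previous one and only flips with small probability, so long runs of zeros and ones dominate.

    import random

    def low_transition_256(p_flip=0.05):
        # Generate a 256-bit value whose bit string flips state (0->1 or
        # 1->0) with probability p_flip at each position, producing long
        # runs like the example above. Such inputs hit carry/overflow
        # corner cases far more often than uniformly random ones.
        bit = random.getrandbits(1)
        value = 0
        for _ in range(256):
            if random.random() < p_flip:
                bit ^= 1
            value = (value << 1) | bit
        return value

    print(format(low_transition_256(), "0256b"))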
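For illustration only (a hypothetical toy, not the actual OpenSSL or libsecp256k1 code): when the bug is an omitted carry, there is no unexplored branch left in the code for coverage feedback to point at.

    MASK = (1 << 64) - 1  # one 64-bit limb

    def add_2limb_buggy(a_lo, a_hi, b_lo, b_hi):
        # Adds two 128-bit numbers stored as (lo, hi) limb pairs.
        lo = (a_lo + b_lo) & MASK
        hi = (a_hi + b_hi) & MASK  # bug: carry out of the low limb is never added
        return lo, hi

    def add_2limb_correct(a_lo, a_hi, b_lo, b_hi):
        lo = a_lo + b_lo
        hi = (a_hi + b_hi + (lo >> 64)) & MASK  # propagate the carry
        return lo & MASK, hi

Every path through add_2limb_buggy is trivially covered, so coverage-guided tools see nothing left to satisfy; only an input that actually produces a carry out of the low limb exposes the difference.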
Right, that's one of the weaknesses of tools that rely on coverage analysis... and it highlights the importance of using more than one approach to testing. No single tool can do everything.
While I understand where your argument is coming from, it is important to ask the question: how many times a day is this piece of code executed on all computers worldwide?
Surely the chance of one person running into this bug is extremely small, but what about all the people on earth combined?
Nevertheless, I think we can classify this method of testing as some sort of hybrid between brute-force testing and auditing; I don't think the authors are dismissing that claim either, but are merely stating that there are ways to construct very informed test cases without looking at the code.
Caveat that I might be completely off, but if, say, 10 bn people tried once a nanosecond since the inception of the universe, you'd still have only 10^10 * 2^86 ≈ 2^33 * 2^86 = 2^119 attempts, which still only covers about 2^-9 (i.e. 1/512th) of the total addressable 2^128 space, i.e. it's still fairly unlikely that you'd have hit that specific number.
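Quick sanity check of that arithmetic in Python (same assumptions: ten billion people, one try per nanosecond, over 2^86 nanoseconds):

    import math

    attempts = 10**10 * 2**86  # people * nanoseconds
    print(f"attempts ~= 2^{math.log2(attempts):.1f}")                         # about 2^119.2
    print(f"coverage ~= 2^{math.log2(attempts) - 128:.1f} of the 2^128 space") # about 2^-8.8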
Reasoning about large numbers is hard. 2^128 is considered enough security in any crypto scheme to be "impossible" to break. Consider that Bitcoin uses ECDSA, which gives 128 bits of security. This is not brute-forceable, as there isn't enough energy around to actually flip the bits that many times, let alone do the calculations.
Obligatory pedantry: an attacker who does a single test against 256-bit ECC has a probability of success of ~2^-256. The 128-bit security figure comes from a rho attack, which starts off unlikely to succeed (2^-256) and becomes all but certain to succeed at around 2^128 operations: i.e. it makes progress, unlike normal "guess and check" brute force.
I just mention it because this means that the sufficiency of ECC security doesn't by itself imply that 2^-128 is "sufficient" against random chance.
But, of course, 2^-128 is an unfathomably low probability, and it's generally sufficient against 'chance'. Though chance is usually the wrong way to think about attack. For example: if I create software which takes a 256-bit input and does "if (input == 8675309) exec_shell();", and expose it to the Internet, what is the probability of that input? ... probably 1. :)
Really? You do realise that, due to Bitcoin's popularity, the software itself is extremely well tested and built, whilst being maintained by some very talented people?