If you believe a proof of correctness produced by a formal verification tool, I have a bridge to sell you.
For a proof of correctness, you need to model the environment, and write a specification or a golden model. These are just as prone to bugs as the system you are trying to verify (even assuming the tools are correct).
That is not to say formal verification is useless. Developing a proof will force you to formalize your intuitive grasp of the system and its interfaces. Counterexamples produced by the tools will often help to find deep bugs that would be almost impossible to find any other way. And after many iterations of debugging both the system and the proof, you might gain a lot of confidence in the proof and the system.
But even after all of that, trusting a formal proof as a guarantee of correctness is insane.
"For a proof of correctness, you need to model the environment, and write a specification or a golden model. These are just as prone to bugs as the system you are trying to verify (even assuming the tools are correct)."
I disagree with the idea that a spec is as buggy as the implementation. Some examples.
In hardware verification, a spec for a floating-point operation is typically a page or two long. The corresponding RTL implementations can be tens of thousands of lines, with many additional concerns (pipelining, clock gating, ...) that don't show up in the spec, and where the operation itself is fused together with dozens of other operations. The difference in complexity is striking.
In simple software cases, say a sorting function, the spec can be as straightforward as "the answer is ordered and has the same elements as the input array." This same spec can be used with any number of implementations, including things like Timsort where there are multiple modes and optimizations that are interacting with one another. The spec says nothing about runs, galloping, etc., which the implementation uses.
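To make that concrete, here's a minimal sketch (Python, purely illustrative; the name sort_impl is mine) of the entire spec as an executable property:

    from collections import Counter

    # Minimal sketch: the sorting spec "ordered + same elements" as a
    # checkable property. `sort_impl` stands in for any implementation
    # (insertion sort, quicksort, Timsort, ...).
    def satisfies_sort_spec(sort_impl, xs):
        ys = sort_impl(list(xs))
        ordered = all(ys[i] <= ys[i + 1] for i in range(len(ys) - 1))
        same_elements = Counter(ys) == Counter(xs)
        return ordered and same_elements

The spec stays this size no matter how elaborate the implementation behind sort_impl becomes.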
For more complex software, the spec can of course become complex. Even so, there are examples, e.g., the Jitawa Lisp runtime (https://www.cl.cam.ac.uk/~mom22/jitawa/), where the spec is a relatively straightforward description of a programming language that is much, much simpler than the implementation. For instance, the spec here does not mention things like garbage collection and just-in-time compilation, which are fairly complex features of the implementation.
Anyway, trusting a formal proof seems, to me, certainly no less sane than trusting a system without such a proof.
Here's a spec which was buggy and yet widely accepted and implemented: TLS 1.0
You could formally prove an implementation of it and still end up with a dangerously buggy product. The same could be said for many of the early encryption specs, which have proven to be vulnerable to timing attacks, side-channel attacks, and so forth.
I've helped QA quite a few financial specs (namely those created to transform credit reports into decisions) which were inherently buggy, and these were written as almost pure mathematical formulas that just didn't take corner cases into account.
A formal proof has value, but it's certainly not an absolute - relying on it blindly to indicate the correctness of a program is putting yourself at risk.
EDIT: As a completely separate point, let's look at the potential problems with this sorting spec: "the answer is ordered and has the same elements as the input array."
What does ordered mean? How do I compare a number and a string? A binary blob? A structure? Is there a restriction on the word "same" - do they have to be the existing object, or can it be a copy? Should the answer be in the same memory space as the original, or should it be a distinct memory object? Can the original be modified?
> A formal proof has value, but it's certainly not an absolute - relying on it blindly to indicate the correctness of a program is putting yourself at risk.
Indeed, I agree, but lots of people never ask themselves how much more at risk they are putting themselves when their implementation has no formal spec or proof at all!
As another commenter has argued, people seem to look at formal verification as an all-or-nothing approach (either it achieves zero bugs or it's completely worthless, or alternatively: if it's formally verified then it MUST be bug-free!), but it's really not; it simply gives higher (often much, much higher) assurances.
There are two separate activities here. Verification is showing that the spec and the implementation agree. This is usually done with comprehensive tests, if it's done at all, and very rarely with formal methods (since formal specs are so rare). TLS 1.0 is not a formal spec.
Concurrently, you have validation. This is arguably far, far more important, since it's cheaper to fix before you go spend money on verification. Validation is showing that the spec IS THE RIGHT SPEC! This is where you would have put almost all your cryptographers when developing TLS 1.0, and there's work on doing this for TLS 1.3 (e.g., https://www.ethz.ch/content/dam/ethz/special-interest/infk/i...).
Once you have a formal spec, written in an automated, machine-checked system, you can start analyzing it with formal methods. This is super super powerful. The seL4 specification, even at its most abstract, is rather subtle and quite long. It's hard to reason about. But since they already had a formal spec, they could analyze that spec at its highest level of abstraction to prove valuable security properties: confidentiality and integrity. Now, you don't have to look at the whole spec to validate it for your security requirements. You just have to look at the statements of confidentiality and integrity and validate those.
This is standard high-assurance software development, and it's not taught or practiced very often.
>What does ordered mean? How do I compare a number and a string? A binary blob? A structure? Is there a restriction on the word "same" - do they have to be the existing object, or can it be a copy? Should the answer be in the same memory space as the original, or should it be a distinct memory object? Can the original be modified?
This is completely uninteresting in any formal spec system, because the spec fixes those details, usually implicitly. You're usually comparing things of the same type, or using an explicit comparison operator that has stated properties (which become proof obligations).
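To sketch what I mean (Python, names like is_total are mine and purely illustrative): the stated properties of a user-supplied comparator can be written down once, and in a formal spec system each one becomes a proof obligation on the caller:

    # Illustrative sketch: properties a comparator `cmp` is required to
    # satisfy over a domain `xs`; a formal spec discharges these as
    # proof obligations rather than leaving them as open questions.
    def is_total(cmp, xs):
        return all(cmp(a, b) in (-1, 0, 1) for a in xs for b in xs)

    def is_antisymmetric(cmp, xs):
        return all(cmp(a, b) == -cmp(b, a) for a in xs for b in xs)

    def is_transitive(cmp, xs):
        return all(cmp(a, c) <= 0
                   for a in xs for b in xs for c in xs
                   if cmp(a, b) <= 0 and cmp(b, c) <= 0)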
To that, I would argue that security is (or should be) part of the spec; if it’s not secure, it’s not correct. Having it be a separate and lesser concern is how we end up tacking on security features as an afterthought.
To borrow from the kernel devs: Security requirements are just like any other requirements.
>I disagree with the idea that a spec is as buggy as the implementation. Some examples.
It isn't necessarily as buggy as the implementation, but I've worked on plenty of (non-formally-verified) projects where the majority of bugs were spec bugs. If you're diligent about following the UNIX philosophy of assembling small, self-contained, tightly specified (and test-driven) components, this is pretty normal, I think. Building the wrong thing is often the biggest problem.
The one time I did do formal verification (at uni) that was certainly the case too, though that may partly have been because the language was an overcomplicated pile of crap (Z).
One of my co-workers also told me a story about a formally verified program being translated into Python and having a couple of bugs picked up along the way. I find this quite believable.
Formal verification techniques can be very useful, especially for finding deep, often insanely complex bugs.
The process of developing the proof can produce a lot of insight about the system and expose hidden assumptions that were never considered while designing the system.
It is the final proof itself which is the problem. It does not contribute anything except for terminating the debugging process.
Because the specification can be just as complex as the system, the proof can only provide a false sense of security. A correctness theatre.
> It is the final proof itself which is the problem. It does not contribute anything except for terminating the debugging process.
That's true in a certain restricted sense, if you see the verified program as a thing set in stone and never touched again. But verified software is maintained, changed and extended just like other software. Whenever you change a program and your proofs still work, that tells you something. If the proofs break, that also tells you something. If all you need to change are some small lemmas but the overall end-to-end proof goes through, you know exactly what interactions with other parts of the system were affected by your change.
Because the specification can be just as complex as the system
That's true in principle, but in practice I don't think it's going to hold in most cases. A good specification necessarily captures the essential complexity of the system, but it's a lucky developer whose implementation doesn't introduce considerable accidental complexity as well. Ensuring that this accidental complexity doesn't break anything could have great value, depending on the nature of the system and how damaging a failure could be.
This is what many people are missing in their understanding of formal verification. What reduces bugs is not the proof itself, but the different thought process that, in effect, double-checks everything from another perspective.
> Because the specification can be just as complex as the system
Can you give an example of this? I don't find this true in general at all e.g. writing the specification of a sorting algorithm is much easier than implementing most practical sorting algorithms.
> the proof can only provide a false sense of security. A correctness theatre.
You could use this argument to say that unit tests, type systems, code review etc. are correctness theatre. All these things prevent mistakes.
It's pretty clear that a fully formal specification of a program must be (roughly) as complex as the program itself, for otherwise you could construct an infinitely descending chain of ever simpler specifications which leads to a contradiction.
The reason that you found the specification of a sorting algorithm simpler than the sorting algorithm is that you probably have not fully specified the algorithm. Consider the sorting routine in C's standard library, which allows the user to specify a comparison function. Now imagine you'd pass something like the following comparison function:
    def compare(x, y):
        # The print call is the point: it exposes every comparison the sort makes
        print("I'm comparing %i with %i" % (x, y))
        if x < y:
            return 1
        if x == y:
            return 0
        return -1
It almost completely exposes the inner workings of sorting. Hence a full spec needs to account for this.
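As a hypothetical usage sketch (adapting the comparator with Python's functools.cmp_to_key; not something from the original discussion): the printed trace differs from one sorting algorithm to the next, so a spec that says nothing about the comparator's side effects does not pin down the observable behaviour.

    import functools

    # Hypothetical: the trace printed by compare() reveals which elements
    # are compared and in what order, which varies between algorithms.
    sorted([3, 1, 2], key=functools.cmp_to_key(compare))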
> It's pretty clear that a fully formal specification of a program must be (roughly) as complex as the program itself, for otherwise you could construct an infinitely descending chain of ever simpler specifications which leads to a contradiction.
That's like saying a theorem statement must be roughly as complex as the proof.
Also consider that it's much harder to reason about (and prove) the correctness of heavily optimised versions of algorithms than of naive implementations. For example, a proof of correctness for heapsort is more involved than one for insertion sort.
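As a rough sketch of why (Python, my own illustration): the invariant that carries an insertion sort proof fits in one line, whereas a heapsort proof additionally needs the heap property and the index arithmetic that maintains it.

    # Sketch: insertion sort with its key loop invariant made explicit.
    def insertion_sort(xs):
        xs = list(xs)
        for i in range(1, len(xs)):
            # Invariant: the prefix xs[:i] is already sorted
            assert all(xs[k] <= xs[k + 1] for k in range(i - 1))
            j = i
            while j > 0 and xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
        return xs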
Trusting any sufficiently complex system to be correct is insane. Once you move much beyond trivial systems, it's virtually impossible to be confident at every level that your proof is correct.
This is why modern software testing takes a more... I guess you could call it Bayesian... approach.
QA, unit testing, integration testing, etc - these are all ways of reducing probabilities of incorrectness. Never formally prove it of course, but make your confidence higher with each.
> QA, unit testing, integration testing, etc - these are all ways of reducing probabilities of incorrectness. Never formally prove it of course, but make your confidence higher with each.
And formal proofs don't make your confidence higher? IME they're the most efficient tool in the toolbox if you need to get to really low defect rates. (Of course, the sad economic reality is that for most consumer software even a 5% defect rate is perfectly acceptable).
It's not that simple. Defects don't necessarily have to cause problems, depending on how you design your software (think OTP-style architecture). And avoiding problems is in fact what you always need, not low defect rates. It just so happens that sometimes you can only achieve decent reliability by pursuing very low defect rates. But for a lot of software it cannot economically beat other approaches.
Fred Brooks said it years ago: "Even perfect program verification can only establish that a program meets its specification. … Much of the essence of building a program is in fact the debugging of the specification."
As far as I can tell, the formal proofs themselves were always correct in his example. The problem was that the proved statements were not the ones intended.