I almost completely disagree with the author of this article.
Yes, formally verified systems still have bugs due to the TCB (the trusted computing base). But that's because the TCB is precisely the only part of the system that is not formally verified! So of course it's going to have bugs.
If so many crashing and correctness bugs were found in the (small) trusted part of formally-verified systems (well, in reality only 16 bugs were found in 3 entire formally verified systems, but I digress), imagine how many similar bugs there are in systems that are not formally verified at all!
If anything, what this article should be arguing is that operating systems, proof-finding and verification tools, and compilers should also be formally verified (and some already are), so that bugs in these parts of the system are eliminated as well.
The only part of the article that I agree with is that it's not feasible to write 10x more proof code than real code.
But IMHO, this is why proof automation is really important to have - tools like Isabelle/HOL, Why3 and SMT solvers like Z3 provide this (the latter of which the author mentions).
Coq, on the other hand, has much less automation and the result is that it's much harder to use and results in larger proof code. Yet, it's seemingly inexplicably much more popular - perhaps it's the catchy name?
Coq has a much stronger (albeit constructive) logic than HOL. That is sometimes convenient. Most users of proof assistants don't get far enough for powerful automation to be unavoidable.
In addition, Coq benefits from a network effect: Coq started targeting university teaching earlier, primarily at UPenn with Pierce's Software Foundations [1].
That Coq has a stronger logic is an argument I hear often, but is that really so important when the result is that it's so hard to use that no real-world software engineer is ever going to bother?
Maybe with HOL I will not be able to prove some deep theoretical mathematical result that I could with Coq's Calculus of Constructions, but if I can otherwise much more easily prove >99% of my code to be bug-free, then I would argue that perhaps it's the better "tool" for the job...
The issue is not the edge of logical independence, but rather how easily you can express certain theorems. Most verification is about the verification of programming languages (e.g. the POPLmark Challenge [1]), and in this setting sometimes what you want to prove is more convenient to express in the CoC. You can also express it in HOL, but not quite as easily.
All that said, automation of provers based on the LCF approach with a non-constructive logic is much superior as of December 2017. With this in mind I was surprised to see that L. de Moura based the new Lean prover [2] on a variant of the CoC. He thinks that all automation that we currently have for HOL (i.e. Isabelle/HOL) can be brought to Lean.
> sometimes what you want to prove is more convenient to express in the CoC. You can also express it in HOL, but not quite as easily.
> All that said, automation of provers based on the LCF approach with a non-constructive logic is much superior as of December 2017.
This is kind of the point I am making. Is the fact that some things are more convenient to express worth the substantial loss of automation?
> With this in mind I was surprised to see that L. de Moura based the new Lean prover [2] on a variant of the CoC. He thinks that all automation that we currently have for HOL (i.e. Isabelle/HOL) can be brought to Lean.
> J. Blanchette [3] seems to think the same.
Interesting, thanks! I hope they are right and in the future this kind of automation does come to Coq and Lean.
> Maybe with HOL I will not be able to prove some deep theoretical mathematical result that I could with Coq's Calculus of Constructions, but if I can otherwise much more easily prove >99% of my code to be bug-free, then I would argue that perhaps it's the better "tool" for the job...
Is this true though? What program properties can Isabelle easily automate the proofs of that Coq cannot?
> Is this true though? What program properties can Isabelle easily automate the proofs of that Coq cannot?
Almost all?
Granted, I haven't used Coq very much, so correct me if I'm wrong, but isn't it true that in Coq, with the exception of basic proofs like simple arithmetic expressions, in order to prove a theorem you have to at least explicitly know and write the names of the lemmas that are needed to prove it (apart from how they fit together)?
In Isabelle/HOL, many times you literally only need to write "try" and Isabelle will find the proof of the theorem for you, including the lemmas that were needed to prove it and how they fit together (assuming these lemmas were already proven before). If it turns out that your theorem is false, it can show you an example of inputs where it fails.
This auto-proof functionality I'm describing is mostly thanks to Sledgehammer [0]; the automatic counterexample finding is due to Nitpick (and another tool whose name I can't remember), although there are a few other automation tools that are also very useful.
Granted, you still have to have a high-level understanding of how you want to prove your code, but I would argue this eliminates a lot of the work and (wasted?) time you need to accomplish the final goal...
"Granted, I haven't used Coq very much, so correct me if I'm wrong, but isn't it true that in Coq, with the exception of basic proofs like simple arithmetic expressions, in order to prove a theorem you have to at least explicitly know and write the name of the lemmas that are needed to prove it (apart from how they fit together)?"
Coq has automated tactics like "auto" that search for proofs based on a configurable database of "hints". On the other hand, if it doesn't succeed, it leaves the proof state unchanged and gives no hints as to what to do next. But on the other, other hand, if it succeeds, it gives no hints as to how it succeeds. So it's less useful than it might be.
Edit: Also, Coq has a dandy baked-in search system for finding lemmas and so forth that would be useful in a given situation. On the other hand, for a similar flavor, look on Hoogle for functions transforming one list into another list.
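For concreteness, a minimal Coq sketch of both points (the lemma name tail_le is made up for this example; auto, the arith hint database, le_S and Search are standard):

(* "auto" does a depth-limited search using hint databases; "arith" is one of
   the standard ones. If it fails, the goal is left untouched; if it succeeds,
   it doesn't tell you which hints it used. *)
Require Import Arith.

Lemma tail_le : forall n m : nat, n <= m -> n <= S m.
Proof. auto with arith. Qed.

(* The built-in search: find lemmas whose statements match a pattern. *)
Search (_ <= S _).
(* prints, among others: le_S : forall n m : nat, n <= m -> n <= S m *)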
See [1] for an example (edit: although this may not be the best example nowadays, see below):
Quoting:
> Consider the conjecture "length(tl xs) <= length xs", which states that the length of a list’s tail (its "cdr") is less than or equal to the length of the entire list.
> Thanks to Vampire, Sledgehammer finds the following proof: by (metis append_Nil2 append_eq_conv_conj drop_eq_Nil drop_tl tl.simps(1))
Those are all the lemmas that were needed to prove the theorem. They were selected completely automatically.
AFAIU the "metis" procedure tries to fit the lemmas together so that they prove the theorem, and I believe the HOL kernel gives high assurance that if "metis" succeeds, then the theorem was proven correctly (but my understanding may be wrong here).
Note that Sledgehammer may even automatically find and return a human-readable proof, which for the example above looks something like this:
proof -
  have "tl [] = []" by (metis tl.simps(1))
  hence "∃u. xs @ u = xs ∧ tl u = []" by (metis append_Nil2)
  hence "tl (drop (length xs) xs) = []" by (metis append_eq_conv_conj)
  hence "drop (length xs) (tl xs) = []" by (metis drop_tl)
  thus "length (tl xs) <= length xs" by (metis drop_eq_Nil)
qed
Edit: That said, note that at least nowadays Isabelle/HOL seems to be able to prove the above theorem by simplification alone ("by simp"), so this may not be the best example.
Metis is a first-order automated theorem prover. The metis tactic handles encoding the goal as a first-order problem (using more or less just the rules you provide it), invoking the actual Metis prover, and reconstructing a proof in the kernel logic from the output proof. So it is impossible to prove things that are false using the metis tactic (i.e., metis is entirely outside the TCB).
Re: the simplifier, I feel it's a bad example, because coming up with good simp rules, and knowing when to add them and when not to while writing new code in Isabelle/HOL, is almost an art.
One example are nominal techniques [1]. While they are constructive, they have so far only found substantial implementation in Isabelle/HOL [2]. The reason is that the natural implementation is non-constructive, and leads to only a small blowup in proof complexity, which can easily be handled by Isabelle/HOL's proof automation. In contrast, constructive implementations of nominal techniques seem to require a considerable blowup in logical complexity, which the existing proof automation in Curry-Howard based provers doesn't seem to handle gracefully.
Another issue is that most of the automation that is known as "hammer" is based on non-constructive automatic provers.
Do you know why that is the case? Nominal logic is usually presented as a sheaf model (i.e., as the internal language of the Schanuel topos), which models a constructive dependent type theory. Is there a problem when constructing a universe?
I'm probably the wrong person to discuss this question with. I don't know more than I've written above. All I know about this is from conversations with Christian Urban, who implemented Nominal Isabelle. It was a while back, maybe the problem of implementing nominal techniques in constructive type theory in an effective way (i.e. doesn't put additional burden on automation) has been solved by now.
The problem as I understand it is this.
Let's say you have a theory T in the formal language given by a signature S.
- In HOL you extend T to T' and S to S', but the deltas between T and T', and between S and S', are small and largely generic. This is possible because almost everything handling nominal can be done using double negation.
- In CoC you extend T to T' and S to S', but the deltas between T and T', and between S and S', are huge and largely non-generic.
Not that I'm aware of. It would be nice to have one. The number of people who understand nominal, HOL, CoC and implementations of provers is vanishingly small.
Coq is not a monolithic system, but is itself built around a small TCB (the kernel). Proof automation and tactics always produce proof objects (terms of the type theory) that are checked by the kernel.
This system design allows proof automation to be built outside the kernel, i.e. it is safe to use proof automation that hasn't been proven correct, because any proof found by the automation is checked by the kernel before it is accepted.
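A rough illustration of what that means in practice (the lemma name small_fact is made up; lia and Print are standard Coq): the lia tactic is untrusted automation, but whatever term it constructs is re-checked by the kernel at Qed, and you can inspect that term afterwards.

Require Import Lia.

(* "lia" is an untrusted decision procedure for linear arithmetic;
   the proof term it builds is re-checked by the kernel at Qed. *)
Lemma small_fact : forall n : nat, n <= n + 1.
Proof. lia. Qed.

(* Inspect the proof object that the kernel actually accepted. *)
Print small_fact.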
Another feature of Coq, which unfortunately is underused in practice, is the inclusion of a notion of reduction in the type theory.
Terms of the type theory are programs, and equality in the type theory is defined up to execution of these programs.
This means that in principle, one can construct a function SMT(s) and a proof that SMT(s) returns true if the SMT formula s is satisfiable, and false otherwise.
The cool thing is that a proof that a certain formula s is unsatisfiable can now be represented as an equality proof SMT(s) = false. This proof is very small, because it relies on the function SMT which is essentially a verified SMT solver.
You might wonder how quickly SMT can be executed. The answer is that you would have to write SMT with purely functional data structures, but that Coq has an integrated JIT compiler that will speed up execution.
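Here is a toy version of that reflection idea, with a trivial evenness checker standing in for the (vastly harder to build) verified SMT function; evenb and the lemma name are just made up for the sketch:

(* A purely functional checker standing in for the verified decision procedure.
   In the real setting you would also prove, once and for all, a soundness
   theorem linking the checker's answer to the property you care about. *)
Fixpoint evenb (n : nat) : bool :=
  match n with
  | 0 => true
  | 1 => false
  | S (S k) => evenb k
  end.

(* The interesting fact is established by computation: the kernel evaluates
   "evenb 1000", so the certificate is just reflexivity rather than a
   1000-step proof. *)
Lemma evenb_1000 : evenb 1000 = true.
Proof. vm_compute. reflexivity. Qed.

The same shape scales to the SMT(s) = false certificates described above once SMT itself has been written and verified; vm_compute (Coq's bytecode-compiled evaluation) is what keeps the computation step fast.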
So Coq is a pretty amazing system, but it is, 30 years after its creation, still in its infancy.
We have 90,000 lines of code in a commercial application with no integration or unit tests, or any test automation at all, and that’s used by approximately 15,000 users a day.
Obviously we have bugs. We measure both our crash rate and usage/behavior with public analytics packages. About one out of every thousand uses of the application ends in a crash. More often, a user is unable to complete a task in the app because of a code failure, or poor/misleading UI.
I don’t know how hard it would be to formally verify our application, but I suspect at least 10x the cost of building it. We don’t talk about formal verification, but we do talk about the need for unit tests. My answer to both is: stop, think about what you are saying.
Our users are rarely inconvenienced by crashes; our crash rate is already lower than 90% of our peer group. Spending even another 1x of the time and effort we spent building this app on trying to formally prove it and eliminate those last few crashes would be a massive expenditure for a tiny gain in user satisfaction, and only in one area. It would literally set us back years in functionality.
Writing/debugging/maintaining 3,000 unit tests doesn't offer a much better return, because our application is almost entirely front-end GUI. Automated UI tests written today would have to be entirely rewritten next year when we update the entire UI, and then likely again within another year as we do it once more.
In our case the most productive way our team can improve the application for its users is to improve functionality and usability. We will create bugs along the way but find most in manual testing, and our usage will increase as customer benefits increase.
My point isn’t that we are doing everything right and that formal verification is wrong, or that unit tests are wrong, etc. My point is every application is unique. They are all developed with limited resources and those resources should be focused on producing the best possible outcome.
If I was leading a team at NASA developing embedded software for a space probe, pair programming and unit testing would be required, along with integration testing and test automation, and we would have to consider formal verification. We would only have one shot at getting it right, or a billion-dollar probe would fail.
But my current team doesn’t face such an immense cost from its bugs. We can push an update to fix critical bugs to every customer whenever we deem necessary. Our biggest challenge is getting customers to use our application, and to use it more often. Customer impediments are far more often poor design decisions than crashes or mis-implementations. So that’s what we need to focus on.
> Our biggest challenge is getting customers to use our application, and to use it more often.
That may be your biggest challenge now, but at some point, you may find that in order to overcome that challenge, you need to add major new functionality that requires changing broad swaths of your application. When doing that, you may find that your crash rate becomes unacceptably high as your changes begin violating assumptions you didn't know you were making. That's what unit and integration tests are for, they help expose those assumptions before a user does. This is a common misunderstanding, tests are not about catching bugs in the present, they are about decreasing the risk of adding new functionality in the future. I would go so far as to say that if you're working on a project with a defined end state, where there is zero chance it will ever be expanded past that end state, then automated tests are pointless; spend a lot of time on QA of the end product, shrink wrap it when you're happy with it, and don't look back. But I would also go so far as to say that nobody makes software like that. I certainly haven't ever worked on a project that is unlikely to have major, risky, expansions in functionality during its lifetime.
I agree with your general point, but currently don’t think the math adds up on unit testing for us.
It's not that it wouldn't be better if we had unit tests. But their benefits are mostly in the future, meaning they need to be discounted for the time value of money. And even without discounting, they won't be as valuable to customers and our product as actually creating new features and fixing existing ones.
If our product fails in the next year, any efforts yielding most of their benefits after that are wasted. Any efforts that yield more customers and usage now give us more resources to do more things in the future, including unit tests.
That way, if we get to that major, risky expansion in the future, we will have more resources to write unit tests then, when they are most useful.
And lastly, I've reviewed the bugs that the business prioritizes for us. Virtually none, including crashes and feature failures, would be caught by traditional unit tests. Lots of them were force-unwrapping code written by the original contractors that would pass a unit test, but crash in a multithreaded environment when garbage collection frees the right object at the wrong time.
To do unit tests we would also have to write a ton of mocks, making our production code less maintainable and understandable, and potentially introducing new bugs.
I think that Rosette is a good tool which abstracts away a bunch of the difficulty of SMT solvers and also allows you to synthesize programs: https://emina.github.io/rosette/
How many real-world production systems have benefited from formal methods significantly more than if the same resources had been applied in other ways?
Let's assume you can name a few: how general is the class of systems this can apply to? Or is this just an obscure corner of computer science that applies to NASA and various low-level system components?
I don’t count out formal methods, but can you really say none of this reminds you of the AI winter?
Is there evidence that it's more reliable compared to equally complex systems? Or that its reliability would have been impractical or more expensive to achieve using traditional methods?
I didn’t hold out much hope of a surgical response (because it’s not the most common thing being done in projects) but that looks like a great link exactly on point.
It may sound cynical, but as I mentioned, anecdotally it does really feel like the AI winter, where people simultaneously felt there was incredible promise while hearing a steady stream of discouraging progress reports.
I look forward to reading their paper, maybe there are a lot more bright spots than can be easily seen from the outside.
I think formal verification is going to be size-limited. seL4 is a microkernel, which means it's small: about 9,000 LOC. I don't think Xen, at 150,000, is within the abilities of verification. So it's promising for small stuff.