What should be added, I think, to code reviewing is that it can get really complex, for example if we add formal verification in the mix to catch very subtle bugs.
So in the end I think there will still be some disappointment, as one would expect it should be fully automated and only about reading the code, like this article suggests. In reality, I think it is harder than writing code.
A new extraction system from Rocq to functional-style, memory-safe, thread-safe, readable, valid, performant, and modern C++.
Interestingly, this can be integrated into production system to quickly formally verify critical components while being fully compatible with the existing Bloomberg's C++ codebase.
This is ... okay, if you like formal systems, but I wouldn't call it performant. Depending on what you are doing, this might be performant. It might be performant compared to other formally verified alternatives. It's certainly a lot nicer than trying to verify something already written in C++, which is just messy.
From theories/Mapping/NatIntStd.v:
- Since [int] is bounded while [nat] is (theoretically) infinite,
you have to make sure by yourself that your program will not
manipulate numbers greater than [max_int]. Otherwise you should
consider the translation of [nat] into [big_int].
One of the things formal verification people complain about is that ARM doesn't have a standard memory model, or CPU cache coherence is hard to model. I don't think that's what this project is about. This project is having basically provable code. They also say this in their wiki:
> Crane deliberately does not start from a fully verified compiler pipeline in the style of CompCert.
What this means is that you can formalize things, and you can have assurances, but then sometimes things may still break in weird ways if you do weird things? Well, that happens no matter what you do. This is a noble effort bridging two worlds. It's cool. It's refreshing to see a "simpler" approach. Get some of the benefits of formal verification without all the hassle.
The output you posted is from an example that we missed importing. It's also one of the tests that do not yet pass. But then again, in the readme, we are upfront with these issues:
> Crane is under active development. While many features are functional, parts of the extraction pipeline are still experimental, and you may encounter incomplete features or unexpected behavior. Please report issues on the GitHub tracker.
I should also note, mapping Rocq types to ideal C++ types is only one part of it. There are still efficiency concerns with recursive functions, smart pointers, etc. This is an active research project, and we have plans to tackle these problems: for recursion: try CPS + defunctionalization + convert to loops, for smart pointers: trying what Lean does (https://dl.acm.org/doi/10.1145/3412932.3412935), etc.
Thanks, yeah, I did see some of the pure C++ types getting converted. Though it makes a lot of sense why you had to go with shared_ptr for generic inductive types.
Have you considered combinatorial testing? Test code generation for each sample program, for each set of mappings, and ensure they all have the same behavior. If you look at the relative size or performance, it could allow you to automatically discover this issue. Also, allocation counting.
Hey also sucks you are not in SF. I'm looking for people into formalization in the area, but I haven't found any yet
Our plan was to do random Rocq program generation and differential testing of Crane extracted code versus other extraction methods and even CertiCoq. But fixing a program and trying different mappings would certainly be a useful tool for any dev who would like to use our tool, and we should put it on our roadmap.
In Rocq/Coq, you have "extraction", which is the standard way to compile programs. This is how the C compiler CompCert is executed, for example. So, all of these languages are in the same category in this respect.
There are RISC-V based zero-knowledge virtual machines, to which a GCC version can be compiled. So this should be possible, although probably very slow, maybe a thousand times slower than a normal GCC execution.
That was 500 KHz, and the latest versions are at 1MHz, so still 1000 times slower than a 1Ghz machine, but it’s easy to parallelize the workload, so it’s really mostly a cost of computation issue.
No, we have not found bugs! We have only scratched the surface and bugs are probably well hidden as the standard library is very well tested. We do not expect to find bugs there, especially as we also assume the unsafe code is safe.
Our main hope with the translation to Coq of the standard library is to get a formalization of it, so that we can precisely verify Rust programs calling core and alloc.
There are some specifications that are always there, like the absence of reachable panics or the backward compatibility between releases on stable entry points.
Otherwise you can outsource this work to specialized companies such as Formal Land or others!
I do not know how RustHornBelt works. We are focusing on safe code, although we still generate a translation for unsafe blocks as a "best effort".
Compared to Aeneas the goal is very similar as we want to verify Rust programs using interactive theorem provers. However, with coq-of-rust we write the purely functional version of the code (the one on which we make proofs) by hand, or with the help of some GitHub Copilot as this is rather repetitive, and prove it equivalent with the automatic translation. In Aeneas the aim is to directly generate a functional version.
We handle all the pointers as if they were mutable pointers (the `*` type). We do not use any information from Rust's borrow checker, which simplifies the translation, but we pay that at proof time.
To reason about pointers in the proofs, we let the user provide a custom allocator that can be designed depending on how the memory will be used. For example, if the program uses three global mutable variables, the memory can be a record with three entries. These entries are initially `None` to represent when they are not yet allocated. We do not yet know how this technique can scale, but at least we can avoid separation logic reasoning for now. We hope that most of the programs we will verify have a rather "simple" memory discipline, especially on the application side.
I should add that for pointers to immutable data, we can translate them to immutable values on the Coq side. Thus for Rust code written in a purely functional way (using only immutable data structures) the translation is almost one-to-one with Coq, that is to say the same code up to syntactical differences and the verbosity of our monad that behaves like the identity monad in that case.
Thus we can provide a path for people who are ready to sacrifice performance for proofs. I guess that immutable Rust is simpler to verify with the other systems too.
> I guess that immutable Rust is simpler to verify with the other systems too.
I don't think that's the case (it's not harder either). The type system of Rust makes handling mutability fairly trivial in verification, in fact it could be translated to immutable code in a continuation monad like the Tardis monad from Haskell I think.
The code is translated automatically with coq-of-rust! When issues are found in the translation they can be fixed once in the coq-of-rust tool, and all the translations are updated.
Ok, what I think I do not understand is what they mean by "tedious and error prone"? Is it tedious to write the automated translation (aka the coq-of-rust tool in this case) or does the translation of a concrete piece of code (e.g. the core crate) still involve manual work?
The "tedious and error prone" code was what we were doing before, when the translation of the standard library was not yet working automatically with coq-of-rust. Now, this is automatic. Maybe the explanations on the blog post were not clear.
So in the end I think there will still be some disappointment, as one would expect it should be fully automated and only about reading the code, like this article suggests. In reality, I think it is harder than writing code.
reply