
>The point is that code NOT bearing the tag is guaranteed to be memory safe and free of race conditions.

No, it isn't, because it may call code that uses unsafe constructs. (That's why you don't have to put an unsafe block around your code every time you push a value into a vector.)
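The `Vec::push` point can be seen in a few lines (a minimal sketch of mine, not from the thread): the function is implemented with raw-pointer writes internally, but callers never write an `unsafe` block.

```rust
// Sketch: safe code calling a standard-library function that is
// implemented with unsafe internals. No `unsafe` block is needed here,
// even though Vec::push performs raw-pointer writes under the hood.
fn main() {
    let mut v: Vec<i32> = Vec::new();
    v.push(1); // internally uses unsafe code, but the API is safe
    v.push(2);
    assert_eq!(v, vec![1, 2]);
}
```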

>C++ cannot do this at present because all code must by definition be treated as untrusted/unsafe.

No, only the code that uses unsafe constructs needs to be treated as unsafe. You could easily make a tool that automatically inserted 'unsafe' tags around C++ code that used unsafe constructs such as pointer arithmetic. The fact that this tool is built into the Rust compiler is just a convenience.




> No, it isn't, because it may call code that uses unsafe constructs. (That's why you don't have to put an unsafe block around your code every time you push a value into a vector.)

For the last time: if you have proven the module using the unsafe code correct, you do not have to treat code calling it as unsafe, and safe code calling it is guaranteed to be free of data races and memory safe.

> No, only the code that uses unsafe constructs needs to be treated as unsafe. You could easily make a tool that automatically inserted 'unsafe' tags around C++ code that used unsafe constructs such as pointer arithmetic. The fact that this tool is built into the Rust compiler is just a convenience.

That wouldn't solve anything at all. C++'s type system cannot and does not enforce safe usage of APIs like string_view; it doesn't matter whether the safe code is using "unsafe constructs." If you choose to mark all C++ or C libraries that rely on invariants that can't be proven safe by the type system as unsafe, you will quickly find that essentially your entire program is covered in an unsafe block. The difference in Rust is not that it has unsafe blocks, but that its type system can safely encapsulate interesting APIs whose proof of safety is nontrivial.
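A standard illustration of that last point (my example, not the poster's) is `split_at_mut`: handing out two `&mut` borrows into one slice is something the borrow checker cannot justify on its own, yet the resulting API is fully safe for callers.

```rust
// Sketch of a classic example: a safe API whose internal proof of
// safety is nontrivial. The two returned references are guaranteed
// disjoint, so handing out two &mut borrows is sound, but the borrow
// checker cannot see that on its own -- hence the unsafe code inside
// std's split_at_mut.
fn split_first_rest(v: &mut [i32]) -> (&mut i32, &mut [i32]) {
    let (a, b) = v.split_at_mut(1); // std's safe wrapper over unsafe code
    (&mut a[0], b)
}

fn main() {
    let mut data = [10, 20, 30];
    let (first, rest) = split_first_rest(&mut data);
    *first += 1;
    rest[0] += 1;
    assert_eq!(data, [11, 21, 30]);
}
```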


> if you have proven the module using the unsafe code correct, you do not have to treat code calling it as unsafe

Yes, I know, but lomnakkus said "code NOT bearing the [unsafe] tag is guaranteed to be memory safe" without adding this qualification. Without the qualification, the statement is incorrect.

>The difference in Rust is not that it has unsafe blocks

Indeed. That’s what I said above.

For some reason, Rust advocates don't like it when people point out that unsafe blocks, in and of themselves, do not add any kind of safety guarantees to the language or limit the effects of unsafe code.

Of course Rust's type system is able to express important invariants that C++'s type system can't, and this can be used to prevent iterator invalidation etc. etc. The point is that if you do have an unsafe block in your own code, you are going to have to go to the trouble of constructing a formal proof -- using tools that haven't been fully developed yet -- in order to be sure that any of the rest of your code is safe. And at present, the proof requires a translation step, which cannot be formally specified, from Rust code to LambdaRust code.

Rust is undoubtedly much safer than C++ in practice. I just think the extent to which it provides hard and fast safety guarantees tends to be somewhat exaggerated in its marketing.


Sure. There is a lot of work left to do to bridge the gap between the formal model and Rust code. But we can be reasonably confident at this point that the libraries we've proven correct actually are correct (modulo translation errors), and we specifically set out to prove things about libraries that were safe for interesting or nontrivial reasons. That's a huge step forward. It means that, with some time and effort, producing codebases with formal proofs of memory safety can be done at quite low verification cost. The executable might need to be generated with something other than LLVM, of course, but using tools like CompCert is possible in these cases; the primary difficulty with it is normally that it only correctly compiles correct C, which is difficult to find in the wild to say the least.

Not only that, but thanks to Rust's type system being able to express separation logic invariants, we may be able to do many functional proofs of correctness more easily; right now, refinement types are mostly limited to systems like Liquid Haskell because it is hard to generate invariants for impure code, but by integrating with Rust's type system such things become possible. There is at least one research group currently investigating this possibility, and it looks quite promising.

Such work entirely depends on the fact that safe code that passes the borrow checker must be memory safe and has enough annotations that resource aliasing can be determined statically, freeing programmers from the burden of manually specifying primitives like separating conjunction and magic wand. Invariants would be stated at the function level and verified modularly (on a function by function basis), rather than current state of the art techniques used by model checkers, which usually require whole program verification. This has the chance to put scalable formal verification of functional correctness for many common errors (like integer overflow and out of bounds array accesses) in the hands of programmers who don't have training in formal verification; I find that very exciting!

I'd also add: requiring the keyword unsafe actually is fairly important in practice. Without it, there wouldn't be a way to assert to the compiler when a block of code that used unsafe primitives was safe (I suppose you could use something like "trusted" to mark a block safe instead of "unsafe" to mark it unsafe, but I suspect the ergonomics of that would be fairly untenable). So it's not entirely right to say that unsafe is pointless; it's a necessary component. It's just a trivial one.
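As a minimal sketch of what that assertion to the compiler looks like in practice (my example):

```rust
// Sketch: `unsafe` as an assertion to the compiler. The block says
// "I have checked that this operation is safe"; the compiler takes
// that claim on trust and statically verifies everything outside it.
fn first_elem(v: &[i32]) -> Option<i32> {
    if v.is_empty() {
        return None;
    }
    let p = v.as_ptr();
    // Safety: p points at v[0], which exists because v is non-empty.
    Some(unsafe { *p })
}

fn main() {
    assert_eq!(first_elem(&[7, 8]), Some(7));
    assert_eq!(first_elem(&[]), None);
}
```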


I don't deny that Rust's lifetimes make it possible to provide static guarantees that C++ can't.

The issue for me is a common misperception about unsafe blocks that the Rust community seems reluctant to squash. All I did in this (huge) thread was point out that unsafe code in an unsafe block can screw up 'safe' code outside the unsafe block. That is simply a fact about how Rust works.

The response in this case has been to go off on all kinds of tangents about formal verification that have little relevance to everyday Rust coding. Joe Coder's not going to formally verify his implementation of a doubly-linked list that uses a couple of unsafe pointer operations. It's probably best if he's aware that strictly speaking, none of the code that uses that implementation is guaranteed to be safe.

This is fundamentally the same kind of encapsulation that 'modern' C++ uses (unsafe code with hopefully safe wrappers), with -- as you point out -- the significant difference that C++'s type system would require runtime checks to enforce certain guarantees that can be verified at compile time in Rust. (I believe it is possible to write C++ containers that catch invalid iterators at runtime: https://ac.els-cdn.com/S1571066111000764/1-s2.0-S15710661110...)

> Without [unsafe blocks], there wouldn't be a way to assert to the compiler when a block of code that used unsafe primitives was safe

Well, sure, but that's something that could be done by a linting tool without any significant loss of functionality, so far as I can see.


Frankly: Joe Coder isn't implementing a linked list in Rust. I encourage you to take a look at the repositories you can find on GitHub; while many use unsafe (mostly to call out to third-party C libraries), very few are doing so to create nontrivial data structures. Obviously, each usage of unsafe has the potential to destroy all memory safety guarantees for the program, and I personally think people reach for it much too frequently. Nonetheless, if something goes wrong, people know where to look; as it turns out, it is actually possible for unsafe code to be correct even when it lacks a formal proof (only two memory safety errors were found during the verification effort, that I'm aware of). One hopes that the likelihood of this code being correct goes up significantly when there is less of it and it is marked for extra scrutiny (I certainly know that my first reaction when I look at a new Rust repository is to search for `unsafe` and see what bugs I can find).

When you talk about formal guarantees, you are inherently talking about formal verification. I think it's pretty disingenuous to claim that such talk is irrelevant here; at least to me, it's highly relevant, and I really do care about being able to formally verify code on top of Rust.

However, even if that doesn't happen, Rust is still in pretty good company re: safety; pretty much all industry languages expose a C FFI, which means that theoretically all their safety guarantees are completely out the window if even one C function is called (there are people who are working on formally verified safe linking, but the work is still very early). The C FFI is usually wrapped in something that is (hopefully) a safe wrapper, though not always. This is exacerbated by the fact that most languages' type systems can't actually enforce the sorts of invariants that C code needs (for example, a lot of older C libraries use static data in a non-thread-safe way; most languages can't design APIs that require calls to a library to stick to a single thread). Since most of the code in those languages is written in the safe subset, it's nonetheless pretty easy to find the (broad) culprit for a segfault when something goes wrong, and over time most safe languages develop their own safer alternatives to the unsafe C libraries (where possible).

The hope (my hope, anyway) is that over time, more and more Rust can transition to the safe subset (which may require extending the type system), and much of the Rust that can't is formally verified. I don't realistically expect that all Rust will be so verified, but I do hope that (1) if you want to, you'll be able to formally verify realistic, large programs, and (2) even those that don't have few enough bugs that it becomes very difficult for attackers to exploit them.

As far as the link you posted goes: being able to catch iterator invalidation at runtime in the presence of multiple threads incurs a pretty nontrivial performance impact (Java, for example, doesn't consistently detect it in the multithreaded case). I wrote this before I even read the article. In the article, they explicitly did not consider multithreaded programs (which is true of most such articles). Despite that, while some operations were within 10% of the unsafe version, or even faster, others were anywhere from 2 to 7 times slower. Despite the article's claims, I don't see any reason to believe that the slow variants are less common than the fast ones.

And unfortunately, iterator invalidation is not some isolated issue: it's a common example of what Rust can guarantee, but it stems from the much more basic issue that updates to data structures with restrictions on legal byte patterns must be restricted in the presence of aliasing. Updating a C++ variant at all, for instance, is not safe without strong aliasing guarantees, even in the single threaded case. For a substantial amount of C++ code, you'd have to either go the Java route (make every data structure valid after any word-size partial update, so all operations can be weakly atomic, at the cost of losing interesting mutable in-place data structures and not being able to perform some common compiler optimizations), or wrap updates to many kinds of values in mutexes (which would completely destroy performance in most cases). "Just do it at runtime" has severe performance tradeoffs in the domain that C++ is targeting; requiring shared_ptr to be atomic in the presence of threads is already bad enough that many larger projects write their own custom reference counted types (whose correct use can't be verified at compile time). If people were willing to eat it everywhere, I think they would have switched to Objective C or Swift by now, which use clever tricks to reduce the impact of pervasive reference counting. To see some examples of how this would have to work, see how Swift handles things like string slicing (and compare it to the Rust or C++ versions).
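To restate the variant point in Rust terms (a sketch of mine, not the poster's): updating an enum in place changes which bytes of the value are valid, and it is only safe because `&mut` statically rules out aliasing.

```rust
// Sketch of the aliasing point above, in Rust terms: updating an enum
// (Rust's analogue of a C++ variant) in place is safe because &mut
// guarantees exclusive access, so no reader can observe a torn state.
enum Value {
    Int(i64),
    Text(String),
}

fn promote(v: &mut Value) {
    // Replacing the active variant changes which bytes are valid;
    // this is only sound because `v` is the unique reference.
    *v = Value::Text(String::from("42"));
}

fn main() {
    let mut v = Value::Int(42);
    promote(&mut v);
    match v {
        Value::Text(s) => assert_eq!(s, "42"),
        Value::Int(_) => unreachable!(),
    }
}
```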

The reason I can say all this pretty confidently is that Rust has dynamic safety mechanisms as well. In almost all cases, you can choose between four options: a relatively cheap (but expensive space-wise) single-threaded version, an almost free (but restricted in functionality) single-threaded version, an expensive time-and-space-wise (and relatively general, but still restricted) multi-threaded version, and an extremely restricted multi-threaded version that can't handle interesting updates. People usually opt for a single-threaded version where they use it at all, because they cannot eat the performance and functionality impact of the multithreaded one. If you try to enforce safety with dynamic checks in C++, you will lose to Rust on performance; in fact, you will probably lose to Swift (which is designed from the get-go with these tradeoffs in mind).
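My reading of those four options, mapped to concrete standard-library types (this mapping is my assumption; the poster doesn't name the types):

```rust
// A sketch mapping the four options above to concrete types (my
// reading): RefCell (single-threaded, dynamic borrow checks), Cell
// (single-threaded, nearly free but copy-in/copy-out only), Mutex
// (general but costly, multi-threaded), and atomics (multi-threaded
// but limited to word-sized updates).
use std::cell::{Cell, RefCell};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

fn main() {
    let a = RefCell::new(vec![1]); // runtime borrow counting
    a.borrow_mut().push(2);
    assert_eq!(*a.borrow(), vec![1, 2]);

    let b = Cell::new(5); // no references handed out, just get/set
    b.set(b.get() + 1);
    assert_eq!(b.get(), 6);

    let c = Mutex::new(vec![1]); // a lock on every access
    c.lock().unwrap().push(2);
    assert_eq!(*c.lock().unwrap(), vec![1, 2]);

    let d = AtomicUsize::new(5); // word-sized updates only
    d.fetch_add(1, Ordering::SeqCst);
    assert_eq!(d.load(Ordering::SeqCst), 6);
}
```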

In short: even single threaded dynamic validation can have a noticeable performance impact. The bare minimum you need to be competitive with Rust and remain safe is distinct single-threaded and multithreaded dynamic mechanisms, together with the ability to reject code that misuses them at compile time; these mechanisms can only handle aliasing tracking and do not replace the functionality of lifetimes, which can only be relaxed using unique ownership, reference counting, or leaking data. Static thread safety verification would thus need to be restricted to data without references. Given all this, I believe that the only way to make everything work dynamically would be to disallow first-class references entirely in C++. That would break almost all C++ code in the wild, and would make its runtime model comparable to that of Swift. Common tricks to get around reference counting in Rust (like arenas) would necessarily have a large performance impact in this model, and would probably not be used much. Modern or not, C++ doesn't encourage writing code in this way.

> Well, sure, but that's something that could be done by a linting tool without any significant loss of functionality, so far as I can see.

It can't be done by a linting tool, because a linting tool has no idea whether a particular use of unsafe code is well-encapsulated or not. The only way we know how to prove this right now is using formal verification, so we assert safety of such code with manual annotations. I really should have mentioned the alternative as "declare everything that isn't allowed to use unsafe code as `untrusted`" because that's the actual alternative; marking well-encapsulated unsafe code as trusted is actually how things work today. I hope you can see why having the default be "trusted" is not a satisfactory alternative for a language that wants the default to make writing safe code easier than writing unsafe code; you should have to opt into being able to shoot yourself in the foot, and it should be easy to find code that claims to be upholding invariants.


>Frankly: Joe Coder isn't implementing a linked list in Rust

Not so sure about that. I wrote a parse tree transformer in Rust and wanted to write a mutable iterator for my parse tree. That requires unsafe code. So you don't have to be doing anything particularly esoteric to require a small amount of unsafe code. Writing a small amount of unsafe code is not the end of the world, of course. I am not saying that this is a major flaw in the language.
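A hypothetical sketch of the kind of code being described, boiled down to a mutable iterator over a slice rather than a parse tree:

```rust
// Hypothetical sketch of the situation described above: a hand-written
// mutable iterator. The borrow checker cannot prove that successive
// calls to next() hand out non-overlapping &mut references, so the
// usual workaround is a small amount of unsafe pointer code.
use std::marker::PhantomData;

struct IterMut<'a> {
    ptr: *mut i32,
    end: *mut i32,
    _marker: PhantomData<&'a mut i32>,
}

impl<'a> Iterator for IterMut<'a> {
    type Item = &'a mut i32;
    fn next(&mut self) -> Option<&'a mut i32> {
        if self.ptr == self.end {
            None
        } else {
            // Safety: ptr is in bounds, and each element is yielded at
            // most once, so the &'a mut references never alias.
            unsafe {
                let r = &mut *self.ptr;
                self.ptr = self.ptr.add(1);
                Some(r)
            }
        }
    }
}

fn iter_mut(v: &mut [i32]) -> IterMut<'_> {
    let range = v.as_mut_ptr_range();
    IterMut { ptr: range.start, end: range.end, _marker: PhantomData }
}

fn main() {
    let mut v = [1, 2, 3];
    for x in iter_mut(&mut v) {
        *x *= 10;
    }
    assert_eq!(v, [10, 20, 30]);
}
```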

Re catching iterator invalidation at runtime, I should have said explicitly that I was not claiming that this is as good as catching it at compile time.

>It can't be done by a linting tool, because...

I meant that the functionality provided by the 'unsafe' keyword itself could be enforced by a linting tool. (E.g. any block that uses one of the unsafe operations must contain a comment saying 'unsafe code here'.) If this isn't the case, I must have some fundamental misunderstanding about what the unsafe keyword does. The Rust docs suggest that it simply enables a small number of unsafe operations such as pointer arithmetic.
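That matches the docs; a small sketch of two of the operations the keyword gates (my example):

```rust
// Sketch of what the keyword actually enables: a couple of operations
// that only compile inside `unsafe` (dereferencing a raw pointer,
// calling a function declared `unsafe fn`).
unsafe fn unchecked_get(v: &[i32], i: usize) -> i32 {
    // Safety: the caller promises i < v.len().
    unsafe { *v.get_unchecked(i) }
}

fn main() {
    let v = [1, 2, 3];
    let p = &v[0] as *const i32; // creating a raw pointer is safe...
    let first = unsafe { *p };   // ...dereferencing it is not
    let last = unsafe { unchecked_get(&v, 2) }; // calling an unsafe fn
    assert_eq!((first, last), (1, 3));
}
```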


Well, it's pretty trivial, but there are a few subtleties. Keep in mind that unsafe has another meaning when attached to a function type or trait, which interacts with the type system. That can have surprising but necessary consequences (such as not being able to pass an unsafe function as an argument where a closure with the same signature is expected). But yeah, it's not really a first class type and it has deliberately simple semantics, so if you just want to replicate the exact functionality of unsafe in Rust you could probably do it as a lint (though for stylistic and ideological reasons I don't think Rust would ever consider doing so).
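A minimal sketch of that function-type interaction (my example):

```rust
// Sketch of the type-system interaction mentioned above: an `unsafe fn`
// has a distinct type from a safe fn with the same signature, so it
// cannot be passed where a safe function is expected without a wrapper.
unsafe fn add_one_unchecked(x: i32) -> i32 { x + 1 }
fn add_one(x: i32) -> i32 { x + 1 }

fn apply(f: fn(i32) -> i32, x: i32) -> i32 { f(x) }

fn main() {
    assert_eq!(apply(add_one, 41), 42);
    // apply(add_one_unchecked, 41) is a type error:
    // expected `fn(i32) -> i32`, found `unsafe fn(i32) -> i32`.
    // A safe closure wrapper makes it usable:
    assert_eq!(apply(|x| unsafe { add_one_unchecked(x) }, 41), 42);
}
```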



