I have been complaining about Rust's "unsafe" for years. There's too much "unsafe" code because there are things that you either can't express, or are very hard to express, in safe Rust. I've pointed out partially initialized arrays and backlinks as key trouble spots. Both are potentially fixable. I've gotten back complicated excuses for not dealing with these design problems. There was a serious denial problem in the Rust community. Now it's come back to bite them.
Language design that forces users to lie to the language leads to this. The classic is C's "a pointer is also an array" mindset. When you write
int read(int fd, char* buf, size_t buflen)
that "char *" is a lie. You're not sending a pointer to a single character. You're sending a reference to an array of unknown size. That design mistake in C has probably been the cause of more serious bugs than any other mistake in the history of programming. If you have to lie to the language, the language is broken.
there are things that you either
can't express, or are very hard
to express, in safe Rust.
This is true (and in some sense inevitable by Rice's theorem) but it is unclear what can be done about this. Bear in
mind that Rust's design constraints are
It is easy to get rid of unsafe blocks if you throw complexity at Rust, e.g. use dependent types and
/ or theorem provers, but all the obvious ways of doing this
immediately violate the constraints above. Doing it within Rust's
constraints is an open research problem.
potentially fixable.
What would you use as an objective, measurable criterion that we can
use to decided whether a purposed fix is in fact a fix? If you have a
concrete design proposal that works (in particular w/o destroying
Rust's fast (bidirectional) type-inference/type-checking and the
simplicity of Rust's types), and evidence that it's widely applicable
(i.e. doesn't just allow to type some trivial edge cases of back-links
and partially initialised arrays) that would easily publishable in top
PL conferences (e.g. POPL) and a working prototype would help towards
getting this into Rust.
I would be happy to help with implementing & writing up such work. I
have quite a bit of experience with sub-structural types a la Rust. I
have spend some time thinking about this problem space, but didn't
manage to crack it. I have an incoming PhD student whom I could ask to
help with this.
Oh, not Rice's theorem again. That only applies to infinite systems. There are useful decidable properties for a reasonably large set of programs. Yes, they're "uninteresting" mathematically, but we're talking about memory safety here. If your program is anywhere near undecidable in that area, you're doing it wrong. Any time you're close to the edge on that, you can get decidability with a run-time check. The Microsoft Static Verifier people have been down that road with some success. I went there 40 years ago, with the Pascal-F Verifier.
The question I'm asking is, what do you really need to do that you can't do in safe code. On the memory safety front, for pure Rust (no calls to C), probably not all that much. Maybe only this:
- Syntax for talking about partially initialized arrays. For a collection, you have a variable that indicates how much of the array has been initialized, and you need a statement in the language that ties that variable to a partially initialized array. Then you can do simple inductive proofs to show that as you initialize another element and add 1 to the counter, the entire array is initialized. Non-consecutive initialization would require more complex proofs. Is there a real use case for that? One with major performance implications, where initializing the data to some null value would be too slow?
Un-initialized blocks of types such as raw bytes aren't a problem, because those types are "fully mapped" - all bit combinations are valid. Only types that are not fully mapped an initialization guarantee.
- Backlinks. Backlinks in doubly-linked lists and trees obey some simple invariants. You need syntax to say "this is a backlink". That implies an invariant that if A points to B, B must point back to A. Everything which updates A or B must preserve that invariant. The pointer from B back to A then no longer needs to have ownership of A. (It's a problem that Rust doesn't have "objects", because such invariants are usually required to be true when control is outside the object, but can be broken momentarily while control is inside it. The inside/outside distinction is important for this, and Rust is a bit soft on that. Spec# took inside/outside seriously, but viewed it dynamically, rather than being tied to static scope. Haven't looked at that in years.
What else do you really need? Remember that some knotty ownership problems can be resolved by allocating the objects in an array which owns them, and using array indices for relationships between the objects. It's not like you have to use pointers.
Concurrency is a related issue, but I've said enough for one night.
I'm still unclear about this, should this be done by the typing system
or by an external static verifier:
simple inductive proofs to show
that as you initialize
another element and add 1 to
the counter
If the latter, you can already do this with Rust as it is, it's just a
question of building such a static verifier. If the former, I really
doubt that this can fit in Rust's typing system without major
surgery. We know well how to add arithmetic to typing systems, but the
natural approach to doing so (dependent types) will probably make
type-checking / -inference undecidable.
I’m replying to your message using software written in C, so do you think that it might be a little bit unfair to call it “broken”?
I suppose I’d agree if you’re making a general argument that most of the software we use is broken to some degree, but it has an entirely different meaning to directly tell someone “Your library is broken”.
Are you? All web browsers I know of are written in C++ and always have been. In modern C++ you don't use char* to mean "pointer to byte array of unknown size", you use std::vector. Moreover Chrome actually uses garbage collected C++ (they call it Oilpan).
No, that's what the caller says is the length of the array and as we already established, they're lying. ;)
In all seriousness there is quite a difference in structures that allow themselves to have guarantees built into the language/tooling (like including the size of the array in the languages basic "array" building block) vs. the more manual book-keeping the C code above requires.
Language design that forces users to lie to the language leads to this. The classic is C's "a pointer is also an array" mindset. When you write
that "char *" is a lie. You're not sending a pointer to a single character. You're sending a reference to an array of unknown size. That design mistake in C has probably been the cause of more serious bugs than any other mistake in the history of programming. If you have to lie to the language, the language is broken.Rust had so much promise, too.