
I entirely agree that safe memory management is one of the most important topics today. I just wonder if our time would be better spent developing tools to formally verify memory safety of existing languages, instead of trying to "rewrite everything in Rust" to get safety.

I see your point, though, and I was definitely being snarky for the sake of being snarky.




> I just wonder if our time would be better spent developing tools to formally verify memory safety of existing languages, instead of trying to "rewrite everything in Rust" to get safety.

In theory, that makes sense.

Now, as a sibling post points out, such tools have been around for decades. Microsoft started using them for drivers around the time of Windows XP, if I recall correctly. I've used such tools. Heck, I've developed such tools. In my experience, they require orders of magnitude more expertise (with the tools, with the underlying theory, and with the codebase) than Rust (or Ada, or other safe languages) to achieve comparable levels of safety.


But metadata is needed for proper formal reasoning, and C carries barely any information beyond some very vague "typing" hints; everything else is only convention, enforced more or less by the Linux community.

So Rust's lifetime annotations are what make proper proofs possible in the first place.

(As for proofs, though, they are no silver bullet either: a program of Linux's complexity is unlikely ever to be completely verified, but Rust may indeed give valuable guarantees for most of the program.)
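A minimal Rust sketch (with made-up function names) of the kind of machine-checked metadata meant here: the lifetime parameter `'a` is information C simply doesn't carry, and it is what lets the compiler prove that a returned reference cannot outlive its inputs.

```rust
// Hypothetical example: 'a is explicit, machine-checked metadata.
// The compiler proves the returned reference lives no longer than
// both inputs; C has no way to state or check this.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let a = String::from("kernel");
    let b = String::from("module");
    // Fine: both inputs outlive the use of the result.
    println!("{}", longest(&a, &b));
}
```

Returning a reference to a local that dies before the caller uses it would be rejected at compile time, which is exactly the proof obligation a C checker would have to reconstruct from nothing.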


On Windows we have SAL for that metadata.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/sal-a...

Naturally it doesn't help for portable code, nor when people don't make use of it, or don't run the static analysers on the code.


>I just wonder if our time would be better spent developing tools to formally verify memory safety of existing languages,

The industry can't implement your suggestion because, while computer science can formally verify restricted subsets of some C source code, we can't write a generalized tool that scans arbitrary C code and proves memory safety.

Consider memory operations in an existing language such as C, with these statements:

  char *dst;  // value is set anywhere in the program
  char *src;  // value is set anywhere in the program
  size_t  sz; // value is set anywhere in the program
  // ... bunch of other code that may alter dst/src/sz ...
  memcpy(dst, src, sz);  // is that memory safe?
Nobody has yet written an analysis tool that can prove and verify that the memcpy() is "safe", because predicting with perfect accuracy how the variables dst, src, and sz will be set and changed at runtime is a variation of the unsolvable Halting Problem: https://en.wikipedia.org/wiki/Halting_problem

That's why there's still no computer science research paper showing a tool that can scan general C source code to prove that all calls to memcpy(), strcpy(), and any writes to RAM via aliases & pointer dereferencing are "memory safe".

We're hoping for a "Sufficiently Smart Compiler"[1] type of tool that can verify C code is safe, but the source itself doesn't embed enough information to prove that.

So the alternatives so far have been:

- static approach: extra syntax for manual annotation (C++ unique_ptr<>, the Rust borrow ampersand (&), or manual mutex lock/unlock)

- dynamic approach: runtime tests with extra memory checks such as Valgrind, UBSan, ASan, debugger memory fences, etc. This can catch some but not all memory errors; a 100% proof is not possible for non-trivial programs because of the combinatorial explosion of runtime execution paths and memory values.

- GC garbage collection in a different "managed" language. The tradeoff is a higher RAM footprint, extra CPU, and GC pauses for tracing collectors.

[1] http://wiki.c2.com/?SufficientlySmartCompiler
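To illustrate the static approach above, here is a hedged Rust sketch (copy_into is a made-up helper) of how the earlier memcpy example changes when the size travels with the data: the copy is either in-bounds or a defined panic, never silent corruption.

```rust
// Sketch of the "static approach": a slice carries its own length,
// so unlike the C memcpy example, dst/src/sz cannot drift apart.
fn copy_into(dst: &mut [u8], src: &[u8]) {
    // copy_from_slice demands equal lengths; a violation is a
    // deterministic panic, not undefined behavior.
    dst[..src.len()].copy_from_slice(src);
}

fn main() {
    let src = [1u8, 2, 3];
    let mut dst = [0u8; 8];
    copy_into(&mut dst, &src);
    assert_eq!(&dst[..3], &[1, 2, 3]);
}
```

The point is not that the check disappears, but that the information a verifier would need is now part of the program's types instead of something to be reconstructed from whole-program analysis.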


> GC garbage collection in a different "managed" language. The tradeoff is a higher RAM footprint, extra CPU, and GC pauses for tracing collectors

It would hardly be a problem for most programs, including OS kernels, especially since kernels are themselves garbage collectors for resources like processes, file handles, etc.

Also, GC pauses are overblown; most of the tasks can be done in parallel.


> Also, GC pauses are overblown; most of the tasks can be done in parallel.

I've spent a few years writing code that needed to run at 60fps and, well, that's not my experience :)


People have been trying the formal-verification route for far longer than Rust has been a thing.


People really ought to read about Ada/SPARK before jumping on the Rust bandwagon.


It was obvious from the early days that C was a sharp knife; that is why lint was created in 1979.

How many people bother to actually use such tooling on their C projects?

Not many, most surveys place that number around 11%.

Just as an exercise, Github offers such tooling for C and C++ projects, try to find how many have CodeQL workflows enabled for pull requests.


I work with safety-certified systems written in C. The tooling around such things is extensive and mature: static analysis, extensive test runs with runtime analysis instrumentation (test coverage at the MC/DC level, Valgrind for memory and concurrency analysis); it's all there and used.

It's true that a lot of projects written in C do the equivalent of using Rust's "unsafe". Then again, a lot of Rust projects use Rust's "unsafe".


Welcome to the 11% mentioned in my comment.

JOVIAL and ESPOL were the first systems languages with a notion of unsafe code blocks, both about 10 years older than C.

There is a big difference between having unsafe code jump out at you and be easily searchable, and having every line of code be a possible source of integer overflows, memory corruption, or the UB-optimization exploit of the day.
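A small Rust sketch (made-up function names) of that searchability point: the audit surface is the lexically delimited unsafe block, which a plain grep can locate, while the safe variant is checked entirely by the type system.

```rust
// Safe variant: bounds are handled by the type system.
fn first_byte(v: &[u8]) -> Option<u8> {
    v.first().copied()
}

// Unsafe variant: the only code needing manual audit sits inside the
// `unsafe` block, trivially found with e.g. `grep -rn unsafe src/`.
fn first_byte_unchecked(v: &[u8]) -> u8 {
    // Caller must guarantee v is non-empty; otherwise this is UB.
    unsafe { *v.get_unchecked(0) }
}
```

In C, by contrast, there is no marker separating the two cases; every pointer dereference is implicitly the second kind.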



