Hacker Newsnew | past | comments | ask | show | jobs | submit | safercplusplus's commentslogin

Interestingly, I recently auto-translated wget from C to a memory-safe subset of C++ [1], which involves the intermediate step of auto-converting from C to the subset of C that will also compile under clang++. You end up with a bunch of clang++ warnings about various things being C11 extensions and not ISO C++ compliant, but it does compile.

[1] https://duneroadrunner.github.io/scpp_articles/PoC_autotrans...


And I might suggest that there's the possibility that the C++ code could end up being more cleanly ported to a memory-safe subset of C++. plug: https://github.com/duneroadrunner/scpptool/blob/master/appro...


Plug: In theory you could auto-convert to a memory-safe subset of C++ as a build step. Auto-converted code would have some run-time overhead, but you can mark any performance-sensitive parts of the code to be exempt from conversion. And you get lifetime and type safety too. For full coverage, performance-sensitive parts of the code can be manually converted to the safe subset to minimize overhead. (Interfaces in extern C blocks remain unconverted by default to maintain ABI compatibility.)

[1]: https://duneroadrunner.github.io/scpp_articles/PoC_autotrans...


As long as we're plugging our projects, I'll mention the scpptool-enforced memory-safe subset of C++. Fil-C would be generally more practical, more compatible and more expedient, but the scpptool-enforced subset of C++ is more directly comparable to Rust.

scpptool demonstrates enforcement (in C++) of a subset of Rust's static restrictions required to achieve complete memory and data race safety [1]. Probably most notably, the restriction against the aliasing of mutable references is not imposed "universally" the way it is in (Safe) Rust, but instead is only imposed in cases where such aliasing might endanger memory safety.

This is a surprising small set of cases that essentially consists of accesses to methods that can arbitrarily destroy objects owned by dynamic owning pointers or containers (like vectors) while references to the owned contents exist. Because the set is so small, the restriction does not conflict with the vast majority of (lines of) existing C++ code, making migration to the enforced safe subset much easier.

The scpptool-enforced subset also has better support for cyclic and non-hierarchical pointer/references that (unlike Safe Rust) doesn't impose any requirements on how the referenced objects are allocated. This means that, in contrast to Rust, there is a "reasonable" (if not performance optimal) one-to-one mapping from "reasonable" code in the "unsafe subset" of C++ (i.e. traditional C++ code), to the enforced safe subset.

So, relevant to the subject of the post, this permits the scpptool to have a (not yet complete) feature that automatically converts traditional C/C++ code to the safe subset of C++ [2]. (One that is deterministic and doesn't try to just punt the problem to LLMs.)

The problem isn't dedicating public resources to trying to getting LLMs to convert C to Safe Rust after investments in the more traditional approach failed to deliver. The problem is the lack of simultaneous investment in at least the consideration and evaluation of (under-resourced) alternative approaches that have already demonstrated results that the (comparatively well-funded) translate-to-Rust approach thus far hasn't been able to.

[1] https://github.com/duneroadrunner/scpptool/blob/master/appro...

[2] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...


> Profiles cannot achieve the same level of safety as Rust

So the claim is that the scpptool approach[1] can, while remaining closer to traditional C++, and not requiring the introduction of new language elements. Since the scpptool-enforced safe subset of C++ is an actual subset of C++, conforming code continues to build with your existing compiler. It just uses an additional static analyzer to check conformance.

For the 90% or whatever of C++ code that is not actually performance sensitive, the associated SaferCPlusPlus library provides drop-in and "one-to-one" safe replacements for unsafe C++ elements (like standard library containers and raw pointers). (For example, if you're worried about potentially invalid vector iterators, you can just replace your std::vector<>s with mse::mstd::vector<>s.) With these elements, most of the safety is enforced in the type system and not reliant on the static analyzer.

Conforming implementations of performance-sensitive code would be more restricted and more reliant on the static analyzer for safety enforcement. And sometimes requires the use of library elements, like "borrowing objects", which may not have analogies in traditional C++. But overall, even high-performance conforming code remains very recognizable C++.

The claim is that the scpptool approach is a straightforward path to full memory (and data race) safety for C++, and the one that requires the least code migration effort. (And again, as an actual subset of existing C++, not technically dependent on standard committees or compiler vendors for its implementation or deployment.)

[1]: https://github.com/duneroadrunner/scpptool/blob/master/appro...


> From what I'm aware of, Rust has poor ergonomics for programs that have non-hierarchical ownership model (ie. not representable by trees)

Yeah, non-hierarchical references don't really lend themselves to static safety enforcement, so the question is what kind of run-time support the language has for non-hierarchical references. But here Rust has a disadvantage in that its moves are (necessarily) trivial and destructive.

For example, the scpptool-enforced memory-safe subset of C++ has non-owning smart pointers that safely support non-hierarchical (and even cyclical) referencing.

They work by wrapping the target object's type in a transparent wrapper that adds a destructor that informs any targeting smart pointers that the object is about to become invalid (or, optionally, any other action that can ensure memory safety). (You can avoid needing to wrap the target object's type by using a "proxy" object.)

Since they're non-owning, these smart pointers don't impose any restrictions on when/where/how they, or their target objects, are allocated, and can be used more-or-less as drop-in replacements for raw pointers.

Unfortunately, this technique can't be duplicated in Rust. One reason being that in Rust, if an object is moved, its original memory location becomes invalid without any destructor/drop function being called. So there's no opportunity to inform any targeting (smart) pointers of the invalidation. So, as you noted, the options in Rust are less optimal. (Not just "ergonomically", but in terms of performance, memory efficiency, and/or correctness checking.) And they're intrusive. They require that the target objects be allocated in certain ways.

Rust's policy of moves being (necessarily) trivial and destructive has some advantages, but it is not required (or arguably even helpful) for achieving "minimal-overhead" memory safety. And it comes with this significant cost in terms of non-hierarchical references.

So it seems to me that, at least in theory, an enforced memory-safe subset of C++, that does not add any requirements regarding moves being trivial or destructive, would be a more natural progression from traditional C++.


> Yeah, non-hierarchical references don't really lend themselves to static safety enforcement, so the question is what kind of run-time support the language has for non-hierarchical references.

Yes. Back references are a big problem.

I just wrote a bidirectional transitive closure algorithm that uses many back references, with heavy use of Rc, RefCell, Weak, and ".borrow()". It's 100% safe rust. This is the "proper" Rust way to write this sort of thing. The nice thing about doing it the "right" way was that, once it compiled, it needed few changes to work correctly. No mysterious errors at all. But it took a lot of work to get it to compile. Some sections had to be rewritten to get the ownership plumbing right.

I put it up on the Rust forums for comments, and got replies that I should stop doing all that fancy stuff and just use indices into arrays.[1] Or arena allocation. Things that bypass the Rust ownership system. Those approaches would probably have more bugs.

(I'm starting to see a way to do compile time checking for this sort of thing. The basic concept is that run time borrows must be disjoint as to type, disjoint as to scope, or disjoint as to instance. The first is easy. The second requires inspecting the call chain, and there are problems with templates due to ambiguity over what a type parameter does in .borrow() activity. The third is almost a theorem proving problem, but if you restrict compile time checks for disjoint instances to a single function (or maybe a "class", a struct and its functions), it might be manageable. All this might take too much cleverness to use in practice. Too much time getting the ownership plumbing right, even with compiler support. But I should write this up.)

[1] https://users.rust-lang.org/t/bidrectional-transitive-closur...


> I put it up on the Rust forums for comments, and got replies that I should stop doing all that fancy stuff and just use indices into arrays.[1] Or arena allocation. Things that bypass the Rust ownership system. Those approaches would probably have more bugs.

I ran into this years ago as well. It was very unsatisfying. Maybe Rust is just missing a good GC type?


It's not an allocation problem. It's a back-reference problem. When struct A owns struct B, and B needs to be able to find A, that's surprisingly difficult to set up in Rust. Even for structures where B is inside of A's struct.


A couple of solutions in development (but already usable) that more effectively address UB:

i) "Fil-C is a fanatically compatible memory-safe implementation of C and C++. Lots of software compiles and runs with Fil-C with zero or minimal changes. All memory safety errors are caught as Fil-C panics." "Fil-C only works on Linux/X86_64."

ii) "scpptool is a command line tool to help enforce a memory and data race safe subset of C++. It's designed to work with the SaferCPlusPlus library. It analyzes the specified C++ file(s) and reports places in the code that it cannot verify to be safe. By design, the tool and the library should be able to fully ensure "lifetime", bounds and data race safety." "This tool also has some ability to convert C source files to the memory safe subset of C++ it enforces"


Fil-C is interesting because as you'd expect it takes a significant performance penalty to deliver this property, if it's broadly adopted that would suggest that - at least in this regard - C programmers genuinely do prioritise their simpler language over mundane ideas like platform support or performance.

The resulting language doesn't make sense for commercial purposes but there's no reason it couldn't be popular with hobbyists.


Well, you could also treat Fil-C as a sanitiser, like memory-san or ub-san:

Run your test suite and some other workloads under Fil-C for a while, fix any problems report, and if it doesn't report any problems after a while, compile the whole thing with GCC afterwards for your release version.


Right. And of course there are still less-performance-sensitive C/C++ applications (curl, postfix, git, etc.) that could have memory-safe release versions.

But the point is also to dispel the conventional wisdom that C/C++ is necessarily intrinsically unsafe. It's a tradeoff between safety, performance and flexibility/compatibility. And you don't necessarily need to jump to a completely different language to get a different tradeoff.

Fil-C sacrifices some performance for safety and compatibility. The traditional compilers sacrifice some safety for performance and flexibility/compatibility. And scpptool aims to provide the option of sacrificing some flexibility for safety and performance. (Along with the other two tradeoffs available in the same program). The claim is that C++ turns out to be expressive enough to accommodate the various tradeoffs. (Though I'm not saying it's always gonna be pretty :)


Even with UB holes plugged, C (and C++) are still unsafe, because there are many assumptions you might want to make that you can not encode in the language.

To get an example that's easy to understand: before the introduction of the 'const' keyword, you just couldn't express that some variable should never be changed. And no amount of UB sanitisers would have fixed this for you: you just couldn't express the concept. There's lots of other areas of these languages that are still in a similar state.

Eg there's no way to express that a function should be pure, ie not have side effects (but is allowed to use mutation internally).


Yeah, but C++ now supports "user-defined" annotations which effectively allow you to add the equivalent of any keyword you need, right? (Even if it's not the prettiest syntax.) For example, the scpptool static analyzer supports (and enforces) lifetime annotations with similar meaning to Rust's lifetime annotations.

I believe gcc actually does support `__attribute__ ((pure))` to indicate function purity. (I assume it doesn't actually enforce it, but presumably it theoretically could at some point.)


Might I suggest that the scpptool-enforced safe subset of C++ has a better solution for such data structures with cyclic or complex reference graphs, which is run-time checked non-owning pointers [1] that impose no restrictions on how or where the target objects are allocated. Unlike indices, they are safe against use-after-destruction, and they don't require the additional level of indirection either.

[1] https://github.com/duneroadrunner/SaferCPlusPlus#norad-point...


Preach it brother! :)

Hmm, I take it that the situation is that there are a number of vendors/providers/distros/repos who could be distributing your memory-safe builds, but are currently still distributing unsafe builds?

I wonder if an organization like the Tor project [1] would be more motivated to "officially" distribute a Fil-C build, being that security is the whole point of their product. (I'm talking just their "onion router" [2], not (necessarily) the whole browser.)

I could imagine that once some organizations start officially shipping Fil-C builds, adoption might accelerate.

Also, have you talked to the Ladybird browser people? They seemed to be taking an interested in Fil-C.

[1] https://www.torproject.org/

[2] https://gitlab.torproject.org/tpo/core/tor


Tor wants to move to Rust, and they aren't happy with their C codebase. They want to expand use of multi-threading, and C has been too fragile for that.

https://blog.torproject.org/announcing-arti/


Makes sense. But maybe the fact that that post is 4 years old serves to bolster the argument for Fil-C's value proposition. However much people may want to move away from their C code bases, the resources it takes to do so in a timely manner are often not so readily available.


This particular memory vulnerability, as I understand it, was a result of a `ReadonlySpan<>` targeting a resizable vector. A simple technique used by the scpptool-enforced safe subset of C++ to address this situation is to temporarily move the contents of the resizable vector into a non-resizable vector [1] and target the span at the non-resizable vector instead.

Upon destruction, the non-resizable vector will automatically return the contents back to the original resizable vector. (It's somewhat analogous to borrowing a slice in Rust.)

While it wouldn't necessarily prevent you from doing the flawed/buggy thing you were trying to do, it would prevent it from resulting in a memory vulnerability.

[1] https://github.com/duneroadrunner/scpptool#xslta_vector-xslt...


Very interesting, I was not familiar with your project. Thanks for sharing it here!


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: