
This is overly complicated; there is no need to bring Rust semantics to C to ensure memory safety.

A good mempool implementation is all you need (i.e. one that keeps track of every request and zeros out the memory on release)



Compiler optimizations and other forms of UB like integer overflow would like a word with you. If it were that simple, someone would have had success at scale by now https://alexgaynor.net/2020/may/27/science-on-memory-unsafet....


>If it were that simple, someone would have had success at scale by now

A lot of code in that article doesn't use mempools, and furthermore, just because a double free exists doesn't mean that it's always exploitable. And if it's exploitable, it doesn't mean that you can gain a shell or even exfil data; sometimes it means you can just crash the program.

Fundamentally, if you write a wrapper around memory management that keeps track of allocated resources, much in the same way how rust includes some runtime code during compilation for memory safety, you gain the same functionality.


> Fundamentally, if you write a wrapper around memory management that keeps track of allocated resources, much in the same way how rust includes some runtime code during compilation for memory safety, you gain the same functionality.

Can you substantiate that? There are commonly employed tracking allocators, such as ASAN, which can catch certain kinds of UB, and UBSAN others, and with special interpreters you can catch even more. But even basic ASAN is more exhaustive than what you are suggesting, and it provably can't provide the same guarantees that safe and sound Rust gives you https://stackoverflow.com/a/48902567:

> And that is not accounting for the fact that sanitizers are incompatible with each others. That is, even if you were willing to accept the combined slow-down (15x-45x?) and memory overhead (15x-30x?), you would still NOT manage for a C++ program to be as safe as a Rust one.

Also, I think you misunderstand the way Rust works: it does compile-time ownership checking, which allows it to avoid run-time checking, so this part "same way how rust includes some runtime code during compilation for memory safety" is factually wrong.


Rust does need to add some runtime checks (drop flags) when calling destructors in scenarios where an object may or may not have been moved.

In C++, for instance, for smart pointers the destructor will have an "if (p != NULL)" check. If the smart pointer was moved from, the move sets the pointer to null and the destructor checks for that at runtime.


>so this part "same way how rust includes some runtime code during compilation for memory safety" is factually wrong.

RefCell includes runtime code. Fundamentally, because of Rice's theorem, the compiler cannot predict the state of memory at all points in time, so runtime checks are needed.

>Can you substantiate that?

I mean, double free relies on using free() twice. A mempool malloc()'s once and free()'s once at exit. Use after free is mitigated by making sure that the pointer to the memory is set to zero (the mempool either returns a struct or a pointer to a pointer on allocation, and you access the requested memory through that).

Furthermore, you can have multiple mempools, and keep critical data separate, so if the pointer doesn't get zeroed out in the implementation, use after free won't leak anything critical.


Is anyone using this at scale and having success with avoiding all memory safety problems, just use mempool trust me bro. I made a copy of the thing the mempool pointer points to and that wasn't zeroed by free, I now have UAF, just use mempool trust me bro. I was using C for performance, I now have double pointer indirection everywhere, just use mempool trust me bro. I went out-of-bounds, just use mempool trust me bro. I violated strict aliasing, just use mempool trust me bro. I violated pointer provenance, just use mempool trust me bro. My program uses more than one thread, just use mempool trust me bro.

LOL


malloc() already keeps track of every memory allocation. Just what kind of tracking are we talking about here?


malloc doesn't keep track accurately in all cases, which is why double free is possible in the first place.

With a mempool implementation, you shouldn't be able to release a previously released chunk, because the pointer to that chunk will be zeroed out. This requires one more level of indirection when accessing the memory, i.e. a pointer to a struct that contains the pointer to the memory, but is otherwise safe.

As a note that may be causing some confusion: I'm not referencing the standard Linux mempool implementation. I have written custom ones with a lot of helper functions for safe memory access.



A mempool does not solve double free, use after free (at least at compile time), or the fopen sample. But mempools and ownership can be complementary.


If you are talking about a very naive version of a mempool, then you are correct, but that's why I said a good implementation.

The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted, and the memory pool will never release a chunk twice because it keeps track of allocated chunks.

Use after free is mitigated in the same way. When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.


> If you are talking about a very naive version of a mempool, then you are correct, but that's why I said a good implementation.

No true Scotsman.

> The whole point of a good mempool is that you malloc once, and only call free when you exit the program. The data structures for memory allocation will never get corrupted, and the memory pool will never release a chunk twice because it keeps track of allocated chunks.

Then you've just moved the same problem one layer up - "use after returned to mempool" takes the place of "use after free" and causes the same kind of problems.

> When you allocate, you get a struct back that contains a pointer to the data. When you release, that pointer is zeroed out.

And the program - or, more likely, library code that it called - still has a copy of that pointer that it made when it was valid?


It's not about comparing implementations; it's about the fact that a correct mempool implementation solves the problem without the need for complex borrow checkers.

For example, in that implementation, you request memory from a mempool and it returns a chunk-struct with the pointer to the allocated memory, the size of the chunk, and optionally some convenience functions for safe access (making sure that the pointer is not incremented or decremented beyond the limits). It also keeps its own pointer to the chunk-struct, along with the chunk that was allocated. When you release the chunk, it zeros out the pointer in the chunk-struct. Now any access to it will cause a segfault.

You can of course write code that bypasses all those checks, but in Rust, that's equivalent to using unsafe when you wanna be lazy. Also you could argue that Rust is better because instead of segfaulting, the check will be caught during compile time, which is true but only for fairly simple programs. Once you start using RefCells, you cannot guarantee everything during compile time.


> You can of course write code that bypasses all those checks, but in Rust, that's equivalent to using unsafe when you wanna be lazy.

The difference is that most of the Rust ecosystem is set up to allow you to not use unsafe. Whereas whenever you use a library in C, you need to pass it a pointer, so bypassing these checks has to be routine. (Note that the article claims as a key merit that it's possible to add annotations to existing libraries)

> When you release the chunk, it zeros out the pointer in the chunk-struct. Now any access to it will cause a segfault.

Only if you're very lucky. Null pointer dereference is undefined behaviour, so it may cause a different thread to segfault on a seemingly unrelated line, or your program may silently continue with subtly corrupted state in memory, or...

> Also you could argue that Rust is better because instead of segfaulting, the check will be caught during compile time, which is true but only for fairly simple programs. Once you start using RefCells, you cannot guarantee everything during compile time.

Using RefCells should be (and, idiomatically, is) the exception rather than the rule. And incorrect use of RefCell results in a safe panic rather than undefined behaviour.


Null pointer dereference in the vast majority of cases will segfault. In the cases where it doesn't, that's fully on you for running some obscure OS on some obscure hardware.

>Whereas whenever you use a library in C, you need to pass it a pointer,

When it comes to developing with Rust, any performance oriented project is necessarily going to have lots of unsafe for interacting with C libraries in the linux kernel in the same way that C code does.

As for comparison with fully safe Rust code outside the unsafe blocks, you can largely accomplish analogous behavior in C with a good mempool implementation. Or, if you don't need to pass around huge amounts of data, you can also do it by simply never mallocing and using stack variables. There are still some things you have to worry about (using safe length-bounded memory copy/move functions, using [type]* const pointer values to essentially make them act like references for function parameters, some other small things).

The point is Rust isn't the de facto standard for memory safety, and while it can exist as its own project, porting its semantics to other languages is not worth it.


> Null pointer dereference in the vast majority of cases will segfault.

Attempting access to a zero address will segfault on most hardware, but unfortunately common C compilers in common configurations will not reliably compile a null pointer dereference to an access to the zero address. Look up why the Linux kernel builds with -fno-delete-null-pointer-checks (sadly, most applications and libraries don't).

> When it comes to developing with Rust, any performance oriented project is necessarily going to have lots of unsafe for interacting with C libraries in the linux kernel in the same way that C code does.

I'm not talking about performance oriented projects. I'm talking about regular use of libraries e.g. I need to talk to PostgreSQL so I'll call libpq, I need to uncompress some data so I'll use zlib, I need to make a HTTP call so I'll use libcurl...

> The point is Rust isn't the de facto standard for memory safety

It absolutely is though. It's got clear, easy-to-assess rules for whether a project is memory-safe or not, and a substantial ecosystem that follows them; so far it's essentially unique in that unless you include GCed languages.


I mean you just proved your own point - compile with -fno-delete-null-pointer-checks.

And whatever criticism you have of that is surpassed by the fact that in all cases of regular software (i.e. run on a server, laptop, or desktop) that would be normal to write in either Rust or C, if it was written in C and a null pointer is dereferenced, it would absolutely crash (i.e. Rust is not really being used to develop embedded system software in non-experimental workflows where the zero address is a valid memory address).

And whatever criticism you have of that is surpassed by the fact that if you can write Rust code with all the borrowing semantics, you can also write a quick macro for any dereference of a mempool region that checks if the pointer is null and use that everywhere in your code.

So TLDR, not hard to write memory safe code. Rust is just a way to do it, but not the only way. It's great for enterprise projects, much in the same way that Java came up because of its strictness, GC, and multi-platform capability. And just like Java today, eventually nobody is going to take it seriously; people who want to get shit done will be writing something that looks like Python except even higher level, with AI assistants that replace text, and then LLMs will translate that code into the most efficient machine code.


> compile with -fno-delete-null-pointer-checks

Most people don't though. Even if your code was compiled with it, libraries you use may not have been compiled that way. And even if you do, it doesn't cover all cases.

> And whatever criticism you have of that is surpassed by the fact that in all cases of regular software (i.e. run on a server, laptop, or desktop) that would be normal to write in either Rust or C, if it was written in C and a null pointer is dereferenced, it would absolutely crash

No it won't. Not reliably, not consistently. It's undefined behaviour, so a C compiler can do random other things with your code, and both GCC and Clang do.

> And whatever criticism you have of that is surpassed by the fact that if you can write Rust code with all the borrowing semantics, you can also write a quick macro for any dereference of a mempool region that checks if the pointer is null and use that everywhere in your code.

"Everywhere in your code" only if you're not using any libraries.

> So TLDR, not hard to write memory safe code.

If it's that easy why has no-one done it? Where can I find published C programs written this way? Like most claims of "safe C", this is vaporware.


>It's undefined behaviour, so a C compiler can do random other things with your code, and both GCC and Clang do.

Give me an example of a null pointer dereference in a program that one compiles with -fdelete-null-pointer-checks that doesn't crash when it's run on any smartphone, x64 CPU in modern laptops/desktops/servers, or Apple Silicon.


> Give me an example of a null pointer dereference in a program that one compiles with -fdelete-null-pointer-checks that doesn't crash when it's run on any smartphone, x64 CPU in modern laptops/desktops/servers, or Apple Silicon.

https://blog.llvm.org/2011/05/what-every-c-programmer-should... has an example under "Debugging Optimized Code May Not Make Any Sense" - in that case the release build fortuitously did what the programmer wanted, but the same behaviour could easily cause disaster (e.g. imagine you have two different global "init" functions and your code is set up to call one or other of them depending on some settings or something, and you forget to set one of your global function pointers in one of those init functions. Now instead of crashing, calls via that global function pointer will silently call the wrong version of the function).


Cake is not porting Rust semantics. It works on classical C code, like the first sample using fopen.

    #include <ownership.h>
    #include <stdio.h>

    int main()
    {
      FILE *owner f = fopen("file.txt", "r"); 
      if (f)
        fclose(f);
    }
But comparisons are inevitable, and I also think there are lessons learned in Rust.

C programmers use contracts; these contracts are part of the documentation of some API. For instance, if you call fopen you must call fclose.

All we need is to create contracts that the compiler can read and verify automatically.


> The whole point of a good mempool is that you malloc once, and only call free when you exit the program

So you're describing fork() and _exit(). That's my favorite memory manager. For example, chibicc never calls free() and instead just forks a process for each item of work in the compile pipeline. It makes the codebase infinitely simpler. Rui literally solved memory leaks! No idea what you're talking about.


One issue I see with this approach (compiler leaking memory) is, for instance, if the requirements change and you need to utilize the compiler as a lib or service. For example, if the Cake source is used within a web browser compiled with Emscripten, leaking memory with each compilation would lead to a continuous increase in memory usage.

Additionally, compilers often offer the option to compile multiple files. Therefore, we cannot afford to leak memory with each file compilation.

Initially, I was planning a global allocator for the Cake source. It had a lot of memory leaks that would be solved in the future.

When ownership checks were added, it was a perfect candidate for fixing leaks. (Actually, I also had this in mind.)


True, but with some stuff you just ain't gonna need it. For example, chibicc forks a process for each input file. They're all ephemeral. So the fork/_exit model does work well for chibicc. You could compile a thousand files and all its subprocesses would just clean things up. Now needless to say, I have compiled some juicy files with chibicc. Memory does get a bit high. It's manageable though. I imagine it'd be more of an issue if it were a c++ compiler.


(I think preprocessor is the place where memory is used and released all the time while expanding macros.)


It is.



