
A memory-safe language could have easily resulted in the same vulnerability.

In this context, “memory-safe” means the language bounds-checks an array when you try to access an element. But the webp code does its bounds checks up-front so that the individual array accesses can be unchecked, to help performance. (If they didn’t care about that performance, they could easily have used a std::vector with bounds-checked access.) The vulnerability happened because this up-front bounds-checking logic was incorrect.

If we were to hypothesize a counterfactual world where webp was written in Rust, presumably the devs would have wanted a similar optimization: still do the bounds checking up-front, and put the actual array access in an unsafe block to get the same perf win. The same bug would thus have happened.
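
To make the counterfactual concrete, here’s a minimal sketch of that pattern (made-up names, not libwebp’s actual API); the safety of the unchecked read rests entirely on the up-front sizing logic being correct:

    // Hypothetical sketch, not libwebp's real code.
    fn table_size_for(max_symbol: usize) -> usize {
        // Stand-in for the up-front sizing/validation logic. If this is
        // wrong, every unchecked access below becomes a potential
        // out-of-bounds read, which is essentially the real bug, just in C.
        max_symbol + 1
    }

    fn build_table(max_symbol: usize) -> Vec<u32> {
        vec![0; table_size_for(max_symbol)]
    }

    fn lookup(table: &[u32], symbol: usize) -> u32 {
        // SAFETY: relies on table_size_for() having accounted for every
        // possible `symbol`; no per-access check happens here.
        unsafe { *table.get_unchecked(symbol) }
    }

A reviewer sees the unsafe block, but the argument for why it’s fine lives elsewhere, in the sizing logic.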

The lesson here is that bounds checking on every element access is probably a worthwhile overhead, no matter what the language is. C++ has safe ways to do this (well, safe-ish… safe for the purposes of this bug at least) with std::vector and its bounds-checked .at() accessor, and maybe they should just switch to that and eat the performance overhead.



> presumably

This is a big presumption. Yes, it could happen. In practice, this isn't even the first tool you'd reach for in this circumstance; the compiler can and will eliminate duplicate bounds checks, so if you've hoisted the check early, you shouldn't be using unchecked accesses, even if you care about performance, until you've demonstrated that the compiler isn't already removing them for you. The extra ceremony ("unsafe { foo.get_unchecked(n) }" vs "foo[n]") makes this even simpler to catch in code review.
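
For illustration, a minimal sketch of the "hoist the check, keep safe indexing" approach, with no unsafe at all. Whether the per-iteration checks actually disappear depends on the optimizer, so you'd confirm against the generated assembly or a benchmark:

    fn sum_first_n(buf: &[u32], n: usize) -> u32 {
        // One bounds check up front; this panics if n > buf.len().
        let head = &buf[..n];
        let mut total = 0;
        for i in 0..head.len() {
            // Syntactically a checked index, but the range comes straight
            // from head.len(), so the optimizer can typically prove it's
            // in bounds and drop the per-iteration check.
            total += head[i];
        }
        total
    }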


> The extra ceremony ("unsafe { foo.get_unchecked(n) }" vs "foo[n]") makes this even simpler to catch in code review

Right, and in said code review, the webp author could have easily said “yup, we want unsafe here because we already checked up front that the index will never exceed the buffer’s k elements”. Sure, it’s easier to see that unchecked access is happening, but when the whole point of large sections of the Huffman table code in webp is to make this very thing work, it wouldn’t cause any additional scrutiny in a code review. In other words, it would already be super clear to the reviewer that bounds checking is being disabled for performance’s sake; seeing the word `unsafe` as ceremony isn’t really adding any information here.

It’s possible webp could have been implemented in Rust with a naive approach to early bounds checking, relying on the optimizer to elide it, but having looked at the code, with all the buffer sizes they’re passing around, it looks unlikely that equivalent Rust code would have been auto-optimized that way. I don’t think it’s unlikely at all that, in this parallel universe, they would have found the bounds checking to be enough of an overhead that they would have reached for unchecked access, and it would have passed code review the same way the current code did.


For sure, this absolutely could (and will, at some point) happen. But security isn't only about what can happen, but about what will probably happen. Each of these things adds another chance to catch the bug; none of them makes the bug impossible.

> it wouldn’t cause any additional scrutiny in a code review.

I would hope that it would at least cause a "demonstrate that this is actually necessary" review. People do do this! And sometimes, you do have to use unchecked access. A famous example of this is literally Huffman coding, by Dropbox, back in 2016. They ended up providing both bounds-checked and unchecked options behind a flag, because they actually did measure and demonstrate that the checks were causing performance drops. I am curious if they'd still have the same issues today, given that a lot of time has passed, and also there were some other factors that would cause me to wonder a bit, but I haven't followed up on them myself. Regardless, this scenario will end up happening for good reasons at some point; the goal is to get it to happen only when it needs to, and not before.
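
For what it's worth, a rough sketch of that "ship both, gate the unchecked path behind a flag" idea might look like this (the feature name is made up; this isn't Dropbox's actual code):

    #[inline]
    fn table_get(table: &[u16], idx: usize) -> u16 {
        if cfg!(feature = "unchecked_huffman") {
            // SAFETY: callers must have validated `idx` against the table's
            // size up front; the flag is only sound if that invariant holds.
            unsafe { *table.get_unchecked(idx) }
        } else {
            table[idx]
        }
    }

cfg! expands to a compile-time constant, so the unused branch is trivially optimized away, while both paths keep compiling and can be benchmarked against each other.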


I would only accept such a PR with profiling data and benchmarks proving the point.


The Dropbox article that others and I have linked provided such benchmarks, for an 11% speedup. Seem plausible?


As with everything in IT, it depends.

That was for Dropbox's specific use case, and 11% seems like too much in any case.

For example, what improvements has rustc gained in the meantime to better optimize away bounds checks? Is the overhead still the same, or is it smaller? And does it actually matter for the acceptance criteria in the story's definition of done?


> If we were to hypothesize a counterfactual world where webp was written in rust

You don't have to hypothesize. It's here:

https://github.com/image-rs/image/blob/master/src/codecs/web...

and it doesn't use unsafe indexing.


That’s not Google’s webp library; that’s a totally different project. It doesn’t really matter what that project does.


You’re arguing about a potential vulnerability in a hypothetical code that exists only in your imagination.

Meanwhile the real code that exists doesn’t work like that. It’s not just a matter of personal style. Rust has constructs that help avoid bounds checks. It has design patterns that move unsafe code into smaller, easier to verify helpers. It has a culture of caring about safety – people don’t rewrite projects in Rust to just give up on its biggest selling point.


We know that the webp project does use unchecked access. We know this is intentional because they go through a lot of effort to do up-front bounds checking rather than checking at every access. We also know that companies like Dropbox, when writing their own Huffman table implementation, demonstrated an 11% speedup from disabling bounds checks: https://dropbox.tech/infrastructure/lossless-compression-wit...

It is not a stretch whatsoever to assume that a direct rewrite of the webp library into Rust would have uncovered the same perf findings that Dropbox did, and decided that the pattern of “up-front bounds checks, disabled bounds checks at element access” is a reasonable perf enhancement, especially for a Huffman table, where you’re accessing the compression table over and over while expanding it.

Edit: the more I think about it, you’re probably right. I had mistakenly thought I was reading C++ code when I was looking at the original patch to the vulnerability. The fact that it’s actually written in C, erodes my argument quite a bit… my logic in my head went something like, “if they wanted bounds checking they would have used std::vector”, but now I realize that’s impossible since it’s not C++. I’ll concede that there’s no real reason to assume that they would have skipped bounds checking if written in a safe language.


While yes, that's theoretically possible, do you have data to establish this? For example, having small sections of the code be marked as unsafe would allow for greater scrutiny of those sections. Also, unsafe access is more annoying to perform in Rust than in C or C++, so maybe that would have acted as a deterrent (or at least the code would have been profiled to make sure that unsafe access was worth it).

https://security.googleblog.com/2022/12/memory-safe-language... shows improvements at scale.


Elsewhere in the comments someone linked to this: https://dropbox.tech/infrastructure/lossless-compression-wit...

It looks like Dropbox experimented with disabling bounds checks in their Huffman coding impl, and found that using the unsafe pattern increased throughput from 224 MB/s to 249 MB/s (11%-ish faster). We don’t even need to hypothesize about whether webp would have eliminated bounds checking; we can see that other companies arrived at the same conclusion: disabling it can be worth it if you’re quite sure you’ve gotten the up-front checking right. If Dropbox had gone to prod with the unchecked Huffman implementation (never mind that the article isn’t about webp in particular), they could easily have had the same bug. And I don’t think a naive code review saying “unsafe is bad” would have stopped them from doing it: they clearly did the work to show why it’s worth it.


The risk probably doesn't matter for their use case because they're doing all this datacenter-scale image conversion on processes separate from the main logic (and likely not even on the same machines). Unlike in a phone's web browser or something, where it's 11% speedup with ??% added risk.

It's already common for PC apps to split potentially unsafe rendering into subprocesses, like in Chrome. If you don't want to pay the full IPC toll, there's shared memory. In theory it should be about the same speed as inlined unsafe code, right? What if Rust's "unsafe" blocks could do this for you?


Thank you.


Unsafe rust is not a memory-safe language.


It isn't, but there's a big difference between writing straight C and writing some unsafe Rust with safe wrappers. It's generally possible to have comparably few lines of unsafe Rust that do the heavy lifting and can be verified to maintain memory safety without sacrificing performance.


Well, the code you mark unsafe is probably also the most complicated piece and thus the most likely to have a bug. I can trust decent C devs to write their basic logic safely, just not the hyper-optimized portions.


A big blob of complex unsafe code is the opposite of how Rust devs approach unsafe optimizations.

Rust has a pattern of isolating unsafety into small components behind a safe interface, so that the component can be understood and tested in isolation. For example, if you need some adventurous pointer arithmetic, you write an Iterator for it, rather than do it in the middle of a complex algorithm. This way the complicated logic can be in safe code.
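
A minimal sketch of that pattern (an invented example, not from any real codebase): the raw-pointer arithmetic lives in one small iterator type whose invariants are established in its constructor, and everything that consumes it stays in safe code.

    use std::marker::PhantomData;

    /// Iterates every `stride`-th element of a slice using raw pointers.
    /// (std's slice iterators already work roughly like this; you'd only
    /// write your own for an access pattern std doesn't cover.)
    struct StrideIter<'a> {
        ptr: *const u32,
        remaining: usize,
        stride: usize,
        _marker: PhantomData<&'a [u32]>,
    }

    impl<'a> StrideIter<'a> {
        /// Safe constructor: the invariants (valid pointer, element count)
        /// are established once, from a real slice.
        fn new(data: &'a [u32], stride: usize) -> Self {
            assert!(stride > 0);
            StrideIter {
                ptr: data.as_ptr(),
                remaining: data.len().div_ceil(stride),
                stride,
                _marker: PhantomData,
            }
        }
    }

    impl<'a> Iterator for StrideIter<'a> {
        type Item = u32;

        fn next(&mut self) -> Option<u32> {
            if self.remaining == 0 {
                return None;
            }
            // SAFETY: `ptr` came from a live slice and `remaining` was
            // derived from its length, so this read stays inside the
            // original allocation.
            let value = unsafe { *self.ptr };
            self.remaining -= 1;
            if self.remaining > 0 {
                // SAFETY: at least one more element lies `stride` slots
                // ahead, so the advanced pointer is still within the slice.
                self.ptr = unsafe { self.ptr.add(self.stride) };
            }
            Some(value)
        }
    }

Callers just write `for v in StrideIter::new(&table, 4) { ... }` and never touch a pointer themselves; if the unsafe reasoning is wrong, the bug is confined to these few lines.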

It's sort of like Lego, where you build from safe higher-level blocks, but you can design custom blocks if you need.


Likewise, C code will be organized into separate pieces with a few small super-optimized/complicated parts, and they can fuzz-test the complicated pieces that are most likely to have buffer overflows.

I can't find the actual code causing the libwebp vulnerability, so idk if mixed safe/unsafe Rust code would've been any better here. Maybe what we really need is an "unsafe-jail" block in Rust that uses a child process limited to a piece of shared mem, and you put big pieces in there to avoid overhead. Like, libwebp can screw up all it wants, just don't touch the rest of my app.


The difference is that in C you can't make a "safe" interface, which the compiler will enforce is used properly.

C's type system is not nearly expressive enough for this. You can barely declare a pointer non-null with extensions, but you can't express ownership. You can't force callers of your function to check for errors before using the returned value. You can't enforce a correct object lifecycle (e.g. Rust can have methods that can be called once, and no more, and then statically forbid further uses of the object. Great for cleanup without double-free.)
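
A minimal sketch of that "call once, then statically forbidden" idea (an invented example): `finish` takes `self` by value, so any later use of the object is a compile error rather than a double free.

    struct Transaction {
        data: Vec<u8>,
    }

    impl Transaction {
        fn push(&mut self, byte: u8) {
            self.data.push(byte);
        }

        // Consumes the object; think of it as the cleanup/free step.
        fn finish(self) -> usize {
            self.data.len()
        }
    }

    fn demo() {
        let mut tx = Transaction { data: Vec::new() };
        tx.push(1);
        let _n = tx.finish();
        // tx.push(2);   // error[E0382]: borrow of moved value: `tx`
        // tx.finish();  // error[E0382]: use of moved value: `tx`
    }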

C doesn't have the ability to turn off Undefined Behavior for a section of code. You can't just forbid dangling pointers.

For a foolproof API the best you can do is use handles and opaque objects, and a ton of run-time defenses. But in Rust that is unnecessary. Use of the API can be guaranteed safe, and a lot of that is guaranteed at compile time, with zero run-time overhead.

For example, a mutable slice in Rust (a buffer) has a guarantee that the pointer is not null, is a valid allocation for the length of the slice, is aligned, points to initialized memory, is not aliased with any other pointer, and will not be freed for as long as you use it. And the compiler enforces that the safe Rust code can't break these guarantees, even if it's awful code written by the most incompetent amateur while drunk. And at run time this is still just a pointer and a length.

In C you don't get this compartmentalization and low- or zero-overhead layering. Instead of unsafe + actually enforced safe, you have unsafe + more unsafe.


Right, but I can trust a decent C developer to use it safely in the simple parts, especially with tooling like valgrind to detect obvious bugs. The only part where I'd say the usual "nobody is perfect" is in the hard parts.


There's 40 years of history of trying, and it doesn't work.

These decent C programmers are like True Scotsmen. When top software companies keep getting pwned, even in their most security-sensitive projects, it's apparently because they hire crap programmers.

Even basic, boring C can be exploitable. Android was hit by an integer overflow in `malloc(items * size)` (Stagefright). Mozilla's NSS had a vulnerability due to a wrong buffer size, which fuzzing did not catch (BigSig).
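
For contrast, an illustrative Rust sketch (not the actual Stagefright code): the multiplication is checked instead of silently wrapping, and even if the size were somehow computed wrong, safe indexing into the resulting buffer would panic rather than write past the allocation.

    fn alloc_sample_table(items: usize, size: usize) -> Option<Vec<u8>> {
        // In C, `malloc(items * size)` can wrap around to a tiny value and
        // the caller then overruns the undersized buffer. checked_mul
        // returns None on overflow instead.
        let bytes = items.checked_mul(size)?;
        Some(vec![0u8; bytes])
    }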


After looking at Stagefright... yes, I've lost faith in the ability to write safe C code.


At that point, I imagine the overhead would be big enough that you may as well just enable bounds checks instead.


The only overhead I see in theory is the additional process's kernel resources, which are negligible. Wherever you'd normally write to memory for some unsafe code to mutate, you instead write to the shared memory. The kernel is doing the same virtual memory protection either way, only in this case it opens up access to the child process too.

Am I missing something else, like shared mem being slower? Maybe the inability to share CPU caches?


The fact that you have a whole other process was my line of thinking. If the scheduling doesn't play nice then latency will suffer. I don't really know in practice though.


Latency is an issue with file or network-based IPC, but shared memory is supposed to be the same speed as non-shared. Apparently the X window system relies on it a lot.

Scheduling, I dunno. Would imagine it's not bad as long as you don't spawn a ton of these.



