You can charitably read it as "MBA from Stanford, with a focus on computer science-related stuff," or maybe "MBA and a bachelor's in CS from Stanford." Or you could assume that it's an MS in CS that was 'autocorrected' to MBA.
But the way it's phrased and worded... at best, it's the kind of really bad typo that shows rank incompetence; at worst, it's outright fabrication that is actively lying about the credentials; and, what I think is most likely, it's obfuscation that relies on credentialism to impart an imprimatur of credibility that is wholly undeserved (i.e. "I got an unrelated degree at Stanford, but it's Stanford and how could anyone who goes there be bad at CS?").
Metals, especially aluminum, are useful enough to recycle that it's sometimes worth extracting them from the municipal waste stream (this is a no-brainer if your waste is incinerated, rather than sent to a landfill directly).
Glass, plastic, and paper are generally at best marginal for recycling, especially because they can be sensitive to contamination in the recycling process (oops, somebody threw a greasy pizza box in the recycling!). Glass and some kinds of plastic products work really well for reuse rather than recycling, but a municipal recycling stream isn't conducive to reuse; you're probably more likely to see them ground up and 'recycled' as some kind of aggregate. For plastic, I'd expect that just about the only thing close to practicably recyclable is a plastic water bottle or the like.
And this is what I wish local collection agencies and companies focused on. Be clear: that paper? Throw it in the trash. Collect metals, and glass if it's feasible because you're close to a glass manufacturer. But nothing else.
No, they couldn't. What they could do--and what they did do--was push for the move to TLS for the MX-to-MX hop of email; I don't have the stats off the top of my head for how prevalent that is, but I think 80-90% of email is delivered that way.
But end-to-end encrypted email? It breaks everything. You need to get all the MUAs to support it (very few do either S/MIME or PGP). You'll break webmail--the most popular way to use email--without lots of major investment. And encrypted email breaks things like spam filtering or server-side filters catastrophically. Key discovery is also unsolved.
There was a time when I was on the everybody-should-use-encrypted-email train. But I've since grown up and realized that encrypted email fundamentally breaks email in ways that people are unprepared for, and people have already figured out how to route around the insecurity of email via other mechanisms.
I'm far from an expert in this field--indeed, I can but barely grasp the gentle introductions to these topics--but my understanding is that calling string theory a "theory of everything" really flatters it. String theory isn't a theory; it's a framework for building theories. And no one (to my understanding) has been able to put forward a theory using string theory that actually incorporates the Standard Model and General Relativity as they operate in our universe and makes any prediction in the first place, much less a testable one.
Getting into the weeds about what is and is not "A Theory" is an armchair-scientist activity; it's not a useful exercise. Nobody in the business of doing physics cares about or grants "theory status" to a set of models or ideas.
Some physicists have been trying to build an updated model of the universe based on mathematical objects that can be described as little vibrating strings. They've not been successful in closing the loop and constructing a model that actually describes reality accurately, but they've done a lot of work that wasn't necessarily all a waste.
It's probably either just the wrong abstraction or missing some fundamental changes that would make it accurate.
It would also be tremendously helpful if we had some new physics where an experiment showed a significant difference from either GR or the Standard Model. Unfortunately, the Standard Model keeps being proven right.
It's worth reflecting on the fact that for most of human history, sea travel was easier and faster than land travel. That's one of the main reasons why major towns and cities were centered on river access.
Do you know how far Madagascar is from Easter Island? If you're talking about Mediterranean and river travel, yes, you're right. But the Pacific Ocean and the Indian Ocean are utterly massive.
I believe more trade between China and the Mediterranean was transited via Indian Ocean trade routes than via the traditional Silk Road, though I'm hard-pressed to find actual statistics.
There is a set of languages which are essentially required to be available on any viable system. At present, these are probably C, C++, Perl, Python, Java, and Bash (with a degree of asterisks on the last two). Rust I don't think has made it through that door yet, but on current trends, it's at the threshold and will almost certainly step through. Leaving this set of mandatory languages is difficult (I think Fortran, and BASIC-with-an-asterisk, are the only languages to really have done so), and Perl is the only one I would risk money on departing in my lifetime.
I do firmly expect that we're less than a decade out from seeing some reference algorithm be implemented in Rust rather than C, probably a cryptographic algorithm or a media codec. Although you might argue that the egg library for e-graphs already qualifies.
We're already at the point where, in order to have a "decent" desktop software experience, you _need_ Rust too. For instance, Rust doesn't support some niche architectures because LLVM doesn't support them (those architectures are now exceedingly rare), and that means no Firefox.
> There is a set of languages which are essentially required to be available on any viable system. At present, these are probably C, C++, Perl, Python, Java, and Bash
Java, really? I don’t think Java has been essential for a long time.
Even Perl... It's not in POSIX (I'm fairly sure) and I can't imagine there is some critical utility written in Perl that can't be rewritten in Python or something else (and probably already has been).
As much as I like Java, Java is also not critical for OS utilities. Bash shouldn't be, per se, but a lot of scripts are actually Bash scripts, not POSIX shell scripts and there usually isn't much appetite for rewriting them.
I have yet to find one without it, unless you include Windows, consoles, tablet/phone OSes, or embedded RTOSes, which aren't proper UNIX derivatives anyway.
C currently remains the language of system ABIs, and there remains functionality that C can express that Rust cannot (principally bitfields).
Furthermore, in terms of extensions to the language to support more obtuse architectures, Rust has made a couple of decisions that make it hard for some of those architectures to be supported well. For example, Rust has decided that the array index type, the object size type, and the pointer size type are all the same type, which is not the case for a couple of architectures; it's also the case that things like segmented pointers don't really work in Rust (of course, they barely work in C, but barely is more than nothing).
That first sentence though. Bitfields and ABI alongside each other.
Bitfield packing rules get pretty wild. Sure the user facing API in the language is convenient, but the ABI it produces is terrible (particularly in evolution).
I would like a revision to bitfields and structs to make them behave the way a programmer thinks, with the compiler free to suggest changes that optimize the layout, as well as some flag indicating the compiler should not, because it's a finalized structure.
Can you expand on bitfields? There are crates that implement bitfield structs via macros, so while it's not baked into the language, I'm not sure what in practice Rust isn't able to do on that front.
Now, try and use two or more libraries that expose data structures with bitfields, where they have all chosen different crates for this (or even the same crate but different, non-ABI-compatible versions of it).
There's a ton of standardization work that really should be done before these are safe for library APIs. Mostly fine to just write an application that uses one of these crates though.
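To make this concrete, here's a hand-rolled sketch (names hypothetical) of roughly what those macro crates expand to: an integer newtype with shift/mask accessors. Since every crate picks its own layout, field widths, and generated types, two crates' bitfields aren't interchangeable at an API boundary, let alone a stable ABI:

    // Roughly what a bitfield macro generates: a newtype over an
    // integer plus shift/mask getters and setters.
    #[derive(Clone, Copy)]
    struct Flags(u16);

    impl Flags {
        // bits 0..4: kind
        fn kind(self) -> u16 { self.0 & 0xF }
        fn set_kind(&mut self, v: u16) { self.0 = (self.0 & !0xF) | (v & 0xF); }
        // bit 4: enabled
        fn enabled(self) -> bool { self.0 & 0x10 != 0 }
        fn set_enabled(&mut self, on: bool) {
            if on { self.0 |= 0x10 } else { self.0 &= !0x10 }
        }
    }

    fn main() {
        let mut f = Flags(0);
        f.set_kind(7);
        f.set_enabled(true);
        assert_eq!(f.kind(), 7);
        assert!(f.enabled());
    }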
I'm not a Rust or systems programmer, but I think it meant that, as an ABI or foreign function interface matter, bitfields are not stable or not intuitive to use, as they can't be declared granularly enough.
C's bit-field ABI isn't great either. In particular, the order of allocation of bit-fields within a unit and the alignment of non-bit-field structure members are implementation-defined (6.7.2.1). And bit-fields of types other than `_Bool`, `signed int`, and `unsigned int` are extensions to the standard, so that somewhat limits what types can have bitfields.
Dynamic linking is not something you do in general in Rust. It's possible, but the compiler currently does not guarantee a stable ABI, so it's not commonly done.
I'm genuinely surprised that usize <=> pointer convertibility exists. Even Go has different types for pointer-width integers (uintptr) and sizes of things (int/uint). I can only guess that Rust's choice was seen as a harmless simplification at the time. Is it something that can be fixed with editions? My guess is no, or at least not easily.
There is a cost to having multiple language-level types that represent the exact same set of values, as C has (and as is really noticeable in C++). Rust made an early, fairly explicit decision that a) usize is a distinct fundamental type from the other types, and not merely a target-specific typedef, and b) not to introduce more types for things like uindex or uaddr or uptr, which are the same as usize on nearly every platform.
Rust worded its initial guarantee such that usize is sufficient to round-trip a pointer (making it effectively uptr), and there remains concern among several of the maintainers about breaking that guarantee, despite the fact that people on the only target that would be affected have basically said they'd rather see that guarantee broken. The more fundamental problem is that many crates are perfectly happy opting out of compiling for weirder platforms--I've designed some stuff that relies on 64-bit system properties, and I'd rather like to have the ability to say "no compile for you on platforms where usize-is-not-u64" and get impl From<usize> for u64 and impl From<u64> for usize. If you've got something like that, it also provides a neat way to say "I don't want to opt out of [or into] compiling for usize ≠ uptr" while keeping backwards compatibility.
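(You can approximate the "no compile for you" part today with a cfg gate, though not the From impls, which user code can't add for two std types because of the orphan rule. A minimal sketch:)

    // Refuse to build on targets where usize is not 64 bits, so the
    // casts below are lossless by construction.
    #[cfg(not(target_pointer_width = "64"))]
    compile_error!("this crate assumes usize == u64");

    fn to_u64(n: usize) -> u64 { n as u64 }     // lossless: see gate above
    fn to_usize(n: u64) -> usize { n as usize } // lossless: see gate above

    fn main() {
        assert_eq!(to_usize(to_u64(42)), 42);
    }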
> ...not to introduce more types for things like uindex or uaddr or uptr, which are the same as usize on nearly every platform. ... there remains concern among several of the maintainers about breaking that guarantee, despite the fact that people on the only target that would be affected have basically said they'd rather see that guarantee broken.
The proper approach to resolving this in an elegant way is to make the guarantee target-dependent. Require all depended-upon crates to acknowledge that usize might differ from uptr in order to unlock building for "exotic" architectures, much like how no-std works today. That way "nearly every platform" can still rely on the guarantee with no rise in complexity.
I brought up Go because it was designed around the same time and, while it gets a lot of flak for some of its other design decisions, this particular one seems prescient. However, I would be remiss if I gave the impression that the reasoning behind the decision was anticipation of some yet-unseen future; the reality was that int and uint (which are not aliases for the sized intN or uintN) were not initially the same as ptrdiff_t and size_t (respectively) on all platforms. Early versions of Go for 64-bit systems had 32-bit int and uint, so naturally uintptr had to be different (and it's also not an alias). It was only later that int and uint became machine-word-sized on all platforms, which made uintptr seem a bit redundant. However, this distinction is fortuitous for CHERI etc. support. Still, Go on CHERI with 128-bit uintptr might break some code, though such code is likely in violation of the unsafe pointer rules anyway: https://pkg.go.dev/unsafe#Pointer
Yet Rust is not Go, and this solution is probably not the right one for Rust. As laid out in a link in a sibling comment, one possibility is to do away with pointer <=> integer conversions entirely and use methods on pointers to access and mutate their addresses (which may be the only thing they represent on some platforms, but is just a part of their representation on others). The broader issue is really about evolving the language and ecosystem away from the mentality that "pointers are just integers with fancy sigil names".
I'd say that, even more than pointer sizes, the idea that a pointer is just a number really needs to die; it is in no way a forward-looking decision expected of a modern language.
Pointers should at no point be converted into numbers and back as that trips up many assumptions (special runtimes, static/dynamic analysis tools, compiler optimizations).
Additionally, I would make it a priority that writing FFIs should be as easy as possible and require as little human deliberation as possible. Even if Rust is safe, its safety can only be assumed as long as the underlying external code upholds the invariants.
Which is a huge risk factor for Rust, especially in today's context of the Linux kernel. If I have an object created/handled by external native code, how do I make sure that it respects Rust's lifetime/aliasing rules?
What's the exact list of rules my C code must conform to?
Are there any static analysis/fuzzing tools that can verify that my code is indeed compliant?
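To illustrate the trust boundary in question, here's a minimal sketch of the usual FFI pattern (make_widget/free_widget are hypothetical C functions): the safe wrapper encodes the ownership rules exactly once, but nothing mechanically verifies that the C side actually upholds them.

    #[repr(C)]
    pub struct Widget { _private: [u8; 0] } // opaque C type

    extern "C" {
        // Hypothetical C API; Rust has to take these contracts on faith.
        fn make_widget() -> *mut Widget;
        fn free_widget(w: *mut Widget);
    }

    /// Safe wrapper: lifetime rules live here, based on what the C
    /// documentation promises, not on anything the compiler checks.
    pub struct OwnedWidget(*mut Widget);

    impl OwnedWidget {
        pub fn new() -> Option<Self> {
            // SAFETY: assumes make_widget returns null or a valid,
            // uniquely owned pointer, per the (hypothetical) C docs.
            let p = unsafe { make_widget() };
            if p.is_null() { None } else { Some(OwnedWidget(p)) }
        }
    }

    impl Drop for OwnedWidget {
        fn drop(&mut self) {
            // SAFETY: self.0 came from make_widget and is freed exactly once.
            unsafe { free_widget(self.0) }
        }
    }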
C doesn't require convertibility to an integer and recognizes that pointers may have atypical representations. Casting to integer types has always been implementation defined. [u]intptr_t is optional specifically to allow such platforms to claim standard conformance.
> Which is a huge risk factor for Rust, especially in today's context of the Linux kernel. If I have an object created/handled by external native code, how do I make sure that it respects Rust's lifetime/aliasing rules?
Can you expand on this point? Like, are you worried about whether the external code is going to free the memory out from under you? That isn't part of any guarantee; the compiler cannot guarantee what happens at runtime no matter what the author of a language wants. The CPU will do what it's told; it couldn't care about Rust's guarantees even if you built your code entirely with Rust.
When you are interacting with the real world and real things you need to work with different assumptions, if you don't trust that the data will remain unmodified then copy it.
No matter how many abstractions you put on top of it, there is still lightning in a rock messing with 1s and 0s.
> Is it something that can be fixed with editions? My guess is no, or at least not easily.
Assuming I'm reading these blog posts [0, 1] correctly, it seems that the size_of::<usize>() == size_of::<*mut u8>() assumption is changeable across editions.
Or at the very least, if that change (or a similarly workable one) isn't possible, both blog posts do a pretty good job of pointedly not saying so.
Personally, I like 3.1.2 from your link [0] best, which involves getting rid of pointer <=> integer casts entirely, and just adding methods to pointers, like addr and with_addr. This needs no new types and no new syntax, though it does make pointer arithmetic a little more cumbersome. However, it also makes it much clearer that pointers have provenance.
I think the answer to "can this be solved with editions" is more "kinda" rather than "no"; you can make hard breaks with a new edition, but since the old editions must still be supported and interoperable, the best you can do with those is issue warnings. Those warnings can then be upgraded to errors on a per-project basis with compiler flags and/or Cargo.toml options.
> Personally, I like 3.1.2 from your link [0] best, which involves getting rid of pointer <=> integer casts entirely, and just adding methods to pointers, like addr and with_addr. This needs no new types and no new syntax, though it does make pointer arithmetic a little more cumbersome. However, it also makes it much clearer that pointers have provenance.
Provenance-related work seems to be progressing at a decent pace, with some provenance-related APIs stabilized in Rust 1.84 [0, 1].
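For a flavor of those APIs, a small sketch: addr() and with_addr() let you do address arithmetic (here, tagging the low bit of an aligned pointer) without ever round-tripping through a bare usize-to-pointer cast, so the pointer keeps its provenance.

    fn main() {
        let x = Box::new(42u64);
        let p: *const u64 = &*x;

        // Stash a flag in the low bit of an 8-byte-aligned pointer.
        let tagged = p.with_addr(p.addr() | 1);

        // Recover the tag and the original pointer.
        let tag = tagged.addr() & 1;
        let untagged = tagged.with_addr(tagged.addr() & !1);

        assert_eq!(tag, 1);
        // SAFETY: untagged has the same address and provenance as p.
        assert_eq!(unsafe { *untagged }, 42);
    }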
This feels antithetical to two of the main goals of editions.
One of those goals is that code which was written for an older edition will continue to work. You should never be forced to upgrade editions, especially if you have a large codebase which would require significant modifications.
The other goal is that editions are interoperable, i.e. that code written for one edition can rely on code written for a different edition. Editions are set on a per-crate basis; this seems to be the case for both rustc [1] and of course cargo.
As I see it, what you're saying would mean that code written for this new edition initially couldn't use most of the crates on crates.io as dependencies. This would then create pressure on those crates' authors to update their edition. And all of this would be kind of pointless fragmentation and angst, since most of those crates wouldn't be doing funny operations on pointers anyway. It might also have the knock-on effect of making new editions much more conservative, since nobody would want to go through that headache again, thus undermining another goal of editions.
If the part about how "most of those crates wouldn't be doing funny operations on pointers" can be verified automatically in a way that preserves safety guarantees when usize != uaddr/uptr, these crates can continue to build without transitioning to a newer edition. Otherwise, upgrading these crates is the right move. Other code targeting earlier editions of Rust would still build, they would simply need a compiler version update when depending on the newly-upgraded crates.
> while in standard C there is no way to express that.
In ISO Standard C(++) there's no SIMD.
But in practice C vector extensions are available in Clang and GCC which are very similar to Rust std::simd (can use normal arithmetic operations).
Unless you're talking about CPU-specific intrinsics, which are available in both languages (core::arch intrinsics vs. xmmintrin.h) in all the big compilers.
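For comparison, a small sketch of the portable std::simd style (still nightly-only behind the portable_simd feature): ordinary arithmetic operators on vector types, much like the Clang/GCC vector extensions.

    #![feature(portable_simd)] // nightly-only as of this writing
    use std::simd::Simd;

    fn main() {
        let a = Simd::<f32, 4>::from_array([1.0, 2.0, 3.0, 4.0]);
        let b = Simd::splat(0.5);
        let c = a * b + b; // plain operators, no intrinsics
        println!("{:?}", c.to_array());
    }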
In what architecture are those types different? Is there a good reason for it there architecturally, or is it just a toolchain idiosyncrasy in terms of how it's exposed (like LP64 vs. LLP64 etc.)?
CHERI has 64-bit object size but 128-bit pointers (because the pointer values also carry pointer provenance metadata in addition to an address). I know some of the pointer types on GPUs (e.g., texture pointers) also have wildly different sizes for the address size versus the pointer size. Far pointers on segmented i386 would be 16-bit object and index size but 32-bit address and pointer size.
There was one accelerator architecture we were working on that discussed making the entire datapath 32-bit (taking less space) and having a 32-bit index type with a 64-bit pointer size, but this was eventually rejected as too hard to get working.
I guess today, instead of 128-bit pointers, we have 64-bit pointers and secret provenance data inside the CPU, at least on the most recent shipped iPhones and Macs.
In the end, I'm not sure that's better; maybe we should have had extra-large pointers again (in the way that, back when 32-bit was so large, we stuffed other stuff in there), like CHERI proposes (though I think it still has a secret sidecar of data about the pointers).
Would love to see Apple get closer to CHERI. They could make a big change since they are vertically integrated, though I think their Apple Silicon moment for the Mac would have been the time.
You mean in safe Rust? You can definitely do self-referential structs with unsafe and Pin to make a safe API. Heck, every future generated by the compiler relies on this.
Sorry, but what have I said wrong? The nature of code written in kernel development is such that using unsafe is inevitable: low-level code with memory juggling and patterns that you usually don't find in application code.
And yes, I have had a look into the examples - maybe one or two years ago there was a significant patch submitted to the kernel, and the number of unsafe sections made me realize at that moment that Rust, in terms of kernel development, might not be what it is advertised to be.
Right? Thank you for the example. Let's first start by saying the obvious - this is not an upstream driver but a fork, and it is also considered by its author to be a PoC at best. You can see this acknowledged on its very web page, https://rust-for-linux.com/nvme-driver, which says "The driver is not currently suitable for general use." So, I am not sure what point you were trying to make by giving something that is not even production-quality code?
Now let's move on to the analysis of the code. The whole thing, without crates, counts only 1500 LoC (?). Quite small, but OK. Let's see the unsafe sections:
rnvme.rs - 8x unsafe sections, 1x SyncUnsafeCell used for NvmeRequest::cmd (why?)
nvme_mq/nvme_prp.rs - 1x unsafe section
nvme_queue.rs - 6x unsafe (not sections but complete traits)
nvme_mq.rs - 5x unsafe sections, 2x SyncUnsafeCell used, one for IoQueueOperations::cmd and one for AdminQueueOperations::cmd
In total, this is 23x unsafe sections/traits over 1500 LoC, for a driver that is not even production quality. I don't have time, but I wonder how large this number would become if all the crates this driver uses were pulled into the analysis too.
The idea behind the safe/unsafe split is to provide safe abstractions over code that has to be unsafe.
The unsafe parts have to be written and verified manually very carefully, but once that's done, the compiler can ensure that all further uses of these abstractions are correct and won't cause UB.
Everything in Rust becomes "unsafe" at some lower level (every string has unsafe in its implementation, the compiler itself uses unsafe code), but as long as the lower-level unsafe is correct, the higher-level code gets safety guarantees.
This allows kernel maintainers to (carefully) create safe public APIs, which will be much safer to use by others.
C doesn't have such an explicit split, and its abstraction powers are weaker, so it doesn't let maintainers create APIs that can't cause UB even if misused.
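A toy sketch of that split: the unsafe block hides behind a small safe API, and the safety argument is entirely local to the function that contains it.

    pub struct Counter {
        slots: Vec<u32>,
    }

    impl Counter {
        pub fn new(n: usize) -> Self {
            Counter { slots: vec![0; n] }
        }

        /// Safe public API: the bounds check happens once, here, which
        /// is the whole justification for the unchecked access below.
        pub fn bump(&mut self, i: usize) -> Option<u32> {
            if i < self.slots.len() {
                // SAFETY: i < self.slots.len() was just checked.
                let slot = unsafe { self.slots.get_unchecked_mut(i) };
                *slot += 1;
                Some(*slot)
            } else {
                None
            }
        }
    }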
> I am not sure what point you were trying to make by giving something that is not even production-quality code?
let's start by prefacing that 'production quality' C is 100% unsafe in Rust terms.
> Sorry, I am not buying that argument.
here's where we fundamentally disagree: you listed a couple dozen unsafe places in 1.5kLOC of code; let's be generous and say that's 10% - and you're trying to sell it as a bad thing, whereas I'm seeing the same numbers and think it's a great improvement over status quo ante.
> let's start by prefacing that 'production quality' C is 100% unsafe in Rust terms.
I don't know what one should even make from that statement.
> here's where we fundamentally disagree: you listed a couple dozen unsafe places in 1.5kLOC of code; let's be generous and say that's 10%
It's more than 10%; you didn't even bother to look at the code but still presented what is in reality a toy driver example as something credible (?) to support your argument that I'm spreading FUD. Kinda silly.
Even if it were only that much (10%), the fact that it is in the most crucial part of the code makes the argument about Rust's safety moot. I am sure you've heard of the 90/10 rule.
Time will tell, but I am not holding my breath. I think this is a bad thing for Linux kernel development.
> I don't know what one should even make from that statement.
it's just a fact. By definition of the Rust language, unsafe Rust is approximately as safe as C (technically Rust is still safer than C in its unsafe blocks, but we can ignore that).
> you didn't even bother to look at the code but still presented
of course I did; what I've seen were one-liner trait impls (the 'whole traits' from your own post) and sub-line expressions of unsafe access to bindings.
> technically Rust is still safer than C in its unsafe blocks
This is quite dubious in a practical sense, since Rust unsafe blocks must manually uphold the safety invariants that idiomatic Safe Rust relies on at all times, which include, e.g., references pointing to valid and properly aligned data, as well as requirements on mutable references comparable to what the `restrict` qualifier (which is rarely used) involves in C. In practice, this is hard to do consistently and may trigger unexpected UB.
Some of these safety invariants can be relaxed in simple ways (e.g. &Cell<T> being aliasable where &mut T isn't) but this isn't always idiomatic or free of boilerplate in Safe Rust.
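For instance, a minimal sketch: two aliasing shared references can both mutate through a Cell, which &mut T would forbid.

    use std::cell::Cell;

    fn main() {
        let c = Cell::new(0);
        let a = &c;
        let b = &c; // aliases `a`, which is fine for &Cell<T>
        a.set(1);
        b.set(a.get() + 1);
        assert_eq!(c.get(), 2);
    }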
It's great that the Google Android team has been tracking data to answer that question for years now and their conclusion is:
-------
The primary security concern regarding Rust generally centers on the approximately 4% of code written within unsafe{} blocks. This subset of Rust has fueled significant speculation, misconceptions, and even theories that unsafe Rust might be more buggy than C. Empirical evidence shows this to be quite wrong.
Our data indicates that even a more conservative assumption, that a line of unsafe Rust is as likely to have a bug as a line of C or C++, significantly overestimates the risk of unsafe Rust. We don’t know for sure why this is the case, but there are likely several contributing factors:
unsafe{} doesn't actually disable all or even most of Rust’s safety checks (a common misconception).
The practice of encapsulation enables local reasoning about safety invariants.
The additional scrutiny that unsafe{} blocks receive.
> The practice of encapsulation enables local reasoning about safety invariants.
> The additional scrutiny that unsafe{} blocks receive.
None of this supports an argument that "unsafe Rust is safer than C". It's just saying that with enough scrutiny on those unsafe blocks, the potential bugs will be found and addressed as part of development. That's a rather different claim.
It does, if you read the report and run a little (implied) math.
The report says that their historical data gives them an estimate of 1000 Memory Safety issues per Million Lines of Code for C/C++.
The same team currently has 5 Million lines of Rust code, of which 4% are unsafe (200 000). Assuming that unsafe Rust is on par with C/C++, this gives us an expected value of about 200 memory safety issues in the unsafe code. They have one. Either they have 199 hidden and undetected memory safety issues, or the conclusion is that even unsafe Rust is orders of magnitude better than C/C++ when it comes to memory safety.
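Spelled out, with the rates taken straight from the report as cited above:

    // Expected memory-safety issues if unsafe Rust were as risky
    // per line as historical C/C++.
    fn main() {
        let c_rate_per_mloc = 1000.0;        // C/C++ vulns per MLoC
        let unsafe_loc = 5_000_000.0 * 0.04; // 4% of 5 MLoC = 200_000
        let expected = unsafe_loc / 1_000_000.0 * c_rate_per_mloc;
        println!("expected: {expected}, found: 1"); // expected: 200
    }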
I trust them to track these numbers diligently. This is a seasoned team building foundational low level software. We can safely assume that the Android team is better than the average C/C++ programmer (and likely also than the average Rust programmer), so the numbers should generalize fairly well.
Part of the benefit of Rust is indeed that it allows local reasoning about crucial parts of the code. This does allow for higher scrutiny, which will find more bugs, but that's a result of the language design. unsafe {} was designed with that in mind - this is not a random emergent property.
They say "With roughly 5 million lines of Rust in the Android platform and one potential memory safety vulnerability found (and fixed pre-release), our estimated vulnerability density for Rust is 0.2 vuln per 1 million lines (MLOC).".
Do you honestly believe that there is 1 vulnerability per 5 MLoC?
1 memory safety vulnerability, that's a pretty important distinction.
Yes, I believe that at least the order of magnitude is correct, because 4,800,000 of those lines are guaranteed not to have any by virtue of the compiler enforcing memory safety.
So it's 1 per 200 000, which is 1-2 orders of magnitude worse, but still pretty darn good. Given that not all unsafe code actually has potential for memory safety issues and that the compiler still will enforce a pretty wide set of rules, I consider this to be achievable.
This is clearly a competent team that's writing important and challenging low-level software. They published the numbers voluntarily and are staking their reputation on these reports. From personal observation of the Rust projects we work on, the results track with the trend.
There's no reason for me to disbelieve the numbers put forward in the report.
You accidentally put your finger on the key point, emphasis mine.
When you have a memory-unsafe language, the complexity of the whole codebase impacts your ability to uphold memory-related invariants.
But unsafe blocks are, by definition, limited in scope, and assuming you design your codebase properly, they shouldn't interact with other unsafe blocks in a different module. So the complexity related to one unsafe block is in fact contained to its own module and doesn't spread outside. And that makes everything much more tractable, since you never have to reason about the whole codebase, but only about a limited scope every time.
No, this is just an example of confirmation bias. You're given a totally unrealistic figure of 1 vuln per 200K/5M LoC and now you're hypothesizing why that could be so. Google, for anyone unbiased, lost credibility when they put this figure into the report. I wonder what was their incentive for doing so.
> But unsafe blocks are, by definition, limited in scope, and assuming you design your codebase properly, they shouldn't interact with other unsafe blocks in a different module. So the complexity related to one unsafe block is in fact contained to its own module and doesn't spread outside. And that makes everything much more tractable, since you never have to reason about the whole codebase, but only about a limited scope every time.
Anyone who has written low-level code with substantial complexity knows that this is just wishful thinking. In such code, abstractions fall apart, and "the complexity related to one unsafe block is in fact contained to its own module and doesn't spread outside" is just wrong, as I explained in my other comment here - UB taking place in an unsafe section will transcend into the rest of the "safe" code. UB is not "caught" or put into quarantine by some imaginary safety net at the boundary between the safe and unsafe sections.
Let's take a simple example to illustrate how unsafe {} cuts down the review effort for many operations. Take a static mutable global variable (a global counter, for example). Reading a plain static is safe; mutating a global like this counter is not - it requires an unsafe {} block.
If you need to check which places mutate this global static, you only need to check the unsafe parts of the code - you know that no other part of your code could mutate it; the compiler won't let you. If you have a bug that is related to mutating this static, then it might manifest anywhere in your code. But you know for certain that the root cause must be in one of your unsafe blocks - even if you don't know which one.
Good programming practice will cut down that effort even more by dictating that unsafe access should be grouped in modules. For example, when binding to a C module (unsafe), you'd usually generate an unsafe wrapper with bindgen and then write a safe wrapper on top of that. Any access that tries to go around the safe wrapper would be frowned upon and likely fail review.
And again, the compiler will help you there: Any access that tries to bypass the safe api would need to be unsafe {} again and automatically receive extra scrutiny in a review, making it less likely to slip through.
Compare that to a C codebase where anything goes. A static might be mutated anywhere in your codebase, even through a pointer to it - meaning you can't even reliably grep for it. It may slip through review unnoticed because no attention is drawn to it and cause bugs that are hard to trace and reason about.
If you're writing embedded code, similar considerations apply - access to registers etc. requires unsafe {}. But because access is unsafe {}, it's usually gated behind a safe API that is the boundary between the low-level code and the higher business logic. Unsurprisingly, these are critical parts of the code - hence they receive extra scrutiny, and in our project we allocate substantial review capacity to them. And the compiler will enforce that no safe code can circumvent the access layer.
The number you're tagging as an unrealistic figure is the result of dedicated and careful design of language and compiler features to achieve exactly this outcome. It's not a random fluke; very clever people sat down and thought about how to achieve this.
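To make the counter example concrete, a tiny sketch (using static mut purely for illustration; real code would reach for an atomic): the compiler forces every access through unsafe, so the sites are mechanically enumerable.

    static mut COUNTER: u64 = 0;

    fn bump() -> u64 {
        // SAFETY: fine in this single-threaded sketch; real code would
        // use an AtomicU64 in a plain static instead.
        unsafe {
            COUNTER += 1;
            COUNTER
        }
    }

    fn main() {
        bump();
        // Safe code cannot touch COUNTER at all; grep for `unsafe` and
        // you have the complete list of access sites.
        println!("{}", bump()); // 2
    }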
> You're given a totally unrealistic figure of 1 vuln per 200K/5M LoC and now you're hypothesizing why that could be so.
You are the one claiming it's unrealistic. And you gave zero argument why besides “the codebase is complex”, which I refuted. See the definition of complexity:
> The term is generally used to characterize something with many parts where those parts interact with each other in multiple ways, culminating in a higher order of emergence greater than the sum of its parts
Each unsafe block may be “difficult” in itself, but the resulting system isn't “complex” because you don't have this compounding effect.
> I wonder what was their incentive for doing so.
And obviously it must be malice…
> Anyone who has written low-level code with substantial complexity knows that this is just wishful thinking. In such code, abstractions fall apart, and "the complexity related to one unsafe block is in fact contained to its own module and doesn't spread outside" is just wrong, as I explained in my other comment here - UB taking place in an unsafe section will transcend into the rest of the "safe" code. UB is not "caught" or put into quarantine by some imaginary safety net at the boundary between the safe and unsafe sections.
I think you don't understand the problem as well as you think you do. Of course if the UB happens then all bets are off! Its consequences won't be limited to a part of the code, by definition. And nobody said otherwise.
But for the UB to happen, there must be some violation of a memory invariant (the most common would be using a value after free, freeing twice, accessing the same memory from multiple threads without synchronization, or, and this is specific to Rust, violating reference aliasing rules).
To avoid violating these invariants, the programmer must have a mental model of ownership across the whole system on which these invariants apply. For C or C++, it means having a mental model of the entire codebase, because the invariants related to one piece of code can be violated from everywhere.
In Rust this is different: you're not going to have raw pointers to one piece of data being used in multiple parts of the code (well, if you really want to, nobody stops you, but I'm confident the Android team didn't). And as such, you'll have to think about the invariants only at the scale of one module. Building an accurate mental model of a 350-line module is much more tractable for a human than doing the same for an entire codebase, and it's not even close.
That's the other interesting observation you can draw from that report. The numbers contained in the first parts, about review times, rollback rates, etc., are broken down by change size. And the gap widens for larger changes. This indicates that Rust's language features support reasoning about complex changesets.
It's not obvious to me which features are the relevant ones, but my general observation is that lifetimes, unsafe blocks, and the borrow checker allow people to reason about code in smaller chunks. For example, knowing that there's only one place where a variable may be mutated supports understanding that, at the same time, no other code location may change it.
It actually does support it. Human attention is a finite resource. You can spend a little bit of attention on every line to scrutinize safety, or you can spend a lot of time scrutinizing the places where you can't mechanically guarantee safety.
It's safer because it spends the human attention resource more wisely.
> The practice of encapsulation enables local reasoning about safety invariants.
which is not fully correct. Undefined behavior in unsafe blocks can and will leak into the safe Rust code so there is nothing there about the "local reasoning" or "encapsulation" or "safety invariants".
This whole blog has always read to me as too much like marketing material disguised with some data so that it's not so obvious. IMHO
> which is not fully correct. Undefined behavior in unsafe blocks can and will leak into the safe Rust code so there is nothing there about the "local reasoning" or "encapsulation" or "safety invariants".
Strictly speaking, that encapsulation enables local reasoning about safety invariants does not necessarily imply that encapsulation guarantees local reasoning about safety invariants. It's always possible to write something unadvisable, and no language is capable of preventing that.
That being said, I think you might be missing the point to some extent. The idea behind the sentence is not to say that the consequences of a mistake will not be felt elsewhere. The idea is that when reasoning about whether you're upholding invariants and/or investigating something that went wrong, the amount of code you need to look at is bounded such that you can ignore everything outside those bounds; i.e., you can look at some set of code in complete isolation. In the most conservative/general case that boundary would be the module boundary, but it's not uncommon to be able to shrink those boundaries to the function body, or potentially even further.
This general concept here isn't really new. Rust just applied it in a relatively new context.
Yes, but my point is: when things blow up, how exactly do you know which unsafe block you should look into? From their statement it appears as if there's a simple correlation between "here's your segfault" and "here's your unsafe block that caused it", which I believe there isn't, and which is why I said there's no encapsulation, local reasoning, etc.
> Yes, but my point is: when things blow up, how exactly do you know which unsafe block you should look into?
In the most general case, you don't. But again, I think that that rather misses the point the statement was trying to get at.
Perhaps a more useful framing for you would be that in the most general case the encapsulation and local reasoning here is between modules that use unsafe and everything else. In some (many? most?) cases you can further bound how much code you need to look at if/when something goes wrong, since not all code in unsafe modules/functions/blocks depends on each other, but in any case the point is that you only need to inspect a subset of code when reasoning about safety invariants and/or debugging a crash.
> From their statement it appears as if there's a simple correlation between "here's your segfault" and "here's your unsafe block that caused it",

I don't get that sense from the statement at all.
> in the most general case the encapsulation and local reasoning here is between modules that use unsafe and everything else
This would be the same narrative as in, let's say, C++. Wrap the difficult, low-level memory-juggling stuff into "modules", harden the API, return references and/or smart pointers, and then just deal with the rest of the code with ease, right? Theoretically possible but practically impossible.
The first reason is that abstractions get really leaky, especially in code that demands the utmost performance. Anyone who has implemented their own domain/workload-specific hash map or mutex or anything similarly foundational will understand this sentiment. Anyway, if we just have a look at the NVMe driver above, there are no "unsafe modules".
Second, and as I already argued, UB in the module library transcends into the rest of your code, so I fail to understand how dozens of unsafe sections make the reasoning or debugging any simpler, when reasoning is actually not a function of the number of unsafe sections but of the interactions between different parts of the code that end up touching the memory in the unsafe block in a way that was not anticipated. This is almost always the case when dealing with undefined behavior.
> I don't get that sense from the statement at all.
It is a bit of an exaggerated example of mine, but I do - their framing suggests ~exactly that, which is simply not true.
> This would be the same narrative as in, let's say, C++. Wrap the difficult, low-level memory-juggling stuff into "modules", harden the API, return references and/or smart pointers, and then just deal with the rest of the code with ease, right? Theoretically possible but practically impossible.
The difference, of course, is in the amount of automated help/enforcement provided that makes it harder/impossible to misuse said API. Just like C++ provides new functionality compared to C that makes it hard-to-impossible to misuse APIs in certain ways (RAII, stronger type system, etc.), and how C does the same compared to assembly (providing structured control flow constructs, abstracting away low-level details like calling convention/register management, etc.), Rust provides new functionality that previous widespread languages didn't. It's those additional capabilities that make previously difficult things practical.
> so I fail to understand how dozens of unsafe sections make the reasoning or debugging any simpler, when reasoning is actually not a function of the number of unsafe sections but of the interactions between different parts of the code that end up touching the memory in the unsafe block in a way that was not anticipated.
...Because the way encapsulation works in practice is that only a subset of code can "touch[] the memory in the unsafe block in a way that was not anticipated" in the first place? That's kind of the point of encapsulation!
(I say "in practice" because Rust doesn't and can't stop you from writing unsafe APIs, but that's going to be true of any language due to Rice's Theorem, the halting problem, etc.)
As a simple example, say you have a program with one singular unsafe block encapsulated in one single function which is intended to provide a safe API. If/when UB happens the effects can be felt anywhere, but you know the bug is within the encapsulation boundary - i.e., in the body of the function that wraps the unsafe block, even if the bug is not in the unsafe block itself (well, either that or a compiler bug but that's almost always not going to be the culprit). That certainly seems to me like it'd be easier to debug than having to reason about the entire codebase.
This continues to scale up to multiple functions which provide either a combined or independent API to internal unsafe functionality, whole modules, etc. Sure, debugging might be more difficult than the single-function case due to the additional possibilities, but the fact remains that for most (all?) codebases the amount of code responsible for/causing UB will reside behind said boundary and is going to be a proper subset of all the code in the project.
And if you take this approach to the extreme, you end up with formal verification programs/theorem provers, which isolate all their "unsafe" code to a relatively small/contained trusted kernel. Even there, UB in the trusted kernel can affect all parts of compiled programs, but the point is that if/when something goes wrong you know the issue is going to be in the trusted kernel, even if you don't necessarily know precisely where.
> It is a bit of an exaggerated example of mine, but I do - their framing suggests ~exactly that, which is simply not true.
I do agree that the claim, under that particular interpretation of the statement, is wrong (and Rust has never offered such a correlation in the first place), but it's kind of hard to discuss beyond that if I don't interpret that sentence the same way you do :/
No, I am not saying keep the status quo. I am simply challenging the idea that the kernel will enjoy the benefits that Rust is supposed to provide.
The distribution of bugs across the whole codebase doesn't follow a normal distribution; it's multimodal. Now, imagine where the highest concentration of bugs will be. And how many bugs there will be elsewhere. Easy to guess.
What exactly am I doing again? I am providing my reasoning; sorry if that rubs you the wrong way. I guess you don't have to agree, but let me express my view, OK? My view is not as extremist or polarized as you make it out to be. I see the benefit of Rust, but I say the benefit is not what Internet cargo-cult programming suggests. There's always a price to be paid, and in the case of kernel development I think it outweighs the positive sides.
If I spend 90% of my time debugging freakishly difficult-to-debug issues, and Rust solves the other 10% for me, then I don't see it as a good bargain. I need to learn a completely new language, surround myself with a team that is also not hesitant to learn it, and all that under the assumption that it won't make some other aspects of development worse. And it surely will.
x86 has a general pattern for encoding operands, the ModR/M byte(s), which gives you either two register operands, or a register and a memory operand. Intel also did this trick of using the register operand for extra opcode bits, at the cost of sacrificing one of the operands.
There are 8 escape opcodes, and all of them have a ModR/M byte trailing them. If you use two-address instructions, that gives you just 8 instructions you can implement... not enough to do anything useful! But if you're happy with one-address instructions, you get 64 instructions with a register operand and 64 instructions with a memory operand.
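A rough sketch of that layout in code; the reg field is what gets reused as extra opcode bits (the "/digit" convention in Intel syntax), leaving only the r/m operand:

    // ModR/M byte: mod (2 bits) | reg (3 bits) | r/m (3 bits).
    fn decode_modrm(byte: u8) -> (u8, u8, u8) {
        let md = byte >> 6;        // addressing mode
        let reg = (byte >> 3) & 7; // register, or /digit opcode extension
        let rm = byte & 7;         // register or memory operand
        (md, reg, rm)
    }

    fn main() {
        // e.g. escape byte 0xD9 with ModR/M 0x00 is FLD m32fp ("/0"):
        // mod=0, /digit=0, r/m=0.
        assert_eq!(decode_modrm(0x00), (0, 0, 0));
    }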
A stack itself is pretty easy to compile for, until you have to spill a register because there are too many live values on the stack. Then the spill logic becomes a nightmare. My guess is that the designers were thinking along these lines--organizing the registers as a stack is an efficient way to use the encoding space, and a fairly natural way to write expressions--and didn't have the expertise or the communication to realize that the design came with some edge cases that were painfully sharp to deal with.
> - Forth people defined the IEEE754 standard on floating point, because they knew how to do that well in software.
IEEE 754 was principally developed by Kahan (in collaboration with his grad student, Coonen, and a visiting professor, Stone, whence the name KCS draft), none of whom were involved with Forth in any way that I am aware. And the history is pretty clear that the greatest influence on IEEE 754 before its release was Kahan's work with Intel developing the 8087.
> The first to be scientifically described, Fuchsia triphylla, was discovered on the Caribbean island of Hispaniola (Haiti and the Dominican Republic) about 1696–1697 by the French Minim friar and botanist, Charles Plumier, during his third expedition to the Greater Antilles. He named the new genus after German botanist Leonhart Fuchs
I vote to just change the spelling to what almost everyone already thinks it is anyways.
It'll still be just as weird. But "chs" is just nonsensical. The idea that it would sound like "sh" is baffling. I mean, I know this is English spelling which is not known for its regularity, but this is just too much.
The beginning of the English word "fuchsia" is not pronounced like the German word Fuchs, so indeed the spelling does not match the pronunciation. This is independent of the fact that it comes from that word. Plenty of things in English (and, in fact, loanwords in every language) sound different from the words they're derived from; that doesn't mean trying to imitate the source language is the "right" pronunciation. If you pronounce fuchsia like "fuksia" nobody will understand you.
:)
Yeah, probably in this case English is doing the right thing, pronunciation wise.
Anyway, checking the pronunciation in Google Translate, it plays "fuksia", while Wikipedia has the right version.
> But "chs" is just nonsensical. The idea that it would sound like "sh" is baffling
In the word "french" C H is pronounced sh and nobody bats an eye, I don't think it's that outlandish that someone once read it as fuch-sia, incorrectly splitting it compared to the original.
In the language French, fuchsia is unequivocally read as something more like few-shia, and I'd bet that even though it comes from German Fuchs-ia (fooks-ia), English picked it up from the French side.
If you find such a loanword weird, don't you dare try reading Japanese.
But the question here is chs, not ch. Which though rare, is widely understood to be a kind of guttural sound or "k" sound followed by an s. In -uchs or -ichs coming from German.
Damn, I always thought Fuchsia was just a colour, but today I learned:
- Fuchsia is a flower
- which is named after a German botanist (Leonhart Fuchs)
- Fuchsia in English is pronounced completely differently than in German.
- Google is surprisingly bad at naming their products