It is depressing that we can seriously discuss using 24 out of 64 pointer bits to mitigate one of the many problems caused by buffer overflows, yet cannot seriously discuss making buffer overflows impossible.
How about we use 24 bits of data pointers to keep the array size: 1 bit to indicate "this is a pointer with a size" and 23 bits for the size itself. Then our load/store-with-index instructions, as well as freshly added pointer arithmetic instructions, trap when the index exceeds the size. Wouldn't that beat using bits in instruction pointers merely to stop one of many kinds of buffer overflow from forging valid instruction pointers? No good?
> How about we use 24 bits of data pointers to keep the array size, or 1 bit to indicate "this is a pointer with a size" and 23 bits for the size
That would imply a pretty big granularity for the sizes: if the maximum size is 4GB, the minimum granule is 512 bytes. Packing more efficiently might help (for instance, the way x86 segment limits are encoded), but introduces hardware complexity.
> then our load/store with index instructions, as well as freshly added pointer arithmetic instructions, trap when the index exceeds the size?
You've described x86 segmentation pretty closely here. It's been around since 1978, but most of the mechanism has been disabled in x86-64. Two of the segment registers (FS and GS) are still used for things like per-CPU data and stack-smashing protection.
Decades of experience teach us (1) that the great mass of buggy C code is simply not going away, and the best we can hope for is that it dwindles into more niche situations (compare Fortran and COBOL: those codebases haven't gone away either); and (2) that mitigations at the hardware and system software level are of at least some use, if they can be implemented without breaking a significant chunk of that great mass of existing C code. My impression is that "make arrays track their sizes and trap on bad accesses" may be a valid-to-the-standard C implementation but it breaks too much real-world code (I don't have references to hand though, so it could be a wrong impression).
Easy answer: the majority of exploitable memory corruption vulnerabilities in 2017 aren't simple buffer size calculation mistakes.
Pointer auth and control flow integrity techniques cover most (all?) memory corruption flaws, including memory lifecycle errors (which are probably the most common modern source of vulnerabilities). Built-in buffer bounds checks do not.
One exception may be if you could steal an authenticated pointer to a buffer that's about to have some generated machine code written to it (e.g. for JIT execution), and use that to write your own arbitrary code instead.
That's a somewhat orthogonal issue. Your suggestion aims to prevent pointer accesses from clobbering data the pointer doesn't own. Pointer authentication protects the pointer that is being clobbered, like a return address on the stack.
You don't need any special instruction support to do bound checked memory access. Write in Rust or Swift or whatever, and you're already making buffer overflows "impossible".
The buffer overflows are already out there, in billions of lines of C and C++ code, and since we can't rewrite all that code, we should mitigate them as best we can.
Sure, I just think my (not that well thought-out, but still) suggestion mitigates bugs in existing source code with fresh compiler support better than this thing does. A compiler/runtime using new instructions to make instruction addresses hard to clobber could instead use instructions that keep an on-stack array's size in the array pointer and maintain it through pointer arithmetic. A compiler/runtime knows when an array size is too large to fit into 23 bits; on-stack arrays are certainly never that big, so your sister comment's problem about "4G" is not that big of a deal: just don't do this with large arrays.
It'd require somewhat more ISA and compiler changes, but it'd solve more problems than just the one they solve, and I think the security of this would be easier to demonstrate, too.
I regularly allocate buffers in excess of 40GiB on my workstation. Linux on x86-64 currently gives userspace 47 of the 64 virtual address bits, supporting up to 128TiB of virtual address space. That leaves 17 bits in a pointer for your size field. (2^47) / (2^17) is 1GiB, so the granularity of your bounds-checking system would be 1GiB unless you made the userspace ABI dependent on the number of virtual address bits.
If you store the bounds separately (full runtime bounds checking), you lose efficiency in code which inherently cannot overflow its bounds, and in code with a large number of small objects of known size (say, 400GiB of 64-byte objects). If you switch to a new language, great! But you obviously lose access to your existing code, which is a non-starter.
Even in a language where garbage collection or the type system theoretically prevents the existence of invalid pointers, they can still come into existence via the FFI. Even when both languages are safe in that respect, it's generally trivial to create an invalid pointer over the FFI.
How about we start the retirement of C, as it is more a liability than an asset in this day and age?
How about we only use languages that (as you propose) work with memory slices, not naked pointers, and where the concept of a null pointer does not exist?
How about we only operate on memory slices after checking boundaries?
What? Are you going to force them? People write C because it is convenient, productive, and popular enough to attract contributors and find tools. C has the best dynamic analysis and debugging tools of really any language.
You can think yourself superior and turn up your nose, but the fact remains that many people have perfectly good reasons to write and maintain C. Your haughty commentary has no impact on that.
You may think that bounds checking is the answer to all situations, but if you're writing a realtime system, there's often no point in running the program if it can fail from an out of bounds read or write anyway. In a flight control system, or an ECU, there is often nothing productive about crashing. You need to verify your pointer logic, instead of hoping your program will crash.
> You need to verify your pointer logic, instead of hoping your program will crash.
I love how most of the comments are, in essence, "you're too dumb to use C". Just check the pointers, right? Too bad the people who think they're smart enough still get bitten by these issues.
I guess that's why people supposedly won't use technologies other than C in embedded systems (they do, in fact, use them).
> How about we start the retirement of C as it is a liability more than an asset at this day and age
There's really only one replacement for C right now and it's Rust. If you don't like or can't use Rust (for whatever reason) you're gonna stick with C, and when you consider how much work has gone into making Rust a viable C alternative, it's clear we're a long way away from having a healthy ecosystem of C replacements.
C (and C++, which inherited most of C's memory-insecurity problems) is still used for a very wide variety of programs. There are plenty of alternatives to C other than Rust; which of them is appropriate depends on the task at hand, but several languages can replace C with greater memory safety.
I'm not really a C apologist, but I am pretty irritated with the near-constant calls for C deprecation. It's a lot easier to say, "C sucks!" than it is to do something about it, and I think we should at least internalize how difficult replacing C will be before we go around castigating people for continuing to use it.
Oh absolutely! But those are merely drawbacks, not limitations. In contrast, there are many machines that simply can't host a JVM, or many platforms Rust just doesn't run on.
And that's just the "this is impossible" level. Sure you can build a database in Python, but it'll be slow and a memory hog, so if your requirements are "database, fast, low memory profile", then you can't use Python. Importantly, if you think those ever will be your requirements, you can't use Python.
I say this a lot but, engineering is about tradeoffs. There are still plenty of valid reasons to use C/C++. I'm tired of the knee-jerk "BOOOOO C" on HN these days, and while I certainly think we need to dispel the myth that you can write a meaningfully large, memory-safe program in C, I don't think we need to go as far as "you should never use C ever again". In fact, I think we need to be honest about the current state of the art in order to fully replace C -- which I wholeheartedly support.
> In contrast, there are many machines that simply can't host a JVM
Well, if you're thinking in terms of available resources, not really: most (if not every) chip on payment cards or SIM cards runs Java[1] despite being incredibly limited in terms of resources.
There are other (niche) examples of Java running directly on bare metal; see Jazelle[2] for instance.
> or many platforms Rust just doesn't run on
Right now, absolutely, but there is no technical limitation whatsoever preventing Rust from running on those platforms. Support might come in the next decade or so if Rust gets enough traction; only time will tell.
I totally agree with the rest of your comment though.
Yeah, and that Java stuff is super cool; didn't Sun float a CPU with support for Java bytecode a long time ago? But those chips don't run a "full" JVM; I more or less mean "can run Apache Commons".
And yeah most of the platforms Rust doesn't support are either legacy or very niche. It's an interesting topic though; platform developers and manufacturers seem to have no problem shipping tweaked C compilers (usually some awful old GCC fork), but I've yet to see them use LLVM. I think Rust is hamstrung a little by having only the one compiler, but it's an entirely unfair expectation of such a young and ambitious project. Plus, it's hard to outdo LLVM.
We'll see how it goes. Maybe we'll see a lot less platform proliferation as mindshare moves away from C, but it's also possible that LLVM will just grow its platform support.
I also wonder if we'll see industry become a little more relaxed in its requirements, like requiring multiple implementations, language standardization, or security/development standardization and verification. Most of this stuff grew out of C's instability; with a more stable language, maybe it doesn't matter? There are parallels in the web world too: multiple browser vendors have to be on board with a feature for it to eventually become a standard, whereas with Rust it's pretty much whatever the Rust community decides and LLVM supports. Do we still care about standards and multiple implementations? Are the roadblocks worth it? On one hand I feel like they are, but on the other, if we accept them then we're kind of implicitly accepting C forever.
This is bloviatory, though. NULL/0 is an incredibly useful value: zero is the simplest value to test for in hardware, which is why it is the basis of booleans in every major systems language, and because zero is what boolean evaluation checks, null is the natural value for a pointer which doesn't point to anything.
Any time you have an important distinction between the address of a valid object and a non-address: the next pointer at the end of a linked list, a leaf node of a tree, a failed initialization of a pointer. Zero is also used to terminate strings, for similar convenience/efficiency reasons.
C compilers and static analyzers together catch possible null-dereference bugs with near certainty these days, so people who make any effort at all rarely ship them. I can't remember the last time I encountered a null pointer dereference that wasn't typo-related.
If you really want to be certain downstream users of your API won't struggle with it, put the null check in your sample code with a fat comment which says "This is NULL 0.001% of the time, and it really hurts when you don't handle that".
ALGOL, of course, is the sort of language which doesn't have pointer arithmetic. I would agree that a language without pointer arithmetic should not have NULL pointers, if only because it makes no sense to have an abstract reference to an object which cannot be used with the functions designed for it.
In a language with raw integer-like pointers, like C, you check for null at allocation time. I've also seen people treat functions which can return NULL pointers as returning something like an option type, with NULL considered None and everything else considered Some.
Null is a useful value, but you don't need it everywhere; that's why many modern languages make nullable types opt-in instead of the default.
I would argue that in at least 80% of cases you don't want your values to be null, and it's in those situations that mistakes are made (because you don't expect the value to be null!).
No, I just think that in some cases it's worth it.
Some people should really be using bounds checks and option types more often, and I often use bounded functions for handling strings in fixed-size buffers. Some people write bugs into their programs for a lack of understanding or care given to these aspects of the language; but many people also make wonderful and unique things out of them.
I just don't think that the baby should be thrown out because somebody overfilled the bathwater. There is a time and a place for zero tests, null pointers, and pointer/index arithmetic.