Pointers Are More Abstract Than You Might Expect in C (stefansf.de)
269 points by ognyankulev on July 2, 2018 | 257 comments



This scratches the surface of why I hope C slowly fades away as the default low-level language. C sounds simple when you look through K&R C. C lets you feel like you understand the stack, ALU and memory. A pointer is just an integer and I can manipulate it like an integer.

But the reality is filled with a staggering number of weird special cases that exist because memory doesn't work like a simple flat address space; or the compiler needs to optimise field layouts, loops, functions, allocations, register assignments and local variables; or your CPU doesn't use the lower 4 bits or upper 24 bits when addressing.

C has no way to shield common language constructs from these problems. So everything in the language is a little bit compromised. Program in C for long enough and you'll hit a lot of special cases – usually in the worst way: runtime misbehavior. Type-punning corruption, pointers with equal bit representations that are not equal, values that change mysteriously between lines of code, field offsets that differ between debug and release builds.

When using fundamental language constructs, we shouldn't need to worry about these ugly special cases – but C is built around these ideas. The need to specify memory layouts, direct memory access and other low level access should be gated by barriers that ensure the language's representation and the machine representation don't tread on each other's toes.

Rust has a long way to go but is so much more robust at runtime. I think there's room for other languages to occupy a similar space, but they'd need to focus on no-std-lib, no-runtime operation (not always the sexiest target).


The following is purely my opinion.

Rust might have gone too far the other way. Yes, it strives to be a modern language, with a lot of functional programming features and "OOP" done right (i.e. no inheritance, just method polymorphism). To me Rust is more a C++ replacement than a C replacement.

A replacement for C should try to be as simple as possible while fixing C's weak-typing mess with strong typing, including when it comes to pointers (for instance, as in Ada, where you can't just take the address of anything you want; you need to declare it as aliased to begin with). In fact Ada tries hard to make pointers redundant, which is a great thing. Add range types and bounds-checked arrays (Ada has a lot of great ideas, and "C style" Ada is a treat to use).

So a C replacement should keep the procedural nature of C, with closures perhaps, but not go further in terms of paradigm. C aficionados often claim they love C because it's "simple" (it isn't). That's what they mean: they love C because you can't really do OO with it.

On the other hand, Rust macros are exemplary of what a good macro system should be.


> To me Rust is more a C++ replacement than a C replacement.

People make this comparison a lot, but it's not fair. Rust is far simpler than C++ in terms of the number of language features. It's fair to say that Rust is far more expressive than C, but that just means an initial learning curve that is higher than C's, though not as high as C++'s. The type system is fairly simple in Rust, but it can be used to create very complex things. It does take time to be able to read it, but the guarantees you get from it are very powerful.

> a C replacement should keep the procedural nature of C, with closures perhaps, but not go further in terms of paradigm

Why leave all of the language development of the last ~50 years on the table? Iterators in Rust are amazing, algebraic data types (enums) allow for simple state-machine constructs, lifetimes guarantee memory safety, and default-immutable memory plus a single mutable owner mean no more concurrent memory modifications. It's safer than Java, and as fast and as frugal with memory as C. What's not to love?

These are amazing features that allow for amazingly stable programs. To put it succinctly: you get to code once, then rely on that code for a long time, and focus on building features, not bugs.


The only thing I ask of any C-replacement languages is that they make it possible to describe ABIs, message layouts, and on-disk formats in detail. C historically does that well enough, though not really all that well (e.g., bitfields and enums have issues, and struct packing is not always easy to get right). There is a lot of value to this!

The ability to mix object code from multiple languages (FFI) is critical, and that's why C is the common denominator for FFIs.


I won't do this justice, so I'll point you to the Rust book on this subject: https://doc.rust-lang.org/book/second-edition/ch19-01-unsafe...

In short, at the moment, if you want a stable ABI, you must export an unmangled C interface. This is easy enough. One of my favorite sites for FFI stuff is this: http://jakegoulding.com/rust-ffi-omnibus/

It covers a bunch of languages, and suggests techniques for overcoming any complexity in exposing higher order types.
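
For a flavor of what that looks like, here is a minimal sketch from the C side (the function name add_i32 is hypothetical, assuming a Rust library built as a cdylib):

    /* C caller's view of a function exported from a Rust cdylib.
       The Rust side (sketch) would be:
           #[no_mangle]
           pub extern "C" fn add_i32(a: i32, b: i32) -> i32 { a + b }
    */
    #include <stdint.h>
    #include <stdio.h>

    extern int32_t add_i32(int32_t a, int32_t b);

    int main(void) {
        printf("%d\n", add_i32(2, 3)); /* 5, when linked against the Rust library */
        return 0;
    }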


I wasn't criticizing Rust, and I'm aware of what you linked. The point I was making is that this sort of functionality is a sine qua non for C-replacement languages. That Rust has it is to its credit, and one reason that I think it is a very good C-replacement language.


C is the common denominator for FFI on OSes that happen to be written in straight C, which then mix the C ABI with the OS ABI.

That is not true in mainframes, Windows (ongoing replacement of Win32 with COM, it is lot of fun to do COM in C, more so UWP), Android (JNI, AIDL), Web (JavaScript/WebAssembly), iOS/macOS (Objective-C runtime), Fuchsia (IDL).


Re: COM, that works because MSFT can a) commit to an ABI for C++, b) they can stick to a subset of C++ that makes that easy enough. Similarly for OS X.

The point isn't to stay in a C-like world. The point is that it must be possible to specify an ABI, file formats, and so on, in the host language. Yes, we can use external IDLs (even use them as DSLs if you like). I know all about and have worked with quite a lot of them (e.g., ASN.1, XDR, and many many others). But when you get down to a certain level, a light-weight approach really pays off. For example, at the system call level, and for many sufficiently simple file formats and such. At the higher end of complexity, on the other hand, ASN.1 and friends pay off instead, mostly in a) obtaining interoperability with implementations written in other languages, and b) managing extensibility.

Another thing is that some of us think about how things work at many levels. It's nice to be able to think about an interface and accurately picture what goes on at a low layer. It's difficult to do that with HLLs. But HLLs can easily have features that allow fine-grained control of or exposure of low level details, and that's what I'm saying I want to have as an option.


COM and WinRT are still based on the C ABI though. So anything that can do that can do them - hopefully with more layers on top for convenience's sake, of course...


Not quite. COM, yes, although it is masochistic to do so without any higher-level help.

UWP on the other hand extends COM to support inheritance and value types across calls, and the lowest Microsoft was willing to go on their high performance components was the Windows Runtime C++ Template Library, now replaced by C++/WinRT bindings and updated MIDL compiler.


The ABI is still C. Value types are just structs, and inheritance is implemented via aggregation - under the hood, it's all just interfaces and conventions on how they all play together. So any language that can do C FFI, can do WinRT. We did most of it with Python and ctypes as a prototype.

And you could even use that from C back in the day if you #included the "Windows.*.h" headers that shipped with the SDK - they had a bunch of #ifdef __cplusplus etc in them. Not sure if that stuff is still there or not.


Why would you think that Rust is "far simpler" than C++? Exactly the same concepts (with extras on Rust's side because of the ML-like constructs) must be learned for both, except they're distributed differently on the learning curve.

And I wouldn't say that Rust's safer than Java either. Memory access errors are basically non-existent in Java, and it has quite robust concurrency primitives and libraries.


That’s based on my experience of working with C++ on and off over the last 18 years and reattempting to learn some of the modern pieces of it, as compared to learning and working with Rust over the last 3.

I find Rust simpler and more concise, with nearly no footguns.

Edit, I didn’t see your Java comment:

> I wouldn't say that Rust's safer than Java either.

Here's why it is safer: It has the same set of guarantees about memory safety as Java. In terms of concurrency, it is safer because of the single-mutable-reference guarantee across threads, enabled by a better type system. The type system is stronger, which allows more details about the code to be put into types, like Rust's algebraic data types, without the limitations of Java's non-generic enums.

And the big one: no bare nulls allowed. This eliminates a huge class of logic errors. Java is getting value types in a future release, which will allow for a true Option type in Java, i.e. one that cannot itself be null, but the other points still stand.

On top of that, in large applications the GC can become an Achilles' heel, where it eventually pauses the application with dire consequences. Rust of course has no GC. And a great side effect of this is that Drop (Java's finalizers) actually works in Rust, which is amazing.

Basically for these reasons, Rust is safer than Java.


Well, that's still personal opinion, even if grounded in experience.

My personal opinion is that they have similar complexity, but in general Rust decides to clearly define behaviour as much as possible and fail fast, preferably at compile time. This can be seen when comparing features: memory management is there (supported by the borrow checker), smart pointers are there, references & pointers are there, generics are there (even if not exactly the same), integer and floating point handling are there, concurrency is there (compile-time checking is nice), macros are there (albeit different and perhaps less error-prone), string types are more complex than in C++, enums are more complex (but also more powerful), operator overloading is there, iterators are clearly more complex (as are the functional aspects in general), etc.

So the average programmer will have to learn a similar amount of concepts or even more in Rust's case.

Where Rust is indeed simpler is error handling and building.

--- I figured you would bring up concurrency, and while Rust's compile-time verification is really nice, Java also has a simpler form in threading annotations. Additionally, I believe it's rare to use such low-level primitives in Java.

Nulls (and type system holes in general) are a correctness issue, not a safety issue, unless we want to argue that all bugs are safety issues.

GC pauses are a performance issue, as you already know.

Saying that Rust threading is safer than Java's is technically correct and Rust is probably a bit safer than Java because of that, but it's hard to argue that Rust should be chosen for a project because of that (this is the whole point of saying X is safer than Y). Java is super-simple to learn compared to Rust.


> that's still personal opinion, even if grounded in experience

I just wanted to make it clear how and why I've formed that opinion.

> unless we want to argue that all bugs are safety issues

Yes, I can definitely be accused of expanding the definition of safety to include all things that cause programs to crash in horrible ways. I look at it as differing levels of guarantees the language makes for you, and Rust makes many more at compile time than others. This adds to the learning curve you mention; is it worth it? Completely depends on whether you see those guarantees as valuable.


> Why would you think that Rust is "far simpler" than C++?

C++ as a language is a lot more complicated. It has a lot of features with weird corner cases that interact in unfortunate ways.

> And I wouldn't say that Rust's safer than Java either.

Depends on what you mean with safe. If you mean memory safe, then they are about as safe. Rust has fewer runtime failures due to concurrency and null pointers though.


Not sure why the above comment is downvoted. The number of features in C++ isn't the root of the problem with C++. The real problem with C++ is that it both has a lot of features and that all of those features interact in subtle and surprising ways. Even if Rust had as many features as C++ (which it doesn't, not by a long shot; Rust is a medium-sized language like Python (though with a far less forgiving learning curve than Python)), one of the design philosophies of Rust is that features should interact in predictable ways, and if two features can't interact in predictable ways, then the compiler should do its damndest to prevent them from being used together and explain why.


Rust also has features which interact in subtle and surprising ways. Generics and operator traits bumping into the "coherence" rules is my personal pet peeve. I doubt any user would have predicted that interaction, and even though the compiler gives a detailed pointer to why, it's cold comfort.

Honestly, after hating C++ for years, trying to use Rust made me appreciate more about C++. I wish someone would make a language which took the best parts of both.


> even though the compiler gives a detailed pointer to why, it's cold comfort

We'll just have to agree to disagree, because having the compiler watch my back and preemptively guard against unforeseen pitfalls is about the warmest comfort I could ask for. :P


Generics and operator traits ought to be orthogonal and composable. Would you still be comforted if the compiler didn't let you mix for loops and arithmetic in one function? I mean, even if it gave you a detailed error message...


The biggest issue is not how complicated it is; rather, there is no good way to prevent people from writing C in C++.


Ok then please prove that C++ is more complicated than Rust by listing which features of C++ are not available in Rust. It's nice that the compiler can tell me when I do something wrong, but I still have to know how to write it correctly in the first place. In Java it's more or less code and forget.

And yes, I meant memory safety.


Besides the (desirable) features already listed by someone else, here are a few others, some of which can have unfortunate interactions: implicit conversions, constructors, member initializer lists, inheritance, overloading, default arguments, capture lists, variadic templates.


The biggest ones people ask for are non-type templates, higher-kinded types (HKT), and sometimes move constructors.


You can still get NullPointerExceptions in Java, frequently.

Rust doesn't let you do that, at least not without consciously deciding to use unwrap() everywhere.


> To me Rust is more a C++ replacement than a C replacement

Precisely. Rust is what C++ would be if it didn't need to be backwards compatible (though I'm not a fan of some of the syntax choices in Rust). Unfortunately, there's a reason C++ needs to retain backwards compatibility, and it's for that reason that Rust isn't going to replace C++ anytime soon.

> That's what they mean: they love C because you can't really do OO with it.

Until they invent GObject or one of the other OO-on-C libraries. The anti-C++ sentiment is more because compiler support was really quite dire until fairly recently (C++11).


The good thing about no-runtime languages is that you can mix modules written in them, within reason.

So Rust can make inroads into existing C++ code bases where it makes sense, faster than we think, without replacing C++ wholesale for a long time. Firefox itself is likely such a codebase.


That's harder than you would think due to name mangling.

Basically they can only talk to each other through a C interface.


I've been programming C on and off for 30 years. Rust feels too complex to be a C replacement. C++, yes...

C's paradigm is really about working with memory. Anything that restricts memory access with bound or type checking is going to feel like "not C"...


C doesn't have anything to say about whether bounds are checked or not. That's why so many things, including pointers to arbitrary locations in memory, are undefined: the standard was built to accommodate both embedded system implementations where scribbling all over RAM is desirable, and implementations that do things like bound checking.


While I am not a fan of it, D's BetterC mode kind of fits what you are looking for.

https://dlang.org/spec/betterc.html

It's kind of a subset of the D language, without dependency on the GC or runtime. It feels like a cleaned-up C. Of course, without the runtime most of the standard library doesn't work, so you are on your own for the most part, but then, C doesn't really have that much of a standard library to begin with.


> C aficionados often claim they love C because it's "simple" (it isn't). That's what they mean: they love C because you can't really do OO with it

I do not think C programmers are anti-OO. In fact, a lot of C patterns are modeled on OO (struct + function). I think the appeal of C is that you are able to write the fastest implementation of any given algorithm, something that just isn't possible in most other languages.


> I think the appeal of C is that you are able to write the fastest implementation of any given algorithm, something that just isn't possible in most other languages.

Maybe you can, but a lot of the C code I've seen in recent years was pretty inefficient compared to what one would have gotten from a reasonable C++ or Rust implementation:

Examples are inefficient strlen() operations due to the default "string" type, unnecessary copies and allocations of things like strings due to ambiguous ownership, unnecessary null checks to quiet down static analyzers in the absence of non-null references, and extra indirections or allocations to work around missing compile-time generics (besides macros), etc.


> I think the appeal of C is that you are able to write the fastest implementation of any given algorithm, something that just isn't possible in most other languages.

If you had said this out loud during the '80s and early '90s, everyone would just have laughed.


True, and I meant this in the context where you could embed assembly into your functions if needed.


That's not a very satisfying answer, as it doesn't apply to the languages typically compared with C: C++, Rust, or even D, Nim, Zig, etc.


I don't think struct + function is any more OO than tuple + lambda is. That's a very shallow definition of OO.


Well, that's how C++ does everything…


You don't need to use any polymorphism (generics or traits) in Rust. You can use it as a pure structs and functions language like C (especially if you're not using the standard library).

The argument in favour of at least some polymorphism is that it helps to minimise highly repetitive code that would be avoided in C through aliasing, type punning or other pointer conversion.
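
For comparison, a minimal sketch of the usual C workaround: erase the element type behind void* and pass the size and comparison separately, qsort-style:

    #include <stdio.h>
    #include <stdlib.h>

    /* The classic C stand-in for generics: the element type is erased to
       void*, and the size and comparison function travel separately. */
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int v[] = { 3, 1, 2 };
        qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
        printf("%d %d %d\n", v[0], v[1], v[2]); /* 1 2 3 */
        return 0;
    }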


I love Rust, but I don't think that using a subset of Rust is a real option at this point. You'd have to stop using other libraries and even the standard library for this - and while it's possible, it definitely won't make your life easier. Rust has a wonderful community, but it's mostly filled with enthusiasts who're actively using nightly, so if you try to do things in a more limited way, you'll be on your own very often - much more often than, for example, I personally would be comfortable with.


While many people use nightly, the statistics show that most use stable, with some using nightly in addition. You just hear about nightly more.


This is true, but doesn't refute my point. If you want not only to write something in Rust, but also to participate in the community and get help from it, it's much easier to do it if you're using nightly.

I consider myself an experienced developer, but unlike other new technologies, which I almost always learn just from documentation, with Rust I often turn to the community for help. I understand that this complexity is necessary and I wouldn't want Rust to be easier, of course. But still, with Rust's complexity and the overwhelming amount of information that's already out of date with respect to best practices - it sometimes seems that over half of the Google results from Stack Overflow are still about pre-1.0 Rust. And when we (people who are learning Rust) turn to the community, it usually tells us (in the most helpful and nice way, of course) to "just use nightly".


Yes, it is a bit unfortunate that many suggest nightly. It makes sense, given that they’re enthusiasts, and therefore know all about the latest and greatest. But, these days, it’s becoming rarer and rarer to actually need it. But that’s only generally; it’s true that embedded often still needs nightly, though that should change pretty soon!

I wish Google didn’t customize results so much; I rarely see anything pre-1.0. It really makes it hard to figure out how to best help :/


Is that still true for libraries? I think that's what he meant, not all Rust users as a whole. My impression certainly agrees with him.


If a library uses nightly, then you have to use nightly to use it.

At this point, the only major library that requires nightly is Rocket, and Actix-web has been rapidly ascendant due to (rumor has it) being used in production by Microsoft, and to working on stable.

The data is from last year's survey (which means even more stuff was nightly-only then, for example the RLS), where you could choose multiple options: https://blog.rust-lang.org/images/2017-09-05-Rust-2017-Surve...


> You don't need to use any polymorphism (generics or traits) in Rust. You can use it as a pure structs and functions language like C

This just means that the Rust language does not have the feature of being simple. Of course, there are sane subsets of C++ also (e.g., do not use new/delete), but this does not mean that C++ is a sane language. The same goes for Rust, probably.


> The same goes for Rust, probably.

Well, at least you have the good grace to admit that you haven't used Rust. :P


I've used Rust. It's not simple. For instance, lifetime annotations are ugly and confusing.

What a snarky reply from someone who already knows that Rust is not simple. The person you replied to was right, and yet you cast doubt on what he guessed to be true.


The person that response was directed at was attempting to use a mistaken similarity to C++ to paint Rust as a language that is not "sane". You appear to be injecting your own interpretation as to what this subthread is about. :)


I assumed you were contradicting this quote:

> This just means that the Rust language does not have the feature of being simple.

I really don't know what it means for a language to be sane, so it didn't make any sense to me that you were replying to that part. My bad.


This is exactly what C++ tried to do, and failed.

Of course you can find a sane subset of C++ or Rust that you're most comfortable with. But once your project accepts C++ or Rust, nothing can stop people using the other "features" of the language.

Trying to enforce code style drains lots of energy from the project. Maybe a good linter can help......


While I'm not a fan of polymorphism and actually avoid OO completely (except for making data structures out of classes of course) I feel like this could be something easily solved with tools, if making tools were easy.


That was the argument of C++ advocates in the 90s.


Considering that C doesn't have any good facilities for polymorphism, it's going to continue to be the argument of any language that succeeds C. Unless, that is, that successor language also doesn't have any good facilities for polymorphism, in which case we have Go as a fascinating real-world study in how people will incessantly demand them and deride the language for lacking them (which is saying something, considering that interface{} in Go is already better than pretty much anything one would hack up in C via void pointers or macros).


Apparently under certain conditions it is possible to emulate a crude form of ad hoc polymorphism in C.

https://codegolf.stackexchange.com/questions/2203/tips-for-g...


Wasn't it originally a much bigger deal to have the facilities for more generic programming, before templates? These days people's threshold for bloat seems much higher, while the penalties for allocation and jumping around in memory are much higher as well.


I presume that you are defining C++'s virtual functions as "not a good facility for polymorphism". Why?


OP said nothing about C++, just plain C.


Ah, I see. I agree.


At least C11 has light generics support.
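
For instance, _Generic selects an expression by the static type of its argument; a small sketch:

    #include <stdio.h>

    /* C11 _Generic: crude ad hoc polymorphism, dispatched at compile time
       on the static type of the controlling expression. */
    #define type_name(x) _Generic((x), \
        int: "int",                    \
        double: "double",              \
        char *: "char *",              \
        default: "other")

    int main(void) {
        printf("%s\n", type_name(42));   /* int */
        printf("%s\n", type_name(3.14)); /* double */
        return 0;
    }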


It was just recently that C++ got some underpowered kind of generics (beyond templates). So, no, I don't think anybody advocated that in the 90's.

Late-'90s advocacy was much more along the lines of "OOP lets you reuse code" and "widget libraries are great, and they all but require OOP". One of those is blatantly correct.


> A replacement for C should try to be as simple as possible while fixing C [...]

I have a feeling that such a language would offer too little to offset the costs of switching.


I have always kind of wondered why Ada did not catch on as a systems programming language (I think it was designed to fill that niche as well).


1. At the time (i.e. pre-C++), it was a gigantic language. Further, because of the Ada policies (IIRC), you could not subset the language; you had to include the full multi-process model to call it "Ada", for example.

2. There were no good, cheaply available implementations of Ada. GNAT was originally quite awkward and finicky when I first poked at it, years after seeing Ada in a class (and I can't remember what compiler we used there).

3. Most importantly, no common, popular systems picked it up as the preferred systems programming language. MacOS had Pascal for a while, Windows went with C or C++.


Rational Software actually started their business selling Ada machines (yep, just like Lisp machines, but with Ada) before pivoting into selling just compilers.


> OOP" done right (aka no inheritance, simply method polymorphism).

Why do you consider inheritance a bad feature of OOP?


On a very high level: in the vast majority of cases, you bring along a lot of baggage, both in code and mental, that you don't need, and compared to some alternatives, such as composition, it is simply far less flexible. Inheritance fits a much smaller subset of problems than people think (one of them is GUI design, a problem inheritance fits nicely). But like everything in this industry, it gets shoe-horned in to solve every problem.


It's surprising when I go into interviews and the people interviewing me do not have enough experience working with object-oriented code (in the languages I know) to have come to realize this - it seems to be one of those things that has to be relearned over and over.


Think about it as the difference between only inheriting functionality (interfaces in Java 1.8+, Rust traits, etc) vs inheriting data.

OOP can become spaghetti very quickly, especially if people mistakenly allow children to modify parent memory directly (protected non-final fields in Java). Once that happens, and as the inheritance hierarchy gets deep, understanding what a field actually is at any point becomes very difficult.

Interface and trait inheritance at least removes one piece of confusion by only allowing functions to be inherited, which makes code much easier to reason about.


Inheritance says that a derived class is a subtype of its base class (this is formalized in the Liskov Substitution Principle). To me, this is strongly coupling two things that don't have a very good reason to be coupled: Code organization and expression substitutability.

Sometimes I want to derive a supertype. Sometimes I want to re-use behavior but I don't want the new type to be expression-compatible at all. But inheritance says that if I'm re-using code, I must accept certain expression substitution relationships between my code and the code I'm re-using.

As with anything, there are reasons to use it and situations where it makes sense. But it just seems like a very niche connection to use as the basis of a language design.


In my personal experience, overriding non-abstract methods is definitely the worst feature of OOP. Too often, complex codebases have multi-level overrides: some call the base method at the beginning, some call it at the end, and some not at all - and almost always this leads to incoherent contracts and behaviour.

Interface-only inheritance and composition instead of inheritance (with components or otherwise) is almost always simpler and thus more error-proof. Even if you sometimes have to repeat yourself a couple of times, the decrease in mental complexity is definitely worth it.


Say you decide to make a video game, and you decide to make a couple of classes for your game objects to inherit from. You make the classes Tool and Weapon. You have just prevented yourself from making the game Clue. The solutions to this tend to be ugly, and your object relations will tend to change over time, often in ways that are difficult to reconcile. Thus, GoF suggests to prefer composition instead of inheritance.
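
A minimal sketch of the composition alternative (all names hypothetical), where one object can be a tool and a weapon at once:

    /* Composition instead of inheritance: a game object holds optional
       capabilities rather than deriving from Tool or Weapon, so the
       candlestick can be both at the same time. */
    typedef struct { int damage; } Weapon;
    typedef struct { int utility; } Tool;

    typedef struct {
        const char *name;
        const Weapon *weapon; /* NULL if not usable as a weapon */
        const Tool *tool;     /* NULL if not usable as a tool */
    } GameObject;

    int main(void) {
        Weapon w = { 10 };
        Tool t = { 5 };
        GameObject candlestick = { "candlestick", &w, &t };
        (void)candlestick;
        return 0;
    }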


GoF?


The "Gang of Four", the four authors who wrote the classic "Design Patterns" book, (subtitled "elements of re-usable object-oriented software").


As Wildgoose says, it's a frequently-used initialism for the Design Patterns book, which is otherwise somewhat awkward to refer to concisely and unambiguously.


> "OOP" done right

I wouldn't call the Rust semantics "OOP". But it has a pretty complete implementation of type classes, with class inheritance.


Zig again.


I think it will only happen the way of Luddites in systems programming.

Even if you redid Modula-2 with curly brackets, I am not sure it would catch on.


> I think there's room for other languages to occupy a similar space, but they'd need to focus on no-std-lib, no-runtime operation (not always the sexiest target).

You're describing the Zig language!

It aims to be as fast as C, and unlike most languages that say this is a goal, it means it. There's no "almost as fast as C if you ignore the garbage-collector and the array-bounds-checking", it's actually as fast as C.

Its author values the minimalism of C, but wants to create a far better language for doing that sort of work. [0]

They're doing surprisingly well. They even managed to make a faster-than-C (faster even than hand-tuned assembly) SHA-2 [1].

[0] https://andrewkelley.me/post/intro-to-zig.html , https://github.com/ziglang/zig/wiki/Why-Zig-When-There-is-Al...

[1] https://ziglang.org/download/0.2.0/release-notes.html , it's also mentioned somewhere in this talk https://youtu.be/Z4oYSByyRak


> faster even than hand-tuned assembly

I think you and I are working off different definitions of what that means, as my definition doesn't really allow for a faster implementation (unless there's some really spooky stuff going on in the compiler). I suspect you mean faster than a popular hand tuned implementation.


Obviously Zig's output (courtesy of its LLVM backend) can be expressed and distributed as assembly code, but it's not hand-tuned, it's compiler-generated.

Whether the hand-tuned assembly code they used was the fastest out there, I'm not sure. I'd hope so, or they're being misleading.

I might get time to re-watch the relevant part of the video - it's around the 21-minute mark - https://youtu.be/Z4oYSByyRak?t=1292


My point is that all the fastest hand-tuned assembly has to do to match the compiled output is to do the same thing. The compiler has no ability to do anything a hand-tuned assembly cannot, by definition of how this all works.

So, saying it's faster than hand-tuned assembly doesn't really make sense, while saying it's faster than the current optimal hand-tuned assembly could - but that also probably requires searching for a not-very-well-tuned problem, as any optimization the compiler could apply is something that could be applied by hand in the hand-tuned assembly (even if it requires a lot of work to do so).

A general statement that might be made (and would also be very impressive) would be that it compiles to implementations that are on par with the current best hand-tuned assembly versions.

The small edge case where the original statement might be true is where some AI/ML/evolutionary techniques are applied and the assembly is faster in the generated output but we don't know why. That is, the speedups are not because of rules and heuristics, but derived by some complex process which we have little insight into, and thus can't manually replicate (which is what I meant by spooky stuff).


I mean, there's nothing a computer does playing chess that a human couldn't, and yet the world's best players simply can't beat the world's best chess-playing programs.


This assumes that the problem of a chess game and providing optimal solutions for an algorithm defined in a higher level language (or pseudocode) are similar enough that the same strengths apply. They may be, but I don't think that's something you can assume without some sort of evidence.

Though, I guess the same could be said of my prior statements. To my knowledge, compilers aren't generally looking through a search space for optimized solutions as many common game AI algorithms do (which would be in line with my statements about AI/ML/etc above). I don't have the relevant citations or experience to state that with authority though.


Andrew explains in the video that the speedup is mostly due to compile-time speed ups and potentially the use of the rorx instruction. It cannot be faster than the fastest assembly, because the fastest assembly is by definition the fastest.


> compile-time speed ups and potentially the use of the rorx instruction

Indeed, but the 'compile-time speedups' are a legitimate point in favour of the compiler. If they didn't occur to the assembly programmer, or struck them as too complex to pull off, then the compiler deserves the point.

Have to say I don't follow why the hand-tuned assembly doesn't use the rorx instruction. It's not mentioned in the assembly file, at least, but I thought that was the point? https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly

Also, it's neat to see instruction-selection being so significant. Generally might expect cache/branch behaviour to be the kicker.

> It cannot be faster than the fastest assembly

Well, 'fastest assembly' is the domain of so-called 'superoptimisers', and has us pushing at the stubborn bounds of computability.

We were talking about hand-written assembly-code, compared to compiler-generated. Odds are that none of the binaries tested were the optimal assembly.

The only interesting question is whether the hand-tuned assembly code they tested, was the fastest available at the time. If not, the whole demonstration is a straw-man, of course.

Also I don't like that the winning Zig program runs for so much longer. A good benchmark should provide ironclad assurances that no candidate is getting an unfair advantage re. loading/'warm-up'.


I don't believe it is possible to hand tune every program to beat a compiler.


More a question of practicality, no?

Programmers are creative, compilers are well-tuned machines. Sometimes it's tough to beat a compiler's code, but I'm not sure that 'impossible' is meaningful.

Monkeys with typewriters could function as a highly parallel superoptimiser, after all.


When you realize the compiler's optimizations only account for about 10% of the total program's performance, you find that the other 90% is entirely up to the programmer.

Architecture, data structures, batching operations, memory locality, and a bunch of other metrics are all concepts the compiler can't really help you with whatsoever and they have a much larger impact on performance than the 10% the compiler is actually able to optimize.

The problem is that either programmers don't care, or they can't make the distinction between premature optimizations and architecture planning.


You're right to emphasise good data-structures and algorithms (also concurrency, parallelism, etc), but compiler optimisation is nothing to sneeze at. '10%' is laughably off-base.

From a quick google: compiler optimisation can accelerate CPU-bound code to over 5x the unoptimised performance. https://www.phoronix.com/scan.php?page=article&item=clang-gc...


They seem to be benchmarking very specific things and not actual applications. These numbers do not hold in the real world.


In a sense you're right, but hand-tuning assembly is kind of an orthogonal problem to determining whether you're using the right algorithms and data structures.


Where do you get that 90/10 split from? Just curious.


Talks from Mike Acton and Scott Meyers, specifically "Data-Driven Development" and "CPU Caches and why you should care" respectively.

I forgot exactly where I got that number, but it's been a pretty good metric so far.

In a nutshell: the compiler is great at micro-optimizations and absolutely terrible at macro-optimizations. The former will get you a few percent of perf boost, while the latter usually results in orders of magnitude of performance gains.

It's near impossible to apply macro-optimizations at the end of a project without massive refactors.


Cool, I'll check those out, thank you!


I believe the 10/90 number may be from this talk, though I don't have time to rewatch it to confirm: https://www.youtube.com/watch?v=rX0ItVEVjHc


But it should be possible to hand tune to match a compiler, right? And that would mean the compiler was not faster, which is the general claim I was calling into question.


How so? Compilers aren't magic; by definition (assuming the current state-machine style of compilers - no AI or nuthin'), you can just manually compile it yourself.

Compilers are clever, but humans can also be really clever: they know how to think - you can change the algorithm entirely.

For practical reasons, however (and to avoid premature optimisation), trust your compiler.


It should be noted that 0.2.0 is significantly different from master in regard to pointer syntax, due to pointer reform [0]. There are a lot of other changes too, but that one's the most noticeable.

[0] https://github.com/ziglang/zig/issues/770


Does Zig have a memory management story other than C's? Anything like Rust's borrow checker?


C sort of has this mindset built around it that you're writing portable assembly, except for messy bits like stack frame layout or register allocation. But that doesn't really hold true anymore, and arguably hasn't for decades.

The first obvious issue is that C is specified by an abstract machine that doesn't really correspond to actual hardware. There's no concept of segmented memory, or multiple address spaces in C. Traps don't quite work the way you'd want in C [1]; there's no way to catch a trap that occurs in a specific region of code. Worse, the abstract machine leans on a definition of undefined behavior in such a way that it's painful for programmers but not useful enough to make automatic security permissible. (Unsigned overflow, which is going to be a more common cause of overflows that lead to security vulnerabilities, is well-defined).

However, the more real problem with C is that it is outright missing hardware features that you'd like access to. Things like rotate instructions or wide multiply are easy enough to pattern-match, so their lack isn't so devastating. But flag bits--particularly add-with-carry or checked-overflow operations--aren't figured in at all. There is no conception of SIMD vectors. Returning multiple values via registers. Traps [1]. Coroutines. Multiple address spaces (especially important for GPUs).

[1] Admittedly, the OS is also to blame here: POSIX signals are a really atrocious way to handle machine-generated faults.
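
To make the flag-bits point concrete, a small sketch: the rotate below is the idiom compilers pattern-match into a single instruction, while checked overflow has no standard C spelling at all (__builtin_add_overflow is a GCC/Clang extension, not standard C):

    #include <stdint.h>
    #include <stdio.h>

    /* Rotate: C has no operator for it, but compilers recognize this idiom
       and emit a single rotate instruction. */
    static uint32_t rotl32(uint32_t v, unsigned n) {
        n &= 31;
        return (v << n) | (v >> ((32 - n) & 31));
    }

    int main(void) {
        printf("%08x\n", rotl32(0x80000001u, 1)); /* 00000003 */

        /* Checked overflow: portable C can't name the carry/overflow flag;
           this builtin is a GCC/Clang extension. */
        uint32_t sum;
        if (__builtin_add_overflow(UINT32_MAX, 1u, &sum))
            puts("overflowed");
        return 0;
    }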


> flag bits

That's sort of a processor-specific implementation detail; IIRC you can access them via GNU extensions.

> SIMD vectors

Not having SIMD is IMO a feature, as doing useful things with it would expose processor-specific details; you can use SWAR to do some of what SIMD does.


Exposing processor-specific details is the desired feature, not a bug. The people who want to program C for portable assembly do so because they want to take advantage of things they know about the architecture: the same source code isn't really portable across different architectures without #ifdef anyways.


But if you don't care about portability, and want more hardware control, you can do inline assembly, or something like Intel's AVX intrinsics.


> Traps don't quite work the way you'd want in C [1]; there's no way to catch a trap that occurs in a specific region of code.

Ah, unless you're in NT land, which makes lexical scoping of traps very easy: https://github.com/tpn/tracer/blob/0224d94b8d17fe74c39cec285...

        //
        // Verify the guard page is working properly by wrapping an attempt to
        // write to it in a structured exception handler that will catch the
        // access violation trap.
        //
        // N.B. We only do this if we're not actively being debugged, as the
        //      traps get dispatched to the debugger engine first as part of
        //      the "first-pass" handling logic of the kernel.
        //

        if (!IsDebuggerPresent()) {

            CaughtException = FALSE;

            TRY_PROBE_MEMORY{

                *Unusable = 1;

            } CATCH_EXCEPTION_ACCESS_VIOLATION{

                CaughtException = TRUE;

            }

            ASSERT(CaughtException);
        }
The helper #defines: https://github.com/tpn/tracer/blob/0224d94b8d17fe74c39cec285..., e.g.

    #define TRY_TSX __try
    #define TRY_AVX __try
    #define TRY_AVX512 __try
    #define TRY_AVX_ALIGNED __try
    #define TRY_AVX_UNALIGNED __try
    
    #define TRY_SSE42 __try
    #define TRY_SSE42_ALIGNED __try
    #define TRY_SSE42_UNALIGNED __try
    
    #define TRY_PROBE_MEMORY __try
    #define TRY_MAPPED_MEMORY_OP __try
    
    #define CATCH_EXCEPTION_ILLEGAL_INSTRUCTION __except(     \
        GetExceptionCode() == EXCEPTION_ILLEGAL_INSTRUCTION ? \
            EXCEPTION_EXECUTE_HANDLER :                       \
            EXCEPTION_CONTINUE_SEARCH                         \
        )
    
    #define CATCH_EXCEPTION_ACCESS_VIOLATION __except(     \
        GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ? \
            EXCEPTION_EXECUTE_HANDLER :                    \
            EXCEPTION_CONTINUE_SEARCH                      \
        )
    
    #define CATCH_STATUS_IN_PAGE_ERROR __except(     \
        GetExceptionCode() == STATUS_IN_PAGE_ERROR ? \
            EXCEPTION_EXECUTE_HANDLER :              \
            EXCEPTION_CONTINUE_SEARCH                \
        )
    
    #define CATCH_STATUS_IN_PAGE_ERROR_OR_ACCESS_VIOLATION __except( \
        GetExceptionCode() == STATUS_IN_PAGE_ERROR ||                \
        GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ?           \
            EXCEPTION_EXECUTE_HANDLER :                              \
            EXCEPTION_CONTINUE_SEARCH                                \
        )

Also allows you to do fun things like this for testing if you can do an AVX512 op (although this is not the supported way of doing things):

    #pragma optimize("", off)
    static
    NOINLINE
    VOID
    CanWeUseAvx512(PBOOLEAN UseAvx512Pointer)
    {
        BOOLEAN UseAvx512 = TRUE;
        TRY_AVX512 {
            ZMMWORD Test1 = _mm512_set1_epi64(1);
            ZMMWORD Test2 = _mm512_add_epi64(Test1, Test1);
            UNREFERENCED_PARAMETER(Test2);
        } CATCH_EXCEPTION_ILLEGAL_INSTRUCTION{
            UseAvx512 = FALSE;
        }
        *UseAvx512Pointer = UseAvx512;
    }
    #pragma optimize("", on)
https://github.com/tpn/tracer/blob/0224d94b8d17fe74c39cec285...

The structured exception handling protocol used by NT is really quite elegant.


> A pointer is just an integer and I can manipulate it like an integer.

Except it's not. Not unless you cast it to uintptr_t. Doy. :) Understanding the underlying asm can give you context for what a pointer is, but you can't assume that a pointer is an integer, any more than you can assume 'int' is a machine word in length.
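
The sanctioned round trip looks like this (uintptr_t is technically optional in the standard, though ubiquitous in practice):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int x = 42;
        /* Pointer -> integer -> pointer: only the round trip is guaranteed;
           the numeric value of bits is implementation-defined. */
        uintptr_t bits = (uintptr_t)&x;
        int *p = (int *)bits;
        printf("%d\n", *p); /* 42 */
        return 0;
    }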

There are reasons why C seems so abstruse and abstract. It has to exist on a huge variety of architectures and environments, from supercomputers to tiny embedded microcontrollers. Fun fact: The Lisp Machine had a C compiler! In Lisp Machine C, pointers were fat: they consisted of a machine pointer to an allocated chunk of memory and an offset within that chunk. And of course it was invalid to have a pointer outside an allocated object; the machine would throw an exception if you even tried. Hence why C pointers are only well-defined over valid references to objects and arrays, one past the end of an array, and NULL. And lest you say, "well no modern cpu works like that anymore", there are programs out there written for the Burroughs architecture that are still running. Are you going to cut off the maintainers of those programs because the architecture they're coding for doesn't resemble an x86?

This is part of why I disdain the call for modernity in programming. "Modern" means "limited". "Modern" means "I have no intention of testing this on anything but my laptop, so I will make assumptions that suit my use case and fuck you if yours differs".

C is a pain in the ass, but it becomes more approachable if you take into account its history. And it still fits in your head more easily than Rust.


Burroughs does not have a modern C compiler. As far as modern C standards go, the Burroughs architecture does not exist.

The same is true for the rest of the oddball stuff. It's dead now. Nobody is even trying. There is no longer any reason to make the C standard nasty in order to support these systems. The plug should have been pulled on them long ago, when they were well under 1% of the market and declining, but now they are nothing. No standards-conforming compiler supports that old stuff.

Even back at 1%, hurting everybody to support that stuff was not worthwhile. It was a terrible trade-off to make.


The obvious flipside of this is that those C programs written for Burroughs never had any chance of running on anything but Burroughs. The only difference is that C offloads the effort of portability onto the user, whereas other languages at least attempt to make it portable in the compiler.


Why not? As long as your program is standards-compliant, it should work on any computer with a working C compiler…


- integral sizes aren't standard

- little vs big endian

- signed int layout

- float layout

- includes vary

- bit field implementations vary

- loads of undefined and implementation-defined behavior


The only thing that can change is implementation-defined behavior, and this can be detected and guarded against. Undefined behavior cannot exist in a standards-compliant program.
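
For example, implementation-defined choices can be pinned down at compile time (C11 spelling shown):

    #include <limits.h>

    /* Refuse to compile on implementations whose choices we don't handle. */
    _Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
    _Static_assert(sizeof(int) >= 4, "this code assumes int is at least 32 bits");

    int main(void) { return 0; }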


What?! Since when did Burroughs have a C compiler?

NEWP was and is more than enough, while being much safer.


Safety is a hardware feature of the Burroughs architecture. C is no less safe there than NEWP.


Sure it is. NEWP only allows C-style unsafety in UNSAFE blocks, clearly tagged by the system administrator as allowed for execution.

So not only is the lack of safety expressed in the type system, something that C doesn't have, it also requires someone to explicitly state that it is OK to risk executing it.


You're assuming that C runs on the Burroughs the way it commonly does on x86, with arbitrary access to all memory within its address space. This is not specified in the C standard, and it's not the case for C on the Burroughs. In particular, since C was not the OS implementation language, but was mainly used to port programs from other architectures, the C heap lived entirely within a single memory block and C code could not interfere with other programs. See: https://en.wikipedia.org/wiki/Burroughs_large_systems_descri...


It can still corrupt its own memory space, which is the whole point of C being unsuitable for any kind of safe software.

C also does not interfere with other programs on its own turf, UNIX.


> the compiler needs to optimise ...

wants to. There, fixed that for you ;-)

The weird special cases were largely(1) introduced recently by optimizer writers hijacking the standard in order to soften the semantics so that previously illegal optimizations would now be legal, by simply declaring the vast majority of existing C code as "undefined" and up for grabs.

Which is why the Linux kernel, among others, has to set special flags in order to get back to a somewhat sane variant.

(1) Except for those special architectures where the hardware is that weird, but in that case you typically know about it.


> The weird special cases were largely(1) introduced recently by optimizer writers hijacking the standard in order to soften the semantics so that previously illegal optimizations would now be legal, by simply declaring the vast majority of existing C code as "undefined" and up for grabs.

Citation needed.

Signed overflow being undefined behavior was a direct consequence of different hardware representations of numbers. Inserting the instructions required to catch overflow and make it wrapping would be expensive, at least on some systems.

The fact that arrays decay to pointers, C has no dynamic array support, and so on are decisions that go way back and more or less prohibit array bounds checking. After preprocessing, inlining, and various forms of optimization it can be very far from obvious that some random NULL check after the pointer has already been dereferenced is intentional and shouldn't be removed.
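
The null-check case looks roughly like this after inlining; since dereferencing a null pointer is undefined, the compiler may assume p is non-null from the first line on and delete the later check:

    #include <stddef.h>

    /* Sketch of the dereference-then-check pattern. */
    int f(int *p) {
        int v = *p;     /* UB if p == NULL, so the compiler assumes p != NULL */
        if (p == NULL)  /* "provably" dead under that assumption... */
            return -1;  /* ...and may be removed by the optimizer */
        return v;
    }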

Things like overlapping pointers have always had undefined behavior in the sense that inserting the instructions to check for the overlap all over the place would have a large performance impact, no one would use such a compiler, and thus there is no market for one. The standards committee largely documented existing practices rather than inventing new undefined behaviors.

I keep seeing people like you claim we should just "fix" undefined behavior or the standards committee gleefully inserted undefined behaviors because they hate programmers. If you actually sat down and disassembled the C code you write then went through the exercise of inserting instructions to eliminate most types of undefined behavior I suspect it would be a very illuminating experience. I also doubt you'd be so certain of your position.


Signed overflow was made undefined so that it could do whatever the CPU naturally did, not so that the compiler could delete huge chunks of supposedly dead code.

In the old days, it thus wasn't truly undefined. It was undefined by the language, but you could just look in the CPU documentation to see what would happen. There was some crazy behavior in the old days, but nothing you wouldn't expect from looking at the CPU documentation.

These days, nobody is shipping a C99 or newer compiler for any CPU with weird integers. Everything is twos complement, without padding or trap values. All of that "undefined" stuff should thus be entirely compatible across all modern C compilers.
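
For concreteness, this is the kind of dead-code deletion being complained about; GCC and Clang at -O2 typically fold the function below to "return 1":

    /* Because signed overflow is undefined, the compiler may assume x + 1
       never wraps, even though on every modern CPU INT_MAX + 1 wraps to
       INT_MIN at runtime. */
    int increment_is_bigger(int x) {
        return x + 1 > x; /* optimized to: return 1; */
    }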


IMO, signed overflow should be implementation-defined behavior, rather than the anything-goes of undefined behavior.


> I keep seeing people like you claim we should just "fix" undefined behavior

Nope. Just don't take undefined behavior to mean "do arbitrary optimizations that dramatically alter the behavior of programs".

Let's see what C89 says about undefined behavior:

"3.4.3 Undefined behavior --- behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Note the "permissible".

Newer versions of the standard:

"3.4.3 undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Note that the wording has changed from "permissible" to "possible". To me the older version indicates that this is the range of things the compiler is allowed to do, and that range goes from "doing nothing" to "behaving during translation or execution in a documented manner characteristic of the environment".

I don't think "silently do arbitrary optimizations, removing arbitrary code or let demons fly out of your nose" is in the range between "nothing" and "documented behavior characteristic of the environment". And also not between that and "terminate execution".

However, the newer version says "possible", and this to me sounds a lot less like a restriction of what the compiler is allowed to do, and much more like an illustration of things the compiler could do.

And of course that is exactly what has been happening. And it is detrimental.

See also:

"What every compiler writer should know about programmers or “Optimization” based on undefined behaviour hurts performance"

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...

Oh, and I was using C compilers before there was a C standard (and I did get the update when ANSI C was released). At that point all behavior was "undefined". And "unspecified". And yet compilers did not take the liberties they take today.


Your post basically boils down to "don't do any optimizations that might possibly exploit undefined behavior". That covers most optimizations. That means compilers can only use -O1. Proving you can still optimize is often equivalent to the halting problem (or you emit lots of branches and duplicate implementations for each function so the code can test the parameters and take the "safe" path in some cases or the "fast" path in others).

> silently do arbitrary optimizations

What are arbitrary optimizations to you? This isn't the magical land of unicorns here. Compilers have to be pedantic by nature so you have to sit down and come up with what is allowed and isn't allowed.

I happen to agree that some instances of undefined behavior should be defined and eliminated. For example, uninitialized values. Require the compiler to zero-initialize any storage location before use unless it can prove an unconditional write occurs to that location. This will have a small but manageable code size impact and a relatively small runtime perf impact.

> removing arbitrary code

The compiler is just a series of meaningless instructions executing on meaningless data on a dumb machine. It has absolutely no way to understand what is "arbitrary" code and what isn't. This is basically another argument for "compilers should disable 95% of all optimizations".


Firstly, "ignoring the situation completely with unpredictable results" is a pretty clear permission to do arbitrary optimizations, as long as they preserve the semantics in the case where the execution doesn't exhibit UB.

Secondly, in ISO standards Notes are by definition non-normative, hence your exegesis as it pertains to C99 and later versions is invalid anyway.


Your points are really good. It's a pity that -std=c89 doesn't act as a time machine in this respect... it would have been useful: https://godbolt.org/g/fKfDFq


What changes are you thinking of? The only change with respect to undefined behavior I can think of off the top of my head is punning via unions, which C99 changed from undefined behavior to well-defined behavior.


According to

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...

new undefined behaviors were added, and significant ones at that.

I think the bigger issue is that compiler writers have taken "undefined" to mean complete liberty to do whatever the hell they want, even fairly far away from the actual location of the undefined behavior.

And it turns out that the C89 standard had a range of "permissible" actions on undefined behavior. To me "permissible" is restrictive, as in you are not allowed to do anything outside that range.

Newer standards kept that section (3.4.3) the same, word for word, except for changing "permissible" to "possible". To me, "possible" is not restrictive but illustrative.

Now you might think this is reading tea-leaves on my part, and you might be correct, but on the other hand that change of interpretation is exactly what happened. Compilers used to not do these things, and nowadays compilers do do these things, and compiler writers cite the standard as giving them permission. Also consider to what extent wording of standards is finessed and fought over, and that exactly this one word was changed, in a section that otherwise remained the same word for word.


What does "softening" the semantics mean? Aren't they pretty unambiguously defined by the standard?


Newer versions of the standard.


But as you know, CPU ISAs are designed for C programs and compilers are optimized for C programs. So everything, even new languages like Rust, has to buy into C's model to some extent.

Truly getting out from under C's shadow is going to be very difficult. Maybe better languages are a first step on that path but they are only a small step.


That is said a lot, but what are actual decisions that are made to conform to the C model?

What would be good ideas in CPU design that aren't made since they are not compatible with C?

(I get it that CPUs have to support some common paradigms and use cases. For example, virtual memory / process isolation, maybe branch prediction things, or support for calling conventions. However, I don't think that is specific to C).


Support for unaligned memory accesses in hardware is one. The overwhelming majority of variable accesses are through pointers of the correct type, and are therefore to variables that are at least naturally aligned.

But a very small handful of systems/procedures make unaligned accesses through type-punned pointers. Rather than just crash on these programs, processors will perform multiple memory accesses and stitch the results back together in hardware (worst case behavior: accesses that cross page boundaries). So there's extra hardware to perform this extra state tracking and detect the unaligned accesses in the first place.

Hardware doesn't just get turned off for conforming programs. It's there consuming energy and giving off heat on all of them.
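
To make the cost concrete, here is a sketch of the kind of access in question (the cast form is also undefined behavior in ISO C; memcpy is the portable spelling):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        unsigned char buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        unsigned int v;
        /* Unaligned load through a type-punned pointer: x86 quietly
           stitches the bytes together, other CPUs may trap. (This is
           also undefined behavior in ISO C.) */
        v = *(unsigned int *)(buf + 1);
        printf("%u\n", v);
        /* Portable version: let the compiler emit whatever access
           sequence the target actually supports. */
        memcpy(&v, buf + 1, sizeof v);
        printf("%u\n", v);
        return 0;
    }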

Edit/addendum Here's another example: What should hardware do on integer overflow? There are several options, including signed-saturation, unsigned saturation, unsigned wraparound, and trap. The semantics of C are such that most hardware just picks 'unsigned wraparound' and makes programs eat the consequences. If you're very lucky then you also get 'an overflow bit, that gets clobbered on the next instruction'.
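
GCC and Clang do surface the overflow result through builtins, which hints at what first-class support could look like (a sketch using __builtin_add_overflow):

    #include <stdio.h>

    int main(void) {
        int a = 2000000000, b = 2000000000, sum;
        /* Returns true when the mathematical result does not fit in
           sum, instead of silently wrapping or clobbering a flag. */
        if (__builtin_add_overflow(a, b, &sum))
            puts("overflow detected");
        else
            printf("%d\n", sum);
        return 0;
    }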

One way to think about this problem might be to ask the question, "What hardware features would reduce the runtime cost of instrumenting programs with the various sanitizers that are becoming available?"


Regarding alignment, I think it's not C that caused aligned reads to be the standard, but rather hardware itself. There isn't anything in C that prevents unaligned reads, is there? Aligned reads are just a consequence of storing data as vectors of a type, which is a very natural and efficient approach. Unless that's somehow a bad idea or prevents other good ideas, the point is moot.

Also caching and page boundaries have nothing to do with C. They are hardware issues (that good C programmers have to deal with, but anyway).

Contemporary Intel x86 CPUs haven't had significant problems with unaligned reads for a long time; do a web search for Lemire's post. (I guess there are still restrictions for atomic reads/writes.)

I've long wanted to have access to overflow bits or traps on overflow. C certainly isn't in the way of adding them to CPUs, and signed overflow is even undefined in C, which is an opportunity to make it raise a signal or other exceptions. But I agree, C culture did not require hardware vendors to add overflow bits or traps, so maybe that is C's fault.


> Hardware doesn't just get turned off for conforming programs. Its there consuming energy and giving off heat on all of them.

Idle CMOS transistors don't have very much leakage current. They do consume some power but not much. Intel has gotten a lot better about clock gating ensuring more of the chip is truly idle.


Afaik, SPARCs and many other traditional Unix-running CPUs do not support unaligned memory access.


Technically, neither does ARM. Although you can enable emulation for unaligned accesses. One of the first things I did when setting up QNX was enabling emulation for unaligned accesses to eliminate a SIGBUS termination on some of our applications.


ARM supports it now, and with next to no perf cost.


Very speculative, don't take any of this as gospel:

The very concept of "the stack" implies a C-style evaluation model.

Having a single address space for both code and data mainly just gives an opportunity for optimisation to fail.

Everything-is-mutable gives a lot of opportunity for optimisation to fail. The compiler wants the program in SSA form, it register-allocates to a bunch of reused names, only for the CPU's register-renaming hardware to figure out which uses of those names actually denote separate physical registers.

Arguably the failure of VLIW (in particular Itanium) was because of C: those processors could (maybe) have run more constrained languages very fast, but it was difficult to compile C to run fast on them.


The stack is one of those things that would fall under "support for common paradigms". I checked that PDP-11 (the machine on which C was created) already had a hardware stack - and probably even older machines. There are many other programming languages that have the concept of a call stack. Hardware stacks are just an optimization to support this paradigm, and you are not required to use it. Conversely, C does not require any hardware support for implementation of the call stack.

The main problem with hardware call stacks is that they can't be relocated if you need predictable performance and/or have pointers instead of offsets into the stack. That means they are bounded in size. With 64-bit addresses that may be less of a problem, but still. Functional languages do not need to use the hardware stack, so that shouldn't be an issue.

> Having a single address space for both code and data mainly just gives an opportunity for optimisation to fail.

I don't think C requires that in any way, but since void pointers are allowed to point to either code or data, representation of void pointers must be somewhat "compatible" with both address spaces.

> Everything-is-mutable gives a lot of opportunity for optimisation to fail.

Not a hardware issue. Also not everything is mutable, not even in C. But you need some mutability for performance even at more abstract levels.


"The stack is one of those things that would fall under "support for common paradigms"."

Maybe for imperative languages where there is no support for lambda expressions (like C), but in Lisp or in functional languages call stacks are not part of the basic semantics. To put this in C terms, compilers for those languages will rewrite functions to take an extra argument called the "continuation", which is itself a pointer to a function that is called when a function "returns". The result is that all function calls are tail calls, and by heap-allocating arguments and local variables (which is also necessary because it is possible to create and return a lexical closure), no call stack is needed (but you do wind up relying on a garbage collector). Other continuations are also used e.g. to support exceptions in Common Lisp.
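
In C terms, the rewrite looks roughly like this (a hypothetical sketch; cont_t, add_cps, and print_result are made-up names):

    #include <stdio.h>

    typedef void (*cont_t)(int result);

    /* CPS version of "return a + b": instead of returning, the
       function calls its continuation with the result, so every
       call is a tail call. */
    static void add_cps(int a, int b, cont_t k) {
        k(a + b);
    }

    static void print_result(int result) {
        printf("%d\n", result);
    }

    int main(void) {
        add_cps(2, 3, print_result);
        return 0;
    }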


Call stacks are almost always used in implementations of functional languages as well, even those that support first-class continuations. Even implementations that use CPS as you describe don't necessarily lose call-stack discipline (e.g. guile); that's a separate choice that some (e.g. SML/NJ) choose to make.

The call stack provides very cache-friendly and gc-friendly memory. It's still a very useful construct in functional languages.


C doesn't assume a single address space for code and data. Function pointers are quite different beasts than regular pointers (and this is where the "two bit-identical pointers can be unequal in C" bit comes in).


"That is said a lot, but what are actual desicions that are made to conform to the C model?"

Among other things, special instructions to support a function call stack, which is very much specific to the C model. In Lisp the call stack is unnecessary because of the use of continuation passing style, which implies that all function calls are tail calls and mostly requires that everything be heap-allocated (many compilers will use a call stack as an optimization to avoid excessive garbage collections; unlike C, it is not a fundamental part of the lower level semantics). If you wanted to design a CPU for Lisp or functional languages, you would probably choose to provide special instructions to support continuation passing style or normal order evaluation.

You could also imagine a CPU that does not support general pointer arithmetic or which has some form of built-in array bounds checks, which would all be fine if supporting C were not a requirement.


> unlike C, it is not a fundamental part of the lower level semantics

I don't think this is an accurate characterization of C, actually. Unless I'm forgetting something, an implementation of C that heap-allocated automatic storage and used continuation passing style could still be conforming.

Or in other words, the C standard doesn't specify anything about the memory layout of the stack: the word "stack" isn't even in there!


Sure, you can apply a CPS conversion to a C program, but what you would get would actually be a (abstract) function call stack because there is nothing in C that breaks the call stack paradigm (setjmp/longjmp come close but the semantics are carefully defined to avoid breaking the semantics of a call stack; in particular they cannot be used to "restore" stack frames that were previously destroyed by a return statement or a call to longjmp). In some languages (like Scheme) it is possible to write code that will not have that "de facto" call stack because there is first-class support for things that completely break the call stack paradigm (e.g. call/cc).


Okay, but most Scheme code doesn't use call/cc, and even when it does the vast majority of the continuations in use are also "abstract function call stacks."

You said it yourself- a stack is a useful tool for implementing these non-C-like languages. Its presence does not force you to "buy into the C model to some extent."


> You could also imagine a CPU that does not support general pointer arithmetic or which has some form of built-in array bounds checks, which would all be fine if supporting C were not a requirement.

I am happy to correct you: C does not require that. But also, array bounds checks are not something you can simply implement in hardware, because there is not necessarily a single array length corresponding to any given address.

Imagine a virtual machine that just allocates a chunk of RAM using malloc to provide virtual memory to the emulated machine. Now add another level of memory management on top.

Or imagine a function that gets just a subrange of an array, for example to search for an element therein. But the actual "array" might be much larger. It might be unclear even to the compiler how large the array is!
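
For instance (a hypothetical sketch; find is a made-up helper):

    #include <stddef.h>

    /* Searches only the subrange [base, base + len); neither the
       hardware nor the compiler can know how large the underlying
       array really is. */
    static int find(const int *base, size_t len, int needle) {
        for (size_t i = 0; i < len; i++)
            if (base[i] == needle)
                return (int)i;
        return -1;
    }

    /* e.g. find(&big[100], 50, x) inspects only big[100..149],
       where "big" is some larger, hypothetical array. */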


X86 does have array bounds checking now with MPX. Further, everything you've mentioned, including the stack, is useful to all languages. Can you give an example of a negative trade-off made for C?


In some languages a call stack is only useful as an optimization and some other form of control flow must be emitted for certain constructs that violate the stack paradigm (e.g. call/cc in Scheme, conditions/restarts in Common Lisp). C is also one of the only languages where general pointer arithmetic is necessary; in most languages used today more restricted forms of pointer arithmetic would be more than adequate.

I am not saying these are negative tradeoffs as far as the ISA goes, but it is absolutely the case that modern CPUs are designed with C (or some similar language) in mind.


Processors exist to run code quickly. Slow ones don't sell very well, so we'd see stack-oriented opcodes regardless of C's existence. You would also still use instructions like LEA and indexed moves in any language with the concept of an array, so that too is not strictly related to C.

Most of the x86's CISC "style" predates C, and the reverse is mostly true: C was heavily inspired by the PDP-11's instruction set. When the 8086 originally came out, BASIC was a far more important language for that class of processor. "Serious" software was much more likely to be raw assembler than C. The processor was just too small and compilers weren't good enough.


Right. That's what I actually wanted to ask with my first question (but didn't).


To get the maximum performance on a modern CPU the code has to be cache-aware. It is painful to program for that in C. One often ends up with very complex macros or writing code generators that write CPU-specific C code. So modern ISAs are not designed for C.

By such arguments one can claim that CPUs are designed for Fortran, as its compilers allow one to write fast code with less effort than in C/C++.


Actually NVidia's Volta was designed specifically for running C++ code.

http://cppcast.com/2017/09/olivier-giroux/

https://www.youtube.com/watch?v=Ud1savVLALA


I think it is fair to say that modern ISAs are built around a C-speaking world, if that's more reasonable.

This may also be especially true on a technicality: does the ISA actually specify the exact cache structure of a CPU? If not, then it is reasonable to say that the ISA (if not the whole microarchitecture) is designed with C in mind.


Surely nobody designs an ISA that performs badly for C programs. But that does not mean it was designed to run average C programs optimally.

If one searches for "data-oriented programming", a typical link is about game programming and how using a struct of arrays instead of an array of structs improves performance due to better cache locality. But such structs of arrays are hard to use and maintain in general-purpose code. It works well only in game engines or in numerical code, where one has a lot of things with the same attribute and the total number of different attributes is low. So if anything, ISAs are tailored to allow maximum performance when coding games or numerical recipes in a particular style. That style is not feasible to apply, for example, in a kernel, as C has very poor support for it and one gets too many different attributes to get real benefits from following it.


Modern CPUs, yes, but it used to be that they were designed for high-level languages in general, before C took over the show.


Some of your complaints could be said about assembly, too, but are you wanting to get rid of that as well? Some of your other complaints are about how processors and other hardware work, not C. In many ways you are asking for C to be like other languages, but then it wouldn't be C, and it would lose the advantages it does have.

If you get rid of C, you get rid of most of the software your operating system runs on, and that raises the question you need to ask yourself: if C is so bad, why is everyone using it?


> if C is so bad, why is everyone using it?

Don't confuse success with correctness; survivorship bias is something to consider here. As others will surely point out, when C was written, there were other languages available at the time that were as fast, but much safer.

In many ways C won because Unix won; to talk about them separately ignores how important both were to each other's success. (Yes, other operating systems adopted C as well, but Unix is its foundation.)


Some mainframes like Unisys ClearPath don't have Assembly support at all, zero, nada; all necessary low-level primitives are exposed as compiler intrinsics.


I feel like C++11 and Rust have shown a way forward and the C standard should focus on a design that takes into account ownership and protected arrays in a simple way, hopefully as backwards compatible as possible.

Maybe the solution is for the major compilers to default to strong static analysis.

Even though there are safe languages, the enormous integration and compatibility of C means that it isn't going anywhere soon, so I think even imperfect solutions can offer a lot if they are low hanging fruit.


It is a matter of mindset, multiple safe C variants have been attempted throughout the years.

Even lint was already available in 1979, yet hardly anyone used static analysis tools on C codebases.


Which is why I mentioned defaulting to strong static analysis. Having it as a separate tool or an unknown flag is not enough.


I've been writing C on and off since 1990, across embedded systems, game consoles, massive SGI supercomputers, and modern kernels.

C is not fading away, because of its portability and low overhead. It has some nasty behaviors, as the original post shows, but you hit those once in a blue moon. The language is not unsound because of it; it just has some warts like any other language.

I don't particularly enjoy writing C, as it's too low level for my liking and I have to write too much code to do basic things (slist_* makes me shudder), but the reason that I have used it is due to its tremendous portability and direct compilation to machine code. That still matters in some places.

In places where memory is precious, or realtime performance precludes things like memory managers, C will continue to thrive, beside languages like Rust which are great for different goals. There's room for both.

C, in my opinion, has stood the test of time on its simplicity, portability, and the ubiquity of the C standard library, tiny as it may be.


On top of that, C just got here thanks to the widespread adoption of UNIX and POSIX, and we had so many better options.

Yes, they were languages with a helmet, seat belts and airbags bolted on, but just as with old-technology cars crashing on a motorway, the CVE database shows what happens when things go wrong.


These architectural bizarrities exist and have to be dealt with by someone. Of course it is better if that someone isn't you: if some Rust maintainer ports the language to the obscure hardware where the CPU ignores the lower four bits when addressing, and presents nice, neat, and performant abstractions for you to use.

But who is going to pay for that effort? If the hardware manufacturer cared enough to invest a lot in developer tooling, it probably wouldn't have picked such a hostile interface to begin with.


Is there a site or book out there that talks about these common pitfalls? I work in C89 so K&R C does seem pretty simple to me. I clearly haven't worked with it long enough to spot these things, or I have code in the wild that unknowingly is victim to these issues and I've yet to stumble on it.


> memory doesn't work like a simple flat address space

Can you expand on that? I thought that modern (non-segmented) memory does work like a simple flat address space, at least from the perspective of userspace programs. Isn't all the weirdness only a result of compiler optimisations?


Just to take one example: multisocket x86 is NUMA. There’s HW to paper over that fact, but the power/perf cost is quite high.


Also Epyc and (even more so) Threadripper have somewhat NUMA memory access in a single CPU.


The road to Rust is paved with good intentions. (The same is true of C++.)


Note that if the two pointers are passed to a function, and the comparison is done in the function, the results are different:

  #include <stdio.h>

  void pcmp(int *p, int *q)
  {
      printf("%p %p %d\n", (void *)p, (void *)q, p == q);
  }

  int main(void) {
    int a, b;
    int *p = &a;
    int *q = &b + 1;
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
    pcmp(p, q);
    return 0;
  }
That is giving me:

  0x7ffebac1483c 0x7ffebac1483c 0
  0x7ffebac1483c 0x7ffebac1483c 1
That's compiled with '-std=c11 -O1' as in the article. The result is the same if pcmp is moved into a separate file, so that when compiling it the compiler has no knowledge of the origins of the two pointers.

I don't like this at all. It bugs me that I can get different results comparing two pointers depending on where I happen to do the comparison.


Yes, I agree. It makes far more sense for a language to guarantee that two values are compared using a metric that is invariant, especially in a language like C, where we expect a very thin abstraction over the machine.


You're right, unless pcmp is declared as static (then the compiler inlines the function):

   #include <stdio.h>
   #include <stdint.h>

   static void pcmp(void *p, void *q)
   {
       printf("%p %p %d %d(sub) %d(cast)\n", p, q, p == q, !(p - q),
                                             (uintptr_t)p == (uintptr_t)q);
   }

   int main(void)
   {
       int a, b;
       void *p = &a;
       void *q = &b + 1;

       printf("%p %p %d %d(sub) %d(cast)\n", p, q, p == q, !(p - q),
                                             (uintptr_t)p == (uintptr_t)q);
       pcmp(p, q);
       return 0; 
   }
Results:

   $ gcc -std=gnu11 -O1 pq.c
   $ ./a.out
   0x7fffde6612ec 0x7fffde6612ec 0 1(sub) 1(cast)
   0x7fffde6612ec 0x7fffde6612ec 0 1(sub) 1(cast)
Same result with '-std=gnu11' as with '-std=c11' (GNU C allows void pointer arithmetic, needed for the test !(p - q)).


Yup, what is described in the article looks like a buggy optimization to me, where the runtime behavior differs from the static model used for the optimization. I am sure the actual runtime semantics is in fact to compare addresses, but the optimization claims to know (falsely) enough about the pointers to determine they must not be equal.

I don't understand the people in this comment section defending this. The cited spec gives no evidence that this should happen (I fail to see the undefined behavior here). It specifies when pointers are equal, and both pointers ended up pointing to the same address, so they should compare equal. Just because the compiler doesn't know, shouldn't mean it can replace an equality check with false.

The spec didn't say "also, they are not equal in case the compiler thinks that they proooobably aren't pointing to the same object".

I do agree that one should avoid this type of pointer arithmetic, but that doesn't excuse the confusing behavior of this primitive operation.

One should at least argue for the behavior to be consistent.


Did you use gcc or clang or...?

When I ran this locally I get:

    0x7ffeee2a4808 0x7ffeee2a4810 0
    0x7ffeee2a4808 0x7ffeee2a4810 0
With -O0 I get 1/1 instead of 0/0 so this appears to be a compiler optimization.


I like the behavior of the compiler here. There is no guarantee that a and b are next to each other in memory. That's why the comparison fails; the alternative makes it runtime/compiler/optimization-level dependent, which would be a total mess.

As usual with those C bashing articles you won't run into trouble if you don't try very hard to write contrived code. I mean, the moment you see:

     int *q = &b + 1;
on your screen, alarm bells should go off. Doing pointer arithmetic on something that is not an array is asking for trouble. If the standard should be amended in any way, it should be to make pointer arithmetic on non-array objects undefined behavior outright.


> I like the behavior of the compiler here. There is no guarantee that a and b are next to each other in memory. That's why the comparison fails; the alternative makes it runtime/compiler/optimization-level dependent, which would be a total mess

Yes, there is no guarantee that they are next to each other, but in this case they happen to be next to each other, and according to the spec as quoted in the article, two pointers are equal if:

> [...] one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space

Note that it says "happens to" immediately follow, not "is guaranteed to" immediately follow.

These are pointers to ints, not arrays of ints, but that should not matter because, as quoted in the article:

> For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

I don't see any way to read these as not requiring the pointers to compare as equal if the compiler happens to put a and b adjacent in memory in the right order and with no padding between them, other than resorting to something ridiculous like claiming that "follow" or "immediately follow" are not defined in the spec, and so one object occupying the very next available address after another does not necessarily "immediately follow" it. This seems to be what the gcc developers went with to declare this is not a bug.

Also, note that if you move the "int a, b" outside main, so a and b are in static storage instead of on the stack, then gcc does find that the pointers are equal.


> > For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

> I don't see any way to read these as not requiring the pointers to compare as equal if the compiler happens to put a and b adjacent in memory in the right order and with no padding between them, other than resorting to something ridiculous like claiming that "follow" or "immediately follow" are not defined in the spec, and so one object occupying the very next available address after another does not necessarily "immediately follow" it. This seems to be what the gcc developers went with to declare this is not a bug.

"These operators" in the spec refer to the (additive) pointer arithmetic operators, not the equality operator. So incrementing a pointer to int behaves as if the pointee were in an array of length one, but that does not mean any subsequent equality operator should consider the objects as arrays.

It's totally fine for the equality comparison to return false when the objects in question are actually not arrays.


Then let's make them real arrays:

  #include <stdio.h>

  #define N 8

  int main(void) {
    int a[N], b[N];
    int *p = &a[0];
    int *q = &b[N];
    printf("%p %p %d\n", (void *)p, (void *)q, p == q);
    return 0;
  }
Results:

  $ gcc -std=c11 -O1 ar.c; ./a.out
  0x7ffd32d048c0 0x7ffd32d048c0 0
  $ gcc -std=c11 ar.c; ./a.out
  0x7ffe3f8ccd60 0x7ffe3f8ccd60 1


BTW, things work correctly when the objects live inside a real array:

  #include <stdio.h>

  int main(void) {
    int arr[16];
    int *p = &arr[0] + 1;
    int *q = &arr[1];
    printf("pointers: %p %p %d\n", p, q, p == q);
    return 0;
  }
The result is perfectly correct and predictable, as the Standard guarantees pointer comparisons in this case:

  pointers: 0x71b35ac790d4 0x71b35ac790d4 1
The original code in the article relies on UB, I think, so all bets are off.


> Note that it says "happens to" immediately follow, not "is guaranteed to" immediately follow.

I read that more as "happens to be placed" by the programmer (as a conscious decision), not including mere happenstance. In other words, this refers to arrays and struct fields.


> Also, note that if you move the "int a, b" outside main, so a and b are in static storage instead of on the stack, then gcc does find that the pointers are equal.

This is also an utter accident, since it depends a great deal on the toolchain. I've worked on compilers and linkers that will put the variables in totally separate parts of the binary (e.g., based on name hash).


Agree on the alarm bells with the pointer arithmetic.

Disagree that gcc is doing the right thing here. The clang behavior (different comment in this thread) is much more sane: if the pointers happen to be the same, they compare as equal. If the pointers happen to not be the same, they compare as not equal.


Except clang's behaviour changes depending on the optimisation level. If you use -O1, then you get a different result.


Yes, because the pointers are different at -O.

As there are no guarantees as to the relative placement of auto/stack variables relative to each other, that is perfectly fine.


If the behaviour differs at optimisation levels, would that not suggest that it is not a sane behaviour?


Not in this case, no, not at all.

Finding a different layout for auto variables in the stack frame at different optimization levels is perfectly fine.


That would require making a distinction between array-pointers and non-array pointers. There, another can of worms. Probably not worth the additional complexity.


C is useful in part because it allows for the freedom unimaginable in other languages.


That freedom does exist in Ada, Modula-2 among others, the big difference being that one needs to be explicit about what is going on.


the comparison at the start is nonsense - there is no specification for the ordering or location of stack variables. by taking the address of these variables, you could see that they actually are the same value, and so intuitively you’d think they might be the same, but a different compiler might put them in different locations. or they may be elided entirely through optimisation. it’s far safer to fail the equality test in this case - this is what the model specifies.

this is not even the first example of this counter-intuitive behaviour. imagine two floating point values with exactly the same bit-representation. it is possible, without any trickery, for them to fail an equality check - i.e., they are both NaN.
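
for example (a minimal demonstration):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = NAN;
        double y = x;            /* identical bit representation */
        printf("%d\n", x == y);  /* prints 0: NaN never compares equal */
        return 0;
    }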

this is what IEEE 754 demands of a compliant floating point implementation. and indeed, it's a sane choice when you understand why it was made.

similarly, it’s perfectly reasonable for this example to fail.


Surely the point of writing an if-block testing the equality of the pointers is precisely because the code isn’t assuming that they are the same - but maybe it has something it wants to do if it finds itself running somewhere where they are.

Writing code which assumed those pointers were equivalent would be bad, but this code doesn’t do that. Instead, it looks like the compiler assumes that the pointers won’t be the same and refuses to let you entertain the possibility at all.


> Instead, it looks like the compiler assumes that the pointers won’t be the same and refuses to let you entertain the possibility at all.

and indeed it should - it’s sane to use the address of a stack variable for the purposes of writing something into it, but assuming that it forms some sort of array in memory is sure to cause pain if you rely on that assumption.


What's the point of even trying to compare these two pointers?


I agree. When I can see the addresses match, it would do my head in if the compiler says false; it's unexpected, it's surprising. If I write bad code, fine, I'm used to that and I can debug it; if there are these new 'gotchas' that's less transparent. I've not used C11 yet, and I don't like the sound of it!


I haven't tested it but I imagine casting to uint64_t will make it work as you want.


clang is a bit more sane:

   > cc pointereq.c 
   > ./a.out 
   0x7ffee83163b8 0x7ffee83163b8 1
   > cc -O pointereq.c 
   > ./a.out 
   0x7ffeeeb8b3b8 0x7ffeeeb8b3c0 0
So without optimization, the pointers are the same and compare as equal. With optimization, the pointers compare as not equal. At first that seemed horrible, until I saw the pointers actually are not the same. Since I don't recall any guarantees about stack layout, that seems perfectly fine.

   > cat pointereq.c 
   
   #include <stdio.h>
   
   int main(void) {
     int a, b;
     int *p = &a;
     int *q = &b + 1;
     printf("%p %p %d\n", (void *)p, (void *)q, p == q);
     return 0;
   }


There's nothing surprising in the first example. Comparing the addresses of stack variables is undefined behaviour.

The second one is more interesting:

    extern int _start[];
    extern int _end[];
    
    void foo(void) {
        for (int *i = _start; i != _end; ++i) { /* ... */ }
    }
GCC optimized "i != _end" into "false". The kernel guys fixed this by turning "_start" and "_end" into "extern int*". I always thought [] was just syntactic sugar over a regular pointer, but it seems I was wrong.


Actually, the headline article tells us that it optimized it to true.

That affected some of my code from years ago, too; another fix is to keep _start and, instead of having another array called _end, have an external size_t giving the number of elements of _start, and loop while i != _start + count.

In fairness to the compiler people there shouldn't really be anything surprising in the second example, either. There are x86 memory models (compact and large) where that loop never terminates in the general case, because the two arrays are not in the same data segment and no amount of incrementing a (far but not huge) pointer will get from one to the other. So it's not the case that this is a new pitfall. It was a pitfall decades ago.

People just thought that it didn't exist with modern tiny (a.k.a. flat) memory models, because the old explanation wasn't applicable. It does, but with a different explanation.


Yup, I've mistyped "true" as "false".

If you set "_start" and "_end" to addresses of different objects, then you're invoking undefined behaviour and all bets are off. But this should be perfectly valid IMO:

    extern char* _start;
    extern char* _end;

    void foo() {
        for (char* c = _start; c != _end; c++) ...
    }

    int main(void) {
        char buf[128];
        _start = &buf[0];
        _end = &buf[128];
        foo();
    }
Right?


The thing there is indeed not the validity, but the introduction of extra memory references and variables.

The code with the arrays compiles to immediate value loads with load-time fixups to the code; and if this is linking to an assembly language module the necessary assembly language is just some labels against the data.

The code with the pointers compiles to loads from data memory, with (in the more usual paradigm where the start and end are bracketing some vector of initialized data in the program image) load-time fixups to the initial values of the variables; and the assembly language has to be different, some variables with these labels whose initial values are the addresses of some other labels against the actual data.

Notice that the Linux Kernel developer in 2016 did not immediately appreciate the need to modify the concomitant assembly language and linker script stuff to match the change in the C language code.

The approach that I mentioned keeps the immediate value for the start, and uses a load from initialized data memory for the count. Other approaches are possible, but they need more work in the build process. For example: By generating the C/C++ language declaration and definition from a script or suchlike, one could calculate and declare the number of elements in the array, making it possible to use sizeof.


> I always thought [] was just syntactic sugar over a regular pointer, but seems like I was wrong.

It depends on the context. When used in function arguments, yes, it's the same. When used during the declaration of a variable, it's entirely different.


The confusion in the example is the choice of names "start" and "end". If we rename them "some_array" and "some_totally_different_array" the undefined behavior is clear.


> I always thought [] was just syntactic sugar over a regular pointer, but seems like I was wrong.

For another example showing their differences, if a.c has:

    int n[] = { 1 };
And b.c has:

    extern int *n;
    printf("%d\n", *n);
Then running b.c will segfault, whereas if it said extern int n[], it would work as expected (hint: extern int n and printf("%d\n", n) would also work). Arrays decay to pointers at the drop of a hat, but they're distinct from them.


> Then running b.c will segfault

why would it not? On one side you declare an int array of size 1, on the other side you declare a pointer. Think about what's going on with the bytes:

    int n[] = { 1 };
will look exactly the same in memory as

    int n = 1;
which is of course not compatible with

    int* n = 1; 

Arrays by themselves have no indirection in C.

=> https://godbolt.org/g/xwT7uW


Right, that's what I was explaining. A lot of people get tripped up by this because arr and &arr yield the same address, int arr[] is equivalent to int *arr as a formal parameter, etc., so many people go around with the idea that arrays are just pointers in C, like in the comment I quoted; it's an easy thing to get confused by. I was giving an example to demonstrate how they differ.


> Comparing the addresses of stack variables is undefined behaviour

Can you elaborate? I don't think this is true.


They're from different objects, so it's not meaningful (according to the standard) to compare them. That's what makes it UB.


"Not meaningful" and "undefined behavior" have very different meanings.

I agree it's not meaningful, but the standard does define the result of comparing pointers to two unrelated objects with "==", so it's not UB.


Where? The post cites C11 section 6.5.8 paragraph 5, which seems to disagree with you.

Of course == comparing unrelated pointers is something you need to be able to do, but relational comparisons (including those based on pointer arithmetic) don't seem to be defined anywhere...


Right, comparisons with < and > are UB, but I'm talking about ==, which is well-defined. It's hard to tell which operator "tzahola" was originally talking about, but I took "comparing the address" in this context (especially given his example code) to mean "==", or at least, include it, which makes the statement "comparing the addresses of stack variables is undefined behavior" incorrect in this context.


You're right. There's no undefined behaviour for ==:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.


Basically you have no guarantee where the pointer q is pointing to. Some compilers/static code analyzers will yell at you about this code.


isn't it guaranteed that it points sizeof(int) bytes higher than the address of b?

Whether something useful is behind that address is another question


There's a guarantee relative to the address of b. But there's no guarantee about the relative addresses of a and b themselves, i.e. where they are placed in automatic storage. So even setting aside the idea of optimizing the test away at compile time, there's no guarantee that the comparison result will be a specific value. a could be above or below b, and they are not necessarily adjacent objects.


From the C11 spec quoted in the article

> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

So although most implementations might produce an address that's relative to b, that's not actually guaranteed by the spec unless it's `&b + 1`.


... which it is in the program under discussion. Again, the address which has no guarantee relative to the address of b or the value of q is the address of a.


From reading the spec it seems that is undefined behaviour for anything other than `&b + 1`. That is one past the end of b is fine but not `&b + 2` etc.


I wonder why someone decided that accessing "one past the end" should be fine


Short answer: so you can walk a pointer down the elements of an array without having to have weird code to deal with the end.
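
The classic idiom (a minimal example):

    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 2, 3, 4};
        /* a + 4 points one past the end: it may be formed and
           compared, but never dereferenced. */
        for (int *p = a; p != a + 4; ++p)
            printf("%d\n", *p);
        return 0;
    }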


It's explained in the addendum to the article.


thanks


Depending on your architecture and compiler it could be that everything is 128-bit aligned, so you would have "empty holes" in your memory layout (for example, on some DSPs).


I don't think the address of anything even has to be a positive number, according to the C standard.


Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Can someone explain to me the rationale behind this? Why not just "two pointers compare equal if they point to the same address"?


The C standard was (and is) written so that it allowed for machines that didn't have simple pointers; a highly relevant example for the time was x86 machines in real or protected mode using segment registers. Even when such environments have accessible linear address spaces under the hood, comparing the absolute address of two pointers requires some extra work (you have to reassemble the absolute address from the segment + offset pair). In the grand tradition of allowing the most efficient implementation possible, the C standard says that implementations don't have to do this work; they can simply compare pointer values directly, provided that they ensure that pointers to the same object work (eg use the same segment + offset pair, which they normally naturally will).

The standard was also written to allow for C interpreters, where you may not have an underlying linear memory model at all and all pointers are represented internally in some complex way (for example 'object ID + offset inside object'). Here you don't have any particular idea of 'an address' as a distinct thing and so you can't naturally compare two pointers to different objects in any meaningful way.
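
A hypothetical representation such an interpreter might use (interp_ptr is a made-up name):

    #include <stddef.h>

    /* A "fat" pointer: there is no single integer address to
       compare, so equality of pointers into different objects
       has no natural meaning. */
    typedef struct {
        unsigned object_id;  /* which allocation */
        size_t offset;       /* position within it */
    } interp_ptr;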


Because (a) pointers do not point to addresses, and (b) now you have to define the concept of addresses being equal in the language standard and haven't actually reached the goal. (-:


Indeed; pointers intentionally abstract away memory addresses and addressing, which, as the article states, can be much nastier business than the benign flat address space of i386 protected mode.


My guess is type-based alias analysis. The classic example is casting a pointer to int to a pointer to float like so, which may crash your program due to undefined behavior:

    float my_bad_function(void) {
        int x = 5;
        float *y = (float *)&x;  /* violates strict aliasing */
        return *y;
    }
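
The well-defined way to reinterpret the bits is to copy them; a sketch, assuming sizeof(int) == sizeof(float) (common, but not guaranteed), with int_bits_as_float being a made-up name:

    #include <string.h>

    /* memcpy avoids the aliasing violation; good compilers
       typically turn this into a single register move. */
    float int_bits_as_float(int x) {
        float f;
        memcpy(&f, &x, sizeof f);
        return f;
    }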


When I compile the example, i get:

  0x7ffd0b57ebd0 0x7ffd0b57ebd8 0
OK, so gcc reordered a and b; I'll fix this by changing the initialisation of p and q to:

  int *p = &a + 1;
  int *q = &b;
But when I now run the example, I get:

  gcc -o c c.c && ./c
  0x7ffe55eb0aa4 0x7ffe55eb0aa4 1

  gcc -O -o c c.c && ./c
  0x7ffcfeffd914 0x7ffcfeffd914 0
So, p==q only evaluates to 1 if optimisation is disabled.

  $ gcc --version
  gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0


It shows the exact options the author used if you click on GCC:

Version: 8.1.1 Options: -std=c11 -O1 Target: x86_64-redhat-linux


> If we step back from the standard and ask our self does it make sense to compare two pointers which are derived from two completely unrelated objects? The answer is probably always no.

The one big counterexample I can think of is the difference between memcpy and memmove. The latter is supposed to be able to do arithmetic on memory regions, to see if they overlap. Is this article saying that the standard C implementation of memmove is relying on unspecified behavior?


Memmove is frequently cited as an example of a standard library function that can't be implemented using only standard C. It's incorrect, though: the way to do memmove using standard C is to use malloc to allocate a temporary buffer.

But the whole question is not terribly relevant. When memmove is provided as part of the C implementation it can rely on non-portable platform behaviour just fine. There's no rule that you have to implement libc using only standard C facilities.
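
Roughly like this (a sketch; my_memmove is a made-up name, and unlike the real memmove it can fail when malloc does):

    #include <stdlib.h>
    #include <string.h>

    void *my_memmove(void *dst, const void *src, size_t n) {
        unsigned char *tmp = malloc(n);
        if (tmp == NULL)
            return NULL;      /* the real memmove cannot fail */
        memcpy(tmp, src, n);  /* tmp never overlaps src or dst */
        memcpy(dst, tmp, n);
        free(tmp);
        return dst;
    }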


And how do you implement malloc() in standard C?


Another example of arbitrary pointer comparison I'm familiar with is for resolving lock ordering problems. If I have a big set of locks that I need to take to complete a particular operation, a simple way to prevent deadlocking is to acquire those locks in the order of their pointers.

E.g. if I have three locks I need to take,

  lock_t *a = (lock_t *)0x100;
  lock_t *b = (lock_t *)0x300;
  lock_t *c = (lock_t *)0x200;
then I'll take them in order a, c, b;
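
A sketch of that discipline with POSIX mutexes (lock_pair is a made-up helper; casting through uintptr_t sidesteps the undefined relational comparison of unrelated pointers):

    #include <pthread.h>
    #include <stdint.h>

    /* Always take the lower-addressed lock first so every thread
       agrees on the order, preventing deadlock; assumes x != y. */
    static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y) {
        if ((uintptr_t)x < (uintptr_t)y) {
            pthread_mutex_lock(x);
            pthread_mutex_lock(y);
        } else {
            pthread_mutex_lock(y);
            pthread_mutex_lock(x);
        }
    }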


If order of locks is important maybe it's worth considering putting them in an array.


Or maybe keeping a global atomic counter, and reading a unique (to the process) value out of it every time you create a mutex?


> ...or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

I was not aware of this special case. What's the rationale? Is there even a way in standard C to guarantee that two array objects are laid out in memory like that, with no padding?


A two-dimensional array?


Yeah, good point.


Start with one large array and subdivide it into two segments.
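
For example (a minimal sketch):

    int big[8];
    int *first = big;       /* treat big[0..3] as one array */
    int *second = big + 4;  /* big[4..7] immediately follows it */
    /* Here first + 4 == second is guaranteed to compare equal. */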


Ummm, when I run this on gcc 7.3.0 on OS X I actually get:

0x7fff5dbd89fc 0x7fff5dbd89fc 1

Which kind of shoots the whole article in the foot ...


Did you compile it using '-std=c11 -O1' ?


No, but that just makes me assume that this is an optimization bug. Especially since -O4 actually coughs up different pointers:

  gcc-7 -std=c11 -O4 -o test test.c
  $ ./test
  0x7fff5b5f5a08 0x7fff5b5f5a10 0
The following code works as expected even with your flags and the pointers are clearly not related to one another:

  #include <stdio.h>
  #include <stdint.h>
  
  int main(void) {
      uint32_t *p = ((uint32_t *)0x7fffffffffff3eac)+1;
      uint32_t *q = (uint32_t *)0x7fffffffffff3eb0;
      printf("%p %p %d\n", (void *)p, (void *)q, p == q);
      return 0;
  }
So, I would chalk this up to either 1) "compiler optimization bug", unless a gcc maintainer explicitly gave me a reason to believe the contrary, or 2) getting your underwear eaten by weasels because you optimized after invoking undefined behavior (attempting to probe stack layouts qualifies as undefined behavior in a big way).

My assumption would be that the compiler had to do something screwball that it got wrong because you actually asked for a pointer to a stack object that it had optimized to always be in registers.


Not related to the content but I find the colored links in the nav a simple and very good idea.


Just a datapoint: VS2017 on Win10 x64 gives me 00AFF89C 00AFF894 0, which is what I naively expect.


Actually, you should not expect that, because you don't have a guarantee of ordering of the addresses of a and b on the stack.

Now try with /O2 . (-:


Maybe I'm misunderstanding you, but it's precisely because I have no knowledge of the stack frame's layout that I didn't expect the pointers to be equal.

With /O2 optimisation I get:

00B1F99C 00B1F9A4 0


That's where you are going wrong. You have no guarantee that the compiler places a and b next to each other, let alone in a defined order with a at a higher address than b such that one integer width beyond the end of b you find a.

Expecting one particular result is wrong. You are expecting one specific outcome: that the comparison always returns false. You should not expect anything. The comparison result could be true or false.

    H:\>cl /O2 pointereq.c
    Microsoft (R) C/C++ Optimizing Compiler Version 19.14.26431 for x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    pointereq.c
    Microsoft (R) Incremental Linker Version 14.14.26431.0
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    /out:pointereq.exe
    pointereq.obj
    
    H:\>.\pointereq
    00CFFA20 00CFFA20 1
    
    H:\>


You are confusing "did not expect them to be equal", which is what the OP wrote, with "expected them to be not equal", which is what you are railing against.


Ahem!

"which is what I naively expect."


My expectation was that they wouldn't be equal.


I already knew. (-: Tell mpweiher.


if you're adding 1 to a pointer, why would you expect it to be equal? I would not expect it to be equal.


He's adding "1" to the pointer that points to the variable "b" which in this case is allocated immediately below "a" in the stack. Thus adding one will make the two pointers equal.


The layout of things on the stack is completely implementation-defined.


It has been a while since I read the standard, but this seems well into the realm of undefined behavior, not implementation defined.


From my understanding, how the stack is laid out is implementation-defined; attempting to actually observe this ordering in your program is undefined behavior.



