A lot of C programmers make assumptions that are valid for their particular system architecture and compiler. Is there any point in trying to unify their obviously different practices while at the same time ignoring the Standard? No.
Many C programmers assume a single flat memory space. Most machines today have that. There's a long history of machines that didn't: the Intel 286 in segmented mode, some Pentium variants in segmented mode beyond 4GB (Linux supports this in the kernel), the IBM AS/400, Unisys Series B 36-bit word machines, the Burroughs 5500/6700/Unisys Series A segmented machines where an address is a file-like path, Motorola CPUs where I/O space and memory space are distinct, and old DEC PDP-11 machines where code and data address spaces are separate. More recently, there are non-shared-memory multiprocessors where addresses are duplicated across processors - the PS3's Cell and many GPUs, for example.
Most programmers today will never encounter any of those.
There's also the assumption that null is address 0 - notably breaking the purity of C++'s type system with magic 0s for twenty-odd years before nullptr came along.
That is the point. They went through the process of tightening down the type system by invalidating automatic casts and making void* less promiscuous, and then they broke that whole philosophy with a magic literal "0" that can work as a normal integer or as an address placeholder for any pointer, even though it won't necessarily evaluate to address zero. At that point, why can't I assign "0" to a float too and enjoy some more magic there?
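A minimal C sketch of the double life of that literal (note that the null pointer it yields need not be all-bits-zero at the machine level):

    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        int  n = 0;   /* here 0 is just an integer */
        int *p = 0;   /* here the same token is a null pointer constant */
        printf("%d %d\n", n, p == NULL);  /* prints: 0 1 */
        return 0;
    }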
Well, one of the interesting takeaways from the article is that C as specified often does not let you take advantage of your system architecture. The specification has things like GPUs and segmented memory architectures in mind when it forbids you from doing seemingly reasonable things like taking the difference between the addresses of two separately-allocated objects, even though chances are very good that what you're trying to do works just fine on all the architectures you care about.
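For a concrete instance of such a "seemingly reasonable thing" (the function name is made up for illustration):

    #include <stddef.h>

    int a, b;  /* two separately-allocated objects */

    ptrdiff_t distance(void) {
        /* Undefined behavior: the standard only defines pointer subtraction
           for pointers into (or one past the end of) the same array object.
           On a flat address space this "works"; on a segmented machine the
           result may be meaningless. */
        return &b - &a;
    }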
Yes and no. The standard defines a C that is very portable. It is quite reasonable that you cannot portably take advantage of your system architecture, which means you cannot do it using C as defined by the standard.
But you still can write stuff that looks just like C, that a C compiler will accept, and that lets you take advantage of your system architecture. You just need to understand that a new version of your compiler can break all of your nifty tricks.
No, spi.D; is actually important.
No, just because the program never writes to spi.S does not mean you can delete the while loop.
No, you cannot reorder any of this and have it work.
Are you saying something other than "you, the programmer, must declare spi to be volatile, or else the compiler might lay a bear trap for you"?
Based on your comment it seems clear that you understand that... but given that you understand that, I don't get the point you're trying to make (except perhaps that writing low-level hardware code in C often means actively stopping the compiler from screwing you over).
(For non-C programmers: "volatile" can be approximated by "hey compiler, this variable can change unexpectedly even if you didn't do anything to change it, so don't use any optimizations that assume you're the only one changing it".)
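A minimal sketch of the sort of trap being described, assuming a made-up memory-mapped SPI status register (the address and ready bit are hypothetical):

    #include <stdint.h>

    /* hypothetical memory-mapped status register */
    #define SPI_STATUS (*(volatile uint32_t *)0x40003008)

    void wait_until_ready(void) {
        /* Without volatile, the compiler may read the register once, see
           that the loop body never writes it, and turn this into an
           infinite loop - or delete the loop entirely. */
        while ((SPI_STATUS & 0x1) == 0)
            ;  /* spin until the hardware sets the ready bit */
    }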
I think the parent meant "implicit" rather than "abstract". The code is not abstract, but the pragmatic intent of the comment is not made very clear. I believe I follow, but I'm not confident enough to risk confusing things by guessing.
(I believe I follow the intended conversational import; I certainly follow the code.)
Sort of. The point is that in pure languages only the symbolic operations are important; the side effects (memory operations) are unimportant and beyond the control of the programmer. So you can do all sorts of transformations on the code without affecting the end result.
C is definitely impure. There is a large set of use cases where the memory layout is important, and where the order of operations and memory accesses is important.
What I worry about is when the optimizer is allowed to make assumptions about undefined behavior that may actually be important on certain targets. Consider dereferencing a null pointer.
On Windows 7 or Linux, if you do that in userland you'll get a segfault. And if the default handler runs, your program dies.
On the ARM Cortex processor I've been slopping code for, reading a null pointer returns the initial stack pointer, and writing generates a bus fault. Since I actually trap that, an error message gets written into non-initialized memory and then the processor gets forcibly reset.
On an AVR, reading a null pointer gives you the program's entry point. Writing to that location typically does nothing, though you can write to it if you're running out of a special 'boot sector' and have done some magic incantations.
So that's the problem I have with the idea that 'undefined means the optimizer can make a hash of your program' instead of trusting that the back end of the compiler will do something target-appropriate.
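To make the worry concrete, here is the classic pattern (function and variable names made up) where the optimizer is entitled to exploit null-pointer UB regardless of what the target would actually do at address zero:

    #include <stddef.h>

    int read_flags(int *p) {
        int value = *p;   /* if p is null, this is undefined behavior... */
        if (p == NULL)    /* ...so the optimizer may assume p is non-null */
            return -1;    /* and delete this check entirely */
        return value;
    }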
Yes, embedded is very side-effect-ful. When you have to make that kind of thing work, you usually have to know both your hardware architecture and your compiler.
    *(unsigned long*)0xEF010014 = 0x1C;
presumes memory-mapped I/O, a 32-bit hardware register at 0xEF010014, and knowledge of what 0x1C will mean to that register. It also presumes that the compiler will generate a 32-bit access for an unsigned long.
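One way to pin those presumptions down, assuming C99's <stdint.h> is available (the macro name is made up):

    #include <stdint.h>

    /* uint32_t fixes the access width (unsigned long is not guaranteed to
       be 32 bits), and volatile keeps the compiler from caching, eliding,
       or reordering the store. */
    #define CTRL_REG (*(volatile uint32_t *)0xEF010014)

    void set_control(void) {
        CTRL_REG = 0x1C;
    }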
Going back to Gibbon1's code sample, you have to know what your compiler is going to do at what optimization level. You either have to declare spi.S to be volatile, or you have to compile at -O0, or some such. And you may well need to review the generated assembly code a few times in order to really understand what your compiler is doing with your code.
If you're writing the Linux kernel and you want to be portable across hardware architectures, it gets harder. You can't just be processor- or hardware-specific. You probably need the volatile keyword. I don't know what optimization level the kernel is compiled at, but I bet it's not aggressively optimized.
I don't mean that sarcastically; pcwalton's sibling message only begins to mention the ways in which C is not a match to modern systems. Vector processing, NUMA, umpteen caching layers, CPU features galore... it's not really a match to the "architecture" any more.
(To the extent that you may think C supports those things, I don't really think "lets you drop arbitrary assembler in the middle of a function" constitutes "support". YMMV. To be fair to C, there are a lot of features that seem to be unsupported by any high-level language today. Hardware moves way faster than programming languages. If you want to figure out what's coming after the current generation of languages, "a language that actually lets you use all the capabilities of modern hardware without dropping to assembler and giving up all the safety of the higher-level language" is at least one possible Next Big Thing.)
C matches NUMA and caching just as well as assembly does. Basically the only thing C doesn't really expose is SIMD, but compiling "high-level" (i.e. C-level or above) code into SIMD is an open research problem (AFAIK) - I mean, you can easily compile array programs into SIMD, but the more general case is still open.
What reduces the set of compilers is reliance on compiler-specific behaviours, or on recently approved standards while compilers are still catching up.
"but compiling "high-level" (i.e. C-level or above) code into SIMD is an open research problem (AFAIK)"
In general, "taking a language never written for X and applying X to it" is an open problem. See also, for instance, trying to statically type a program in a dynamically-typed language. Of course it's hard to statically type a language that was designed to be dynamically typed. Of course it's hard to take C and make it do SIMD operations. You'd have to write a higher-level language designed to do SIMD from the beginning if you really want it to be slick.
"C matches NUMA and caching just as well as assembly does."
I'm going to make a slightly different point, which is that C does not support caching as well as a language could. For one particular example, C still lays structs out as rows, and has no support for laying them out in columns. (It doesn't stop you, but you're going to be rewriting a lot of code if you change your mind later - or doing some really funky things with macros, which amounts to rewriting lots of code anyway.)
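In code, the rows-vs-columns distinction looks like this (the struct and field names are purely illustrative):

    /* "Rows": array of structs - each particle's fields sit together. */
    struct particle { float x, y, z, mass; };
    struct particle aos[1024];

    /* "Columns": struct of arrays - each field gets its own array, which is
       friendlier to the cache (and to SIMD) when a loop touches only one
       field. Switching layouts later means rewriting every access site. */
    struct particles {
        float x[1024], y[1024], z[1024], mass[1024];
    } soa;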
Of course C doesn't support this crazy optimization... when it was written, the order-of-magnitude gap between CPU and memory speeds was much smaller, indeed outright nonexistent on some architectures of the time (though I can't promise C ran on them). Computers changed.
(I'll give another idea that may or may not work: creating a struct with built-in accessors designed to mediate between the in-RAM representation and the program interface, which the compiler will analyze to determine whether to inline into the in-memory struct or not. For instance, suppose you have two enumerations in your struct, one with 9 values and one with 28. There are 252 possible combinations, which could be packed into one byte, but naively, languages won't do that. This language could transparently decide whether to compute the values upon access or unpack them into two bytes. A hand-rolled sketch follows.)
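Hand-rolled, those accessors might look like this (names made up; 9 * 28 = 252, which fits in one byte):

    #include <stdint.h>

    /* stand-ins for the two enumerations described above */
    enum small { SMALL_COUNT = 9 };
    enum large { LARGE_COUNT = 28 };

    /* pack both values into a single byte: 252 possible combinations */
    static inline uint8_t pack(unsigned s, unsigned l) {
        return (uint8_t)(s * LARGE_COUNT + l);
    }
    static inline unsigned unpack_small(uint8_t b) { return b / LARGE_COUNT; }
    static inline unsigned unpack_large(uint8_t b) { return b % LARGE_COUNT; }

The language being imagined here would decide automatically whether to use this packed form or two plain bytes.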
Of course, virtually nothing does support this sort of thing, which is why I said C isn't really uniquely badly off... almost no current high-level languages are written for the machines we actually have today. (I'm hedging. I don't know of any that truly are, but maybe one could argue there are some scientific languages that are.) C is the water we all swim through, even if we hardly touch it directly, and ever since GPUs became standard-issue, the mismatch between our programming languages and our hardware has become comically large.
Assembly supports everything, but by doing so, supports nothing. It will of course permit you to lay your structs out in rows or columns or anything in between, but it doesn't particularly support any of them, either.
Incidentally, unlike some of my age, I don't actually complain about this; it is what it is, there are reasons for it, and what we have in hand is still pretty darned powerful. But, again, I'd suggest that if you are a language designer and you're looking to write the Next Big Thing, you could do worse than figure out how to start integrating this stuff into a programming language that can cache smarter and use the GPU in some sensible manner and all these other things. It's one way you might actually be able to create a language that will not merely tie C, but straight-up beat it.
I don't know of any way to compile non-array-ish, non-explicitly-SIMD-ish code into efficient SIMD code. Thinking about it, this seems to be the general case with parallelism - you can take fairly naïvely-written code, maybe add some memoization and hash-table dictionaries, and it turns into not-too-inefficient sequential code, but I don't know of a way to do the equivalent with parallel code (especially if you add the memoization) - STM promises, but (AFAIK) still doesn't deliver.
Of course, the best way to have an SoA representation is to use an ECS (entity-component system) :-).
C's main claim to fame is being "an HLL" (in the classical sense) but still being able to do essentially everything Assembly does. Also, having relatively simple semantics surely helps (C is the only imperative language I know of to have completely formalized semantics).
@qznc:
I don't know of a way to do it natively in Rust (you must write accessors).
"I don't know of any way to compile non-array-ish, non-explicitly-SIMD-ish code into efficient SIMD code."
Which is why I'm proposing that somebody creating one might get some traction, after all...
"C's main claim to fame is being "an HLL" (in the classical sense) but still being able to do essentially everything Assembly does."
And I discussed at length the things that assembly can do today that C cannot without callouts to assembler. I mean, I know the party line, I've heard it for like twenty years now, and my entire point is that it's not true anymore. C is not a "high-level assembler" for a 2015 machine. It's a high-level assembler for a PDP-11. Which is still useful enough, thanks to backwards compatibility, but it's high time for it to get out of the way and stop being "the high-level assembler", just like it's high time for it to get out of the way and stop being "the systems language".
C certainly can do SIMD just as well as assembly (via intrinsics). It does not allow high-level code to be compiled to SIMD, but there is no known way to compile high-level general-purpose code to SIMD. Finding such a method is an Open Research Problem AFAIK (of course, solving it would be Very Welcome).
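For instance, explicit SIMD via intrinsics in C (a sketch assuming x86 SSE and an n that is a multiple of 4):

    #include <xmmintrin.h>  /* SSE intrinsics */

    void add_arrays(float *dst, const float *a, const float *b, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);            /* load 4 floats */
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb)); /* 4 adds at once */
        }
    }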
It's possible to do manually in all those languages, but I'm not sure it can be done without programmer intervention: C++ and Rust both allow interior pointers/references to point to fields, which inhibits automatic application of many of the craziest layout changes.
I had to add "naively, languages won't do that" because of course there's no trick to it otherwise. Even Javascript can do it; it just won't be any faster. It's a simple example to fit a simple paragraph of text. One could imagine other possible optimizations to harness the fact that it's faster to do work on what you have than to pull more stuff in from RAM - like, perhaps, a string type that transparently compresses itself with a fixed dictionary (or an optional dictionary) depending on runtime performance heuristics.
Of course anything I say is possible in an existing language, with enough work, enough assembler, and enough compromises, but it's not what languages are based around.
I suspect the parent is referring more to people writing code that does e.g. foo(i++, i++); or relying on signed integer overflow behavior based on observing a toy program on their own machine - assuming the code will behave the same in all contexts, at all optimization levels, and across minor versions of the compiler.
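In code, the kind of thing being described (foo stands in for any function taking two ints):

    #include <limits.h>

    void foo(int a, int b);  /* stand-in, as in the example above */

    int bump(int x) {
        return x + 1;        /* undefined when x == INT_MAX: signed overflow */
    }

    void call_site(void) {
        int i = 0;
        foo(i++, i++);       /* undefined: i is modified twice with no
                                sequencing between the increments */
    }

Both compile cleanly and "work" on a typical machine at -O0, which is exactly why observing a toy program proves nothing.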