I worked on NSInvocation during Apple's 64 bit transition. NSInvocation is an FFI: it allows you to construct calls to ObjC methods given a runtime signature, which means implementing the calling conventions of four platforms (PPC and x86, 32 and 64 bit).
My approach was to write a script which generated functions for all possible signatures, up to N parameters. The functions verified the values they were passed both natively and through NSInvocation.
This flushed out a huge number of weird, undocumented ABI corner cases. For example on PPC, when returning a struct containing a bitfield, sometimes the bits are aligned at the top (MSB) of the register, other times at the bottom, depending on the width of the bitfield! This was just an old gcc bug, but it can't be fixed without breaking the ABI.
The resulting C code was so large it also unearthed a PPC64 compiler bug! PPC jump instructions have a 24-bit immediate offset. If the offset needs to be larger than 24 bits, it's supposed to load the offset into a register and emit an indirect jump. This just didn't work, but nobody noticed, since you'd need to branch past ~33MB of code in a single function to hit it.
> This was just an old gcc bug, but it can't be fixed without breaking the ABI
But that actually works? As in, the call generated by GCC correctly captures the structure returned by a function compiled by GCC, in spite of the misplaced bitfield?
Libffi did a stupid thing on PPC64 with returns. Quantities less than 64 bits are on the wrong end of the word, as if the register value were lifted from a big-endian piece of memory. E.g. a char value of 'A' gets returned as 0x4100_0000_0000_0000. You can't just cast that to a char and be done.
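A sketch of the kind of fix-up that behaviour forces on the caller (a hypothetical helper, not part of libffi's actual API), assuming the described high-end placement of narrow return values:

    #include <stdint.h>

    /* Hypothetical fix-up for the behaviour described above: the narrow
       return value sits at the most significant end of the 64-bit return
       slot, so it has to be shifted down rather than simply cast.        */
    static char fixup_char_return(uint64_t raw_slot)
    {
        /* 'A' arrives as 0x4100000000000000, not 0x41 */
        return (char)(raw_slot >> (64 - 8 * (int)sizeof(char)));
    }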
The problem the author has is not with the C programming language, no argument here is about some language construct or some abstraction within it, in fact very little about the actual language is said.
To me it is very clear that the issues described do not lie with C(++), the programming language, but with C(++), the language which powers everything from a 8-bit micro controller to the most modern super computer and almost every operating system you can try to put on top of them.
With any language as ubiquitous as C these issues about types had to arise at some point and the reason that most languages "feel" much more streamlined regarding types, is because they never pretend to care about some obscure niche architecture.
The same goes for "C as the language of ABIs", it is a very simplifying choice to have the ABI of your OS be described in the programming language of the kernel. Sure, it does not need to be, but if your language is the layer between software and hardware, maybe it has some right to dictate how you talk to it.
Any language which wants to have a comparable feature set to C in terms of breadth of platforms and depth of penetration will run into issues like this. Maybe it can, with the benefit of half a century of hindsight, make better choices.
A nitpick: please don't conflate C and C++. They're separate languages, this article is exclusively about C, and C++ is an instance of one of the languages that "speaks C". C++ has its own symbol mangling and ABI, which has not historically been nearly as stable or universal.
Virtually every language has a mode to make calls into the C ABI. That's exactly the point of the article.
To switch in to C ABI mode in C++ you have to wrap the declarations in 'extern "C" { ... }' blocks, otherwise you get the C++ ABI, not the C ABI. It's done similarly in many other languages.
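A minimal sketch of the usual header pattern (the library and function names here are hypothetical):

    /* mylib.h */
    #ifdef __cplusplus
    extern "C" {          /* C++ consumers: no name mangling, plain C ABI */
    #endif

    int mylib_add(int a, int b);

    #ifdef __cplusplus
    }
    #endif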
I understand the frustration of principled PL authors trying to bootstrap an ecosystem for programs compiled from their language to run on. It can be frustrating having to interact with foreign memory that you don't control and specifications that intentionally leave much to be desired by allowing the hardware platform to decide the memory layout...
But if you didn't have these things it would be a lot harder to get even a basic system going.
It would be nice if there were _one_ standard way to encode a sized integer to rule them all... but we don't live in that ideal world. Pick an arbitrary, common layout, and stick with that. Your language might not let users target some micro-controller that runs on a toothbrush, but you probably don't need to anyway.
Right, I dunno why I said layout. If your interop is with C libraries then the layout is kind of picked for you. And defined by the platform. Super great times.
> Any language which wants to have a comparable feature set to C in terms of breadth of platforms and depth of penetration will run into issues like this. Maybe it can, with the benefit of half a century of hindsight, make better choices.
I've been saying a variation of this for half a decade now. My version is: the only way to do everything that c/c++ does is to be just as terrible as they are. Replacing them therefore requires a small host of languages in order to do it.
Which is what we're seeing now: D, P, Rust, Zig, Odin, Carbon, Go, C#, Java. They're all slowly chipping away from what we used to need c/c++ to do. I think we probably need one or two more (and then more maturity for several of the existing options) before the process is complete and we're left with a bunch of languages in their own niches and c/c++ in a small niche of their own.
It's perfectly possible to create an ABI standard and have every symbol well defined, a completely specified grammar, and even keep it simple enough so people are able to implement it.
None of the problems on the article are necessary for C to run on microcontrollers and super computers. They all exist because C is an emergent semi-agreement created by people that mostly didn't talk between themselves and meant different things with their words on the few times they talked.
(Anyway, isn't C-- exactly an attempt to do that?)
>They all exist because C is an emergent semi-agreement created by people that mostly didn't talk between themselves and meant different things with their words on the few times they talked.
Sure. But that is pretty much inevitable if you try to get people from a few dozen industries with vastly different goals to try to agree to some form of standard.
You will have to make bad choices, because the good choice is totally unacceptable for one group on the standard committee (maybe for a very good reason).
The problems in C exist because of the vast space of problems it tries to solve. If your programming languages targets x86 and maybe ARM, there is a vast sea of problems you will never have to think about.
>It's perfectly possible to create an ABI standard and have every symbol well defined, a completely specified grammar, and even keep it simple enough so people are able to implement it.
Yes, it is possible, but a standard which goes beyond a single system is about as realistic as people only using "one programming language" and "one operating system".
1. Doesn't the D compiler contain an entire C compiler? Half the point of that section was to say that any C parser is halfway to being a fully functional compiler. That's arguably an exaggeration, but I fail to see how D serves as a counterexample.
2. How does the D compiler handle GCC- and Clang-specific extensions to C?
1. Yes, it does, but only the cparse.d file is for C. The rest of the "C" compiler hijacks the D semantic routines, code generator, and optimizer. There's also some custom work to turn #define's into simple declarations. If you just want to parse C, yes, you can just use cparse.d.
It's important to note that C cannot be fully parsed without keeping a symbol table, and cparse.d handles that, too.
2. C extensions will always be a problem. ImportC supports the most widely used ones. (Most C extensions are rarely used and can be ignored.)
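To make the symbol-table point from (1) concrete, here is a minimal illustration: the very same tokens parse differently depending on whether a name is currently known to be a type.

    typedef int T;

    void demo(void)
    {
        int x = 1;
        /* "T *p" is a declaration of p as pointer-to-T only because the
           parser's symbol table says T names a type; if T were a variable,
           the identical tokens would parse as the expression T * p.       */
        T *p = &x;
        (void)p;
    }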
But it didn't. In very few cases was a C compiler extended to compile C++. Instead, a C++ compiler just has to know how to compile C, and to link with C objects compiled separately.
I saw Zach Tellman's excellent talk at Papers We Love / Strange Loop this year[1]. In it, he talks about the Ship of Theseus [2] thought experiment as it applies to software.
His very insightful observation is that you can replace every plank in the ship one at a time and the resulting ship still looks the same. That's because each time you replace one plank, it must fit with the surrounding existing ones. So while the components are replaced, the interface between them is unchanging. If you believe the result of the thought experiment is that it is the same boat, then it's because the set of interfaces defines the boat.
This applies very much to C as the universal ABI. It's common to see programs written in a combination of languages that interface with each other using a C ABI even though none of those languages is itself C. And, for better or worse, it's really hard to change this because while components are easily swapped out, moving an interface is hard because you have to simultaneously fix every component that uses that interface.
> This applies very much to C as the universal ABI. It's common to see programs written in a combination of languages that interface with each other using a C ABI even though none of those languages is itself C. And, for better or worse, it's really hard to change this because while components are easily swapped out, moving an interface is hard because you have to simultaneously fix every component that uses that interface.
I'd go with the glass-half-full point of view. This has nothing to do with fixing stuff that ain't broken; it has everything to do with developers not wanting to go the Java route, reinvent the wheel, and reimplement all basic functionality under the sun. They just target C in particular, or in some cases POSIX, and they are up and running in no time, reusing everything under the sun with minimal work.
The joke in the C community since the early 80's is that C isn't a high-level language, it's a medium-level language. Others quipped C is a portable assembly language: programmer beware! This was 40 years ago! Let's not pretend we've recently discovered all these problems with C. This has been known for decades. It's also been tolerated for decades because we had nothing more performant we could use and we needed to wring every ounce of performance we could get out of those feeble processors we had available at the time. Unfortunately because we were forced to use C it has an inertia that's kept it going all this time. But there's also a counterforce slowing down C's penetration. Now the last bastion of C is kernels, drivers, and embedded systems (used to be much, much more). Now we're seeing Rust starting to move into these spaces and actually we've been using non-C languages now in the embedded system space for the past 10 years. It's really just kernels and drivers remaining for C.
As far as using C as an ABI - I agree, that needs to evolve. I'm not opposed to using an interface specification language created specifically for ABI's. I think that's the way forward.
The complaint is that C isn't the only systems language anymore so defining the ABI in terms of executable C code is a pain-in-the-ass for those other languages.
C is also a terrible language to define an ABI because, by design, it's very imprecise in the way that it defines types.
> it's very imprecise in the way that it defines types.
It's perfectly precise, particularly if you choose to use <stdint.h> which has been available in the language for a very long time. You may have some issues with specific types and platform variability, but it's absurd to cast this as "it's imprecise with types."
It's _flexible_ with types.
People seem to miss that this is the reason why C is the lingua franca. It isn't trying to "perfect" computing. It just makes it possible. There's a lesson for new languages here.
C's variables always scale to use the underlying hardware in the most efficient way in terms of performance. There's always sizeof() and macros to get the size of the variables you have.
This is why C is mostly portable with minimum effort unless you do hardware-specific things, or use the variables to their limits. In that case you can always define fixed-size types.
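A small sketch of both options; the first set of sizes varies by platform, which is exactly the point:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Platform-dependent widths: these differ across ILP32/LP64/LLP64. */
        printf("int: %zu, long: %zu, void*: %zu bytes\n",
               sizeof(int), sizeof(long), sizeof(void *));

        /* Exact-width types from <stdint.h> when the limits must be pinned down. */
        int32_t  exactly_32 = INT32_MAX;
        uint64_t exactly_64 = UINT64_MAX;
        printf("%" PRId32 " %" PRIu64 "\n", exactly_32, exactly_64);
        return 0;
    }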
I don't get why people love to bash C. No language has an obligation to please the programmer in its default modus operandi. Programming languages exist to interface hardware with humans, and C operates in the realm of the hardware, and that's perfectly OK for me.
No they don’t. If we cared about that we’d be using ILP64 instead of keeping int at 32 bits for compatibility reasons. I’m not here to bash C, anyhow; I think you’re failing to consider that I have experience with C and what it’s not good at, and am able to look past trite statements like “C interfaces with the hardware” that aren’t true or useful. In this case the concern is that C doesn’t make it convenient to talk to hardware!
C ABIs (of which there are many because the C standard doesn’t cover ABI) are full of legacy cruft, because they need to be stable and backwards-compatible more than they need to be sensible or efficient.
The C ABI can even vary depending on compiler flags, e.g. availability of AVX affects calling conventions. It’s not easy to be interoperable with this, especially when ABI-affecting compiler flags and macros may be set by an arbitrary build system, not even the C source code.
I think if it was any worse, it (Linux, BSD, etc) wouldn't function at all. So it is maximally bad within the subset of things that do function. It's like getting all the worst medical conditions that don't actually kill you. Sure, there are worse diseases: you could be dead. Is that any comfort? Probably not. You'd probably be looking for a cure.
Android userspace uses Java and can only talk to native code via JNI, which definitely isn't a C ABI.
Likewise there isn't any writing to metal on ChromeOS for the official userspace APIs, running Linux (Crostini) implies running a sandboxed version on top of the actual ChromeOS, while Android on ChromeOS not only has the same JNI restrictions, it also is sandboxed without access to host OS.
EDIT: I also forgot that ChromeOS and Android expose many of their key APIs to native code via OS IPC, on top of the constraints mentioned above. With endpoints written in a mix of C++, Java and now Rust.
JNI is very much a C ABI, yes. I should know, for I've written JNI FFIs more than once. Yes, you can also write JNIs in C++, but you don't have to because JNI's ABI is C not C++, and that is because C++ ABIs were not stable decades ago when JNI was written.
JNI is an API between the JVM and C- or C++-coded plugins that can implement native methods. "Native method" means "written in C (or C++)". There's a) a standard interface that C-coded methods must present (depending on their Java signature) and b) a set of utility C functions provided by the JVM.
Because these plugins are loaded via `dlopen()`/`LoadLibrary_Ex()`/etc. the JNI API has an ABI.
If you don't believe me go look it up. There's a ton of resources on JNI. Here's an example taken from the wikipedia page on JNI:
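A minimal sketch of the standard shape such a native method takes in C (using a hypothetical class com.example.Adder rather than the exact Wikipedia snippet):

    #include <jni.h>

    /* Implements "public native int add(int a, int b)" declared in the
       (hypothetical) Java class com.example.Adder.                     */
    JNIEXPORT jint JNICALL
    Java_com_example_Adder_add(JNIEnv *env, jobject thisObj, jint a, jint b)
    {
        (void)env; (void)thisObj;
        return a + b;   /* the JVM reaches this through a plain C ABI call */
    }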
I would even go as far as to say that a good OS ABI only uses a safe subset of C. This excludes pass-by-value structs (and vector types), struct returns, and maybe even bitfields.
Even with these restrictions in place, modern register-based calling conventions can still be rather complex, but these restrictions reduce it somewhat. They help to avoid areas in which implementations traditionally diverge, too.
What's wrong with pass-by-value structs? Yes, I'm aware that long ago there was a problem on SPARCs where Sun Studio and GCC handled struct value returns differently, but not pass-by-value. ABIs can and do define how to do pass by value of values of struct types.
The bits of C that need to be avoided in APIs are:
- bitfields, unless no bitfield crosses a byte boundary
- struct fields of enum types (because the size of the enum type is implementation-specific; see the sketch below)
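A minimal illustration of the enum-size point; whether the enum occupies 1 or 4 bytes depends on the compiler and flags (e.g. GCC's -fshort-enums), which silently changes the struct layout:

    #include <stdio.h>

    enum color { RED, GREEN, BLUE };   /* all values fit in one byte */

    struct pixel {
        enum color c;        /* sizeof(enum color) is implementation-defined */
        unsigned char v;
    };

    int main(void)
    {
        printf("enum: %zu bytes, struct: %zu bytes\n",
               sizeof(enum color), sizeof(struct pixel));
        return 0;
    }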
C isn't used for defining ABIs. ABI's have content in them like which registers must be saved by a called function and which ones carry arguments. None of that is C. However, out of necessity, ABI definitions refer to some C concepts and must include details like how structures are to be laid out by C compilers.
There isn’t the C ABI, as the article actually mentions. There are various ABIs, most of which (but not necessarily all) can be targeted by a combination of C declarations and compiler switches.
The one thing that could be improved is to come up with a better language than C headers to specify cross-language APIs/ABIs.
The fact that different ABIs exist cannot be helped. That’s something one has to cope with in any case, independently of C.
One way to sidestep many of the issues is to target the JVM. ;)
agreed, much of the bickering is about ABIs and FFI stuff, which you could argue is hard due to the ABI differences it supports. this is not a C problem, but a complexity C exposes. C does do that a lot, but that's also a strength. There are really no sound alternatives to C for low-level systems programming (OS core functionality, or embedded stuff). there's C++ of course, but regarding these complaints it suffers the same. Rust doesn't solve anything there; it's still impossible to actually build a modern OS in Rust, for example (maybe if most of your core is wrapped in unsafe and you run on only a single-core device...).
If C is such a problem I'd like to see some alternatives which offer what C does but less painfully.
C is only useful in certain domains, but within the domains where C is really good, there are no others. (unless you want to go assembly mode... good luck :) can be fun)..
anything that is less painful exposes less complexity and is usually therefore less generally applicable.
A lot of the pain of C comes from a lack of understanding of how to use it, because using it requires deep knowledge of it and of the target platform.
that being said, i do get that userland applications and C don't really mix anymore. that is a fair point imho.
It could be worse, but for the long-term evolution of computing, we should be working towards convergent systems. There's enough subtlety and imprecision in the C spec that it's not well suited to convergence.
Among type issues, it also has the infamous "Macros" that are really just unanalyzable text replacements.
> I don't get the complaint, there are flaws, it could have been much MUCH worse.
I find it amusing that after half a century of existence and powering the world's IT infrastructure, suddenly some enlightened individual feels entitled to claim they alone found problems that they alone can fix.
In order for my new and improved Rectangle to talk to another really cool Rectangle, I have to resize one of my edges to fit nicely on the Square and the 2nd Rectangle must do the same. The Square is a stable interface that rarely changes.
I hate that the Square is a stable structure that doesn't change sizes dramatically when it's proven that a new size is better.
I think we should agree that `intmax_t` was a mistake and not focus on it so much in articles like this. It's enough that the C ABI is so hard to pin down, and that C headers are hard to parse.
What we really need to interop w/ C but w/o C is a C compiler based tool that outputs DWARF or DWARF-like debug output that actually lays out everything -- type sizes and layouts down to the individual bits that bitfields map to, enum values, constant-like macros, etc. Function-like macros you'll never really be able to use from other languages, so, oh well. Such debug metadata would have to be encoded in a stable/committed encoding, expressed in a stable schema.
With that sort of debug info a language like Rust could take care of all C interop natively w/o C, especially if those debug files were included with the OS so that Rust wouldn't need to invoke the tool that generates them. Though, obviously some `-D...` C compiler/pre-processor arguments can radically change the contents of the parsed headers, so some rationalization would be needed to cut down the number of "ABIs" to a manageable set (like GNU, BSD, POSIX).
But C2FFI emits plain JSON, so I don't see any reason why you couldn't build e.g. a Python auto-binding library on top of it. It depends on LLVM to generate the spec file, but end users don't need to have that.
> This tool helps automate testing that two languages/compilers agree on ABIs for the purposes of FFI. This is still in early development so lots of stuff is stubbed out.
...
> By running this natively on whatever platform you care about, this will tell you what FFI interfaces do and don't currently work. Ideally all you need to do is cargo run, but we're dealing with native toolchains so, expect toolchain bugs!
The idea of plain JSON as a format for expressing ABIs is great.
I wish it was in LLVM/GCC out-of-the-box, rather than a separate open source project (whose maintainer only seems to have rather limited time to spend on it). Ideally there would be a standard format so LLVM, GCC, MSVC, etc, could all produce it (obviously with some extension points since each has some unique features the other doesn’t). From memory, the actual JSON produced by c2ffi follows rather closely clang’s internal data structures and so might not be the right design for something intended to be generic.
Sure it doesn’t have to be JSON. It could be YAML, XML, ASN.1, S-expressions, whatever. But I think JSON is a clear winner (nowadays) on the basis of being simple, easy to parse (compared to C/C++/etc which have very complex syntax), parsing libraries being extremely widely available (almost every language, even the most obscure, has a JSON parser available; languages which lack JSON support in their standard library, such as C, are now the exceptional minority.)
Of course it is more voluminous and slower than a binary format. One could always define a binary encoding which straightforwardly maps to the JSON one (or just reuse an existing one such as UBJSON, BSON, CBOR, etc)
I don't really get what the author is driving at. Sure, each system has quirks so designing a portable FFI is difficult to say the least. But what's the alternative?
In my experience I have seen plenty of standards violated and downright disregarded because the platform has special functionalities or quirks. My view is that even if there was an ironclad ABI spec platforms would still violate it with their special implementation of the system interface.
I think we would end up with exactly the same problems, only the rant would not be targeting C but something else.
It probably would be even worse. For instance, it's possible to write a leaf function in assembly for AMD64 that runs on both Linux and Windows if all you do is return a value. This is because they both use RAX as the return register. There are subtle differences between the two, but those are mainly about which registers are used. The biggest difference comes in the stack layout.
Pretty much every other OS on x64 uses the System V calling convention.
The thing is if C was not the glue, I could imagine the ABI would be wildly different on each OS heck maybe even each version of the OS.
If we look at Windows, you can't safely make system calls directly, since a major update could change the system call numbers; instead, Windows provides wrappers around these system calls. Linux is an outlier among OSes in that it considers changing system call numbers like this to be breaking user-space.
Now, if we did not have C, what would operating systems be using to wrap all these system calls? I suspect something tailored to each OS specifically, and it would be quite the mess, so I bet it would be even worse to handle. At least C keeps things similar; without it you would get OS-specific schemes that could vary by a lot more, making the code base needed to support multiple platforms even larger.
What is with all the C hate on Hacker News these days? I would love to go back to coding in C. It's just that once you program in Java, it is very hard to be productive in C again.
The article isn't really about C programming, it's about C's type system being used to define ABIs for foreign-language function calls. The size of e.g. an int can vary from platform to platform. Structure layouts are also platform-specific. Looking at a C library's header file by itself tells you nothing about what to actually put in the registers.
I'm unclear on whether it would have been possible to do better without an implausible amount of standardization early in computing history. A low-level language that works on CPUs with different endianness and word size and pointer size and number of registers is by necessity going to be somewhat vague about calling conventions.
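A small sketch of why the header alone doesn't settle things; the numbers printed differ between, say, a 32-bit target, Linux/macOS x64 (LP64), and Windows x64 (LLP64):

    #include <stddef.h>
    #include <stdio.h>

    struct record {
        long  count;   /* 4 bytes on Windows x64 and 32-bit targets, 8 on LP64 */
        void *p;       /* ...which shifts this field's offset and the total size */
    };

    int main(void)
    {
        printf("sizeof(long)=%zu  offsetof(p)=%zu  sizeof(struct record)=%zu\n",
               sizeof(long), offsetof(struct record, p), sizeof(struct record));
        return 0;
    }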
I had the exact opposite experience: starting out mostly coding in high-level languages and then moving to C as my day-to-day language, I realized how incredibly limited I was by the abstractions I was saddled with.
This didn't make me hate C, but it did make me appreciate every other language.
The spartan nature of the language itself, and all of the inevitable difficulties around linking, has really made nearly everything else seem so much simpler.
There's something to be said for making it at least a little painful to e.g. concat strings in a loop or swap an integer value in an array for 500KB of raw JSON.
I don't feel like I've ever been let down by C. In what way has C let you down?
I almost feel the opposite, C is the only language that has *never* let me down. It's the only one that's always delivered on its promises, it's the only one that never fights me on what I want to do. It always feels like the sky is the limit. And it's so nice to know that you're the one who's wasting cycles and memory, not the language.
> it's the only one that never fights me on what I want to do
This is the #1 thing I've learned not to like about it.
The problem is simple. How can I trust that code written by other programmers is up to my standards? If the language obediently does anything they want, how am I supposed to judge its quality? Do I have to review every line of code in every piece of software running on every piece of hardware I interact with? Do I have to go around building a complete and accurate mental model of literally any program I plan to use?
I don't have time for that. Nobody does. It doesn't scale.
When other people write software, I want a vigilant, tireless critic to watch them write code and stop them from doing things that are unambiguously stupid. That's the only way to deal with the vast number of programmers with skill levels beneath mine, writing code that threatens to affect my life. If they can't do that with C, I want them to do it with something else.
I also love programming in C! I even went so far as to write a metaprogramming layer so I no longer have to use cxx templates for generics. It's such a delightfully simple language that really sparks the Joy of Programming in me
Absolutely right there with you, in terms of joy per minute spent writing code, C wins for me by a huge margin. Every time I write C I end up making at least one thing that puts a genuine smile on my face.
It's the new "I hate Emacs" / "I hate Tabs". Nerd sites used to be flooded with posts whining about one or the other, in between posts complaining about Microsoft.
I prefer C for microcontrollers. Things are much more predictable as far as how much memory, stack space, and roughly how many processor cycles you're using for anything.
Lot of these pain points came up when I was trying to make Pascal bindings for LibBF (https://bellard.org/libbf/). The only API documentation it has is the header file itself, and the automatic C-to-Pascal translator choked on something without giving an error message. It's a slow grind to translate everything by hand. One of the things that makes you wish C had proper modules (C++ does, finally).
This is the kind of pain that made me back in the day move from Turbo Pascal to C++, and only use platform languages, as at least the platform owner has teams dedicated to managing bindings to all SDK languages.
Hmm, I know that C is not parseable, but couldn't one define a parseable subset that is still meaningful? That would pass the burden to the developers who want to bind to a specific library, and hopefully those developers have a clue about how that library builds, works, and which functions are needed. In the long run, if a popular language (say Rust) strictly insisted on that subset being used, I could imagine that upstream libraries would start shipping such headers; say, openssl would not come with a Rust interface but with a "machine-readable" C interface.
This is the solution. It doesn't even need to be parseable subset of C -- it could be anything. It just needs to be able to describe an ABI. And from that, you generate C headers or bindings for other languages.
I don't think this idea is in any way controversial -- it would just be a lot of work to do across a large number of systems.
There's a lot in here, but I'll target my primary gripe: I don't think it is at ALL fair to blame ABI designation on C. Platform developers have a choice to use an existing annotation or make the world more complex. Guess what? #2.
Also, isn't it disingenuous to take a cheap shot at C for not being "parse-able" when the OP's favorite language, Rust, has the same syntactic grey areas? I don't know enough about Rust, but from what I know it should have the same issues called out in the HAL paper OP cites.
> Rust, has the same syntactic grey areas? I don't know enough about Rust, but from what I know it should have the same issues called out in the HAL paper OP cites.
Which ones? Afaik in Rust it's always clear whether an identifier belongs to the value or the type (or macro) namespace, for example a variable declaration is `let foo =` or `let foo: Bar =` (possibly with a pattern on the left hand side), parameters & constants being similar. This alone rules out most of them. It also doesn't have if/else branches without braces, nor _Atomic. So yeah, seems odd to claim it's disingenuous.
The reason C is so dominant here is not that C is magic, it's that it's pretty much the only thing that does care about having a stable ABI. When people design C replacements, they rarely understand why C is so successful. ABI stability is extremely important. (This is one of the reasons Linux on the desktop has never taken off outside of OSS: the distros don't maintain a stable common ABI.)
Any other language could do it, just no one does, because it's not sexy or fun.
I think the issue with the footprint here is not wanting to reinvent wheels.
In theory, if you scrapped libc and up, you could limit your "speaking C" to the kernel of whatever OS/architecture you're using. Different OS/architecture combos would be more of a nightmare than others, but we cheat off libc for that, except we have to do things to support multiple libc implementations sometimes.
It doesn't solve interop with other languages, because they don't want to reinvent wheels either.
In theory, you could scrap it all and go hardware up. Have a nice OS with a well defined message passing interface that any language could interact with. You could even make a libc wrapper on top to address portability. In 20 years, it may have serious adoption.
The problem is that the situation is "good enough" that nobody wants to solve it. That applies to most technical debt, and really any other human concern.
That message passing comes with a lot of overhead, even if it might be more generic. Making the already bloated software we use these days even slower does not sound like a better alternative.
That would make interacting with the OS slower, and probably even more important are things like .DLLs/.SOs. If the dynamic library, or heck even a static library, you're linking against used message passing, imagine all the cases where that would be a bad idea. All that overhead just so you can call a function not written in whatever language you happen to be using. ABIs like we have now are much better for this reason. Also, just because one uses message passing does not mean there are no compatibility issues.
I think the main issue here, if we reference the article linked, is "You Can't Actually Parse A C Header". To know the size of types and therefore figure out memory layout, you need to be able to read a C header. With that information you can construct a lot of the ABI for a C function, aside from some OS-specific things like stack layout requirements and hardware/OS-specific things like which registers are used for certain parameters of a function, etc.
Sorry, I don't disagree at all and the message passing was a little tongue in cheek. I was in CS in the 90s, when the dream of a message passing microkernel utopia was so hot.
I don't see a way around this complaint other than starting from scratch. If you "fixed" C, you'd have to rebuild all those things underneath you. And the reason we do FFI is to not reinvent some chunk of code.
I think you fundamentally run into issues with ABI compatibility regardless of language. You get a step better if you could derive the ABI from the code, but there's always interop issues cross-language. I don't think you can avoid wandering into having to implement the isms of the provider's language.
So much baggage just comes from strings. Then you have higher level constructs like making threading/concurrency, memory management, etc. jive. That's even an issue at the API level. I use a ton of libraries that just wrap a C library with a language-idiomatic interface.
What does “talking” C mean? It means getting descriptions of an interface’s types and functions in the form of a C header and somehow:

- matching the layouts of those types
- doing some stuff with linkers to resolve the function’s symbols as pointers
- calling those functions with the appropriate ABI (like putting args in the right registers)

Well we’ve got a few problems here:

- You can’t actually write a C parser.
- C doesn’t actually have an ABI. Or even defined type layouts.
That we have this situation today still surprises me. I mean I know how we ended up here, but it is so unfortunate and has a huge cost.
We've always had this situation, but when C was the only real systems programming language around we didn't notice it so much. FFIs were always painful, like JNI, and we accepted it.
This situation is much more noticeable now that we have Rust, Go, etc. all trying to be completely detached from the C run-time. And it's not just those. Even languages like Java and Python that do use the C run-time, they all want to have nice and easy FFIs nowadays, but it's still hard to do it safely and elegantly.
The place where your language needs to interoperate with the outside world is in OS calls.
The modern way to interact with an OS is to stick a request on a ring buffer and get back a request ID, and (eventually) check the response ring for that ID to show up. Like in io_uring.
If your language is powerful enough, its OS library can encode that directly (without first marshalling arguments onto a stack that they must then be copied from) for minimal overhead.
You might also have an FFI thing so you can call out to libcurl, libzstd and libsqlite. (Do any others matter?) Again, if your language is expressive enough, you can avoid building up an argument list the usual way that must then be reorganized for what the foreign library wants.
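Going back to the ring-buffer idea: a minimal sketch of that submit/complete pattern using liburing (assumes Linux with liburing available; error handling mostly omitted):

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);

        int fd = open("/etc/hostname", O_RDONLY);
        char buf[256];

        /* Put a request on the submission ring, tagged with our own ID... */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof buf, 0);
        io_uring_sqe_set_data(sqe, (void *)(long)42);
        io_uring_submit(&ring);

        /* ...and later fish the matching completion off the completion ring. */
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("request %ld finished, res=%d\n",
               (long)io_uring_cqe_get_data(cqe), cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }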
Hmm, did you assume that describing the calling convention and standard type 'names' should be in a separately provided plain text file like it is today?
It's portable assembler in the language and (mostly) semantics, but not system interfacing. Meaning you can just create functions, loops, assign variables, create expressions etc. in a common syntax that spans a wide variety of CPU architectures. You don't have to know the names of registers and what they are used for, or whether you have a register-memory machine or a load/store machine.
If you had to actually write assembler programs across multiple machines, you'd appreciate how much above assembly C really is for remarkably low performance cost. I agree with you - I think that's a major part of the popularity - it was not the first systems language, nor the first high(er)-level language. It was the first that was widely portable, widely applicable (vs. FORTRAN), and had equal or better performance than native assembly.
It's a perfectly fine programming language. Some of us like working with sharp tools (and sub 1 second compile times for 100kloc programs, and close enough to optimal performance, and actually knowing what's really going on).
I do sympathise about the API abstractions; some of them have definitely not weathered well. As for ABI's, I've been constantly amazed how well they've been preserved (kernel folks will understand, e.g. 32-bit userspace with 64-bit kernels just works).
What will the API/ABI's look like 40 years down the road? (will any modern language endure like C?).
If Rust fanatics spent as much time writing software in Rust as they do writing rants about how much they hate C and C++, they'd be a lot more productive.
I wonder if some of this C ABI hate is related to a design quirk of the LLVM tools: Is it true that every front end targeting LLVM needs to implement the C calling convention mostly on its own, rather than specifying a C function signature at the LLVM IR level (using a different syntax, obviously) and have the target backend translate that into the appropriate instruction sequence for the call?
Gosh, just checked out Ada briefly; it looks _much_ simpler than Rust. So why not just go back to Ada and use it for memory safety? Why did Zig even need to be created? Is it NIH syndrome, or am I missing something?
The way out of this mess is services. You don't call the open function through some ABI, instead you send a message to a file system service through a mediator and get a file handle back. On Linux, that mediator would be D-Bus, developed by the GNOME people who relatively early recognized the need for it. Possibly because everything in GNOME is written in C so they were familiar with how painful it is to interact with C libraries.
Of course, services aren't a panacea and they incur large performance penalties due to the context switching. However, for non-performance-critical tasks services are the future.
Do you realize HOW slow this stuff is? :D :D I shudder when I hear "D-Bus". ) It's a nightmare. I remember being able to destroy Ubuntu just by holding the PrintScreen key for THREE SECONDS, which overloaded D-Bus to the point where the whole system became unusable. ) I would definitely NOT want to depend on that.
I liked the COM/ActiveX-way, which was fast and somewhat flexible. But proprietary.
I really thought i would disagree with this take (i don't know the author), but in fact, these seem like the most honest anti-C arguments i have ever read. I'm not sure if i agree with the conclusions yet, i do not have the technical knowledge.
(Correction: i wrote an uninteresting paragraph about some thought i had here while re-reading the article to avoid misquoting and misinterpreting: re-reading convinced me that the author's conclusions are right, and my paragraph is now useless. If you have nice counterpoints to the article in the comments, i would love to be wrong).
Trying to replace C with something like Rust, is essentially saying that we should rewrite all kernels in Rust. C is the common language because it talks to the kernel, which talks to the hardware.
Forgetting how much would need to be replaced with objectively much slower code, Rust simply cannot do some things. You cannot always make absolute guarantees about memory safety for example when it comes to low-level programming. Sometimes to be fast, you have to make assumptions and take risks.
I am currently watching the Veloren game [1] as an example of how larger Rust projects may begin to look. I see something like this [2] and it doesn't look at all dissimilar from C, just with a new syntax. Has writing this game eliminated all bugs? Nope [3]. Maybe there are fewer segfaults and less bad memory management, but that was just one class of bugs.
Well, Rust (and C++ as well) is that much better than C because it actually allows proper abstractions at the language level. You can't even write a general-purpose vector data structure in C that doesn't have overhead.
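For what it's worth, a sketch of what "generic" usually ends up looking like in plain C: the element size is a run-time value and every push goes through memcpy, which is the overhead (or type-unsafety) being pointed at:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A void*-based "generic" vector: works for any element type, but the
       element size travels at run time and every push is a memcpy.        */
    struct vec {
        void  *data;
        size_t elem_size, len, cap;
    };

    static void vec_push(struct vec *v, const void *elem)
    {
        if (v->len == v->cap) {
            v->cap  = v->cap ? v->cap * 2 : 8;
            v->data = realloc(v->data, v->cap * v->elem_size); /* check omitted */
        }
        memcpy((char *)v->data + v->len++ * v->elem_size, elem, v->elem_size);
    }

    int main(void)
    {
        struct vec v = { NULL, sizeof(double), 0, 0 };
        double x = 3.14;
        vec_push(&v, &x);                       /* no type checking at all */
        printf("%f\n", ((double *)v.data)[0]);
        free(v.data);
        return 0;
    }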
None of these problems would affect these people if they didn't insist on making their pet programming languages run on what is essentially a C engine. Rust, like a lot of modern high-level languages, basically compiles to LLVM IR and then abdicates responsibility, leaving things to the fifty years of engineering involved in getting us to where our compiler backends are. The joke used to be "Java will run anywhere someone writes a JVM in C" but now it's "programming language will run on whatever target someone wrote for LLVM."
Then, having used the C and C++ toolchains to boost themselves into a simulacrum of self-hosting, they immediately decide they don't want to actually implement their preferred abstractions in the operating systems they want to use, so they write huge FFI libraries to take advantage of the massive amounts of library code and OS interfaces that we've spent the past fifty years writing.
Finally, having arrived on the scene, decided that everyone else is wrong and they're right, using the bad-and-wrong toolchains to bootstrap their throbbingly correct and superior approach, we're treated to angry rants about how fifty years of engineering is messy and why wasn't some omnipotent project manager in the sky keeping all this in check while the world awaited the arrival of someone smart enough to write us, at last, a good and correct programming language?
I totally understand that looking into the spaces where our programming languages and our kernels intersect can lead to anger, dismay, and confusion. They're all living projects who have had to adapt to decades of massive change in the computing world; once the LispM went away all hope was lost for coherence. But whining about it in the context of how it makes this thing you felt like doing slightly more difficult is just petulant noise.
Just like open source developers don't owe anyone specific features or bugfixes, the Unix and C ecosystem doesn't owe anyone a rigorous, provably correct ABI. If you're not happy about it, don't use it. If you want to benefit from the decades of work, you have to accept the pitfalls that sprung up over that half-century. The alternative is to do it over, yourself, according to your principles. I desperately wish someone would; the benefit to computing would be incredible. So far, everyone dips a toe in the FFI pool, declares it to be too cold, and then swims around in it regardless.
It's not exactly ignored in C itself. That's what things like the sizeof() operator are for and standard macros like INT_MAX and INT_MIN.
The problem here is once compiled this information is not readily accessible without being able to read a C Header. That's why the author here mentions parsing a C header and says most people just end up letting a compiler do that.
If you're writing C and ignoring things like INT_MAX or the minimum range requirements specified by the standard, or not using sizeof() when you're worried about the memory size of these base types, your code is not going to be portable.
C definitely lets you handle and know the domain of your variables.
I'd say the author, although they are right in most of their arguments, just wants C to be a higher-level language, well defined, well supported by every OS out there in the same way.
I'd hate to see that happen, and I write C every day, read hundreds of lines of C every day, and actually formally prove C every day. C is not meant to be Rust or Go or Java. I'm actually against most of the additions to the new standard these days. They complicate the (barely) portable assembly C is meant to be. I wouldn't write a database in C today, but I'd be happy to start a new bootloader project in C tomorrow.
The author is actually angry that the OSes still expose a C interface when they should use something higher level, like it seems Apple is doing (I may be wrong). I get that, but like we say in France, don't throw the baby out with the bathwater. The problem is that, as they say clearly, C is the lingua franca. C isn't the problem, its use is.
Oh god, Apple's platform is a mess!!! I would hate to see more of that. If they think C is bad, interacting with, for instance, Objective-C at a low level for functionality provided by the OS is even worse.
The C Programming Language. Wow - a throwback in time. I learned to program using the 1st (1978) edition. Haven't touch C since 1994.
I'd rephrase the title as "C Isn't a Programming Language that I use Anymore". But I have great memories of that era of my hacking career.
Would it be possible to create a binary library format that declares its own ABI? Has anyone done something like that in the recent past?
Not like GObject declaring the interface in a special comment (modulo typos). Something that's fully machine-parseable in any programming language and does not require access to the source code.
I mentioned C2FFI (https://github.com/rpav/c2ffi) in another post in this thread. That's an extra "spec.json" file that you need to distribute along with your library, but maybe that's a whole lot easier and simpler than inventing a new library format. And it's JSON, which is human-readable, so it's relatively easy to debug.
COM! Have you ever tried interacting with it without some automated tooling or language support? It's horrible and verbose. When it comes to setting up everything to interact with COM, I feel like I would be more productive in assembly.
Although at least a lot of it can be set up with automated tooling. The interface also seems to have a lot more overhead than a basic function call. I hardly would call it a better situation.
Of course it is possible. You see, we already have tools that have to deal with binaries and let you query the values of variables or call functions. They're called debuggers, and binaries usually can contain some extra information (either inside the binary or as an extra file) that provides these debuggers with information about the ABI (and more) so that they don't have to parse the source code themselves. All of the big C (and C++) compilers can generate this "debug information".
> C doesn’t actually have an ABI. Or even defined type layouts
This isn't the problem it appears to be. D's ImportC adjusts its semantics to match what we call the "associated C compiler", which is the predominant one on the target platform.
If HP-UX was a supported system by D, and aCC is the system compiler for HP-UX, it would be made to work. ImportC already makes adjustments for Microsoft C on Windows, Digital Mars C, and the dialect of C used on the Mac.
A better title is "C isn't just a progamming language anymore"
I know the purpose of the article is to shit on C, but the fact that it is the lingua franca _despite_ its shortcomings is more than anything a testament to the versatility of the language, and proof that it somehow ended up at the right abstraction distance between javascript and microcode.
Nobody forbids you from making a language that does not speak C, writing a compiler that generates machine code that has nothing in common with what would have come out of C source, and an OS that natively supports whatever conventions that language has. It's not like the hardware requires you to write C for it.
Well, nobody forbids you, but it's akin to speaking Chinese in England without ever trying to learn even a word used there. If you want to do anything useful you ought to use the OS layer sooner or later, which will predominantly be a C ABI/API.
This guy has no idea what he's talking about.
Yet another click-baity title, followed by an angry rant, the topic of which has nothing to do with C after all (but readers have to go through an impressive amount of noise to deduce that).
It seems like failing to be strict about defining behavior and expectations under all conceivable circumstances (such as leaving the actual bit length of certain integer types to be "up to the architecture to decide" in order to "remain flexible") is always a long-term design mistake.
Anyone know any good articles, tutorials or books about ABI design? How did we end up with what we have now, what lessons are there to learn? How would you go about designing a new one from scratch? Would this fall more under operating systems or compilers discipline?
I like the _MINIDUMP_HANDLE_DESCRIPTOR example. I write like that. And I don't think keeping compatibility in APIs and ABIs is a language thing. It's more what a programmer as an architect of his program does.
and in fact the VM pretty much has to be written in C (or C++, or you could get away with unsafe Rust) since efficient function dispatch (not to mention JIT) is difficult without wild-eyed pointer manipulation (you want to emit instructions to somewhere in memory and then jump to them, something inherently unsafe)
> and in fact the VM pretty much has to be written in C
What makes you think that you need C/C++ for this? Compilers were and still are written in many languages, and you can always come up with a compiler written in an "arbitrary" language which outputs e.g. an executable ELF file containing your VM runtime code.
There are a few examples of this in the wild btw, look at e.g. the GraalVM and its SubstrateVM sub-project.
I'm not familiar with the way GraalVM is implemented but I do know that at some point it has to have access to raw memory and registers in order to do JIT. There's no getting around that (and you can't do that with just java).
A compiler is different from a JIT engine since yeah you can just write whatever binary you want to disk, hell you could write an x86 compiler in javascript. but if you want to emit instructions to memory and send the CPU there to consume them, you need a smidge of native code.
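A minimal sketch of that smidge of native machinery, assuming x86-64 Linux without W^X restrictions (real JITs map writable, then flip to executable, and the object-to-function-pointer cast is outside strict ISO C):

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>

    /* x86-64 machine code for:  mov eax, 42 ; ret */
    static const unsigned char payload[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    int main(void)
    {
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        memcpy(buf, payload, sizeof payload);    /* "emit" the instructions */

        int (*fn)(void) = (int (*)(void))buf;    /* jump into the fresh code */
        int result = fn();                       /* returns 42 */

        munmap(buf, 4096);
        return result == 42 ? 0 : 1;
    }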
We're literally saying the same thing. Java compiled into a native code library for each target with intrinsics that give access to things like CPU registers is just another way of implementing intrinsics in native code
And native code is not C or C++? How do you think the very first C compiler was written? Spoiler: it was bootstrapped with assembly, which was bootstrapped with machine code.
> you want to emit instructions to somewhere in memory and then jump to them, something inherently unsafe
Isn't this only for JIT? My understanding was that non-JIT VMs basically function as emulators of a non-existent CPU, so they interpret each bytecode instruction rather than asking the CPU proper to do a jump.
even a bytecode vm has to emit bytecode and then move an instruction pointer around. You can do it without using C pointer manipulation but it will always be less than ideal. This is a nice little blog post about it: https://pliniker.github.io/post/dispatchers/
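For a sense of scale, a plain interpreter's dispatch loop can be sketched in portable C with no pointer tricks at all; the techniques in the linked post (computed goto, tail calls, etc.) are about making this loop fast:

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    /* The "instruction pointer" is just an index the loop advances
       through an array of bytecodes.                                */
    static void run(const int *code)
    {
        int stack[64], sp = 0, ip = 0;
        for (;;) {
            switch (code[ip++]) {
            case OP_PUSH:  stack[sp++] = code[ip++];         break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        const int program[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
        run(program);   /* prints 42 */
        return 0;
    }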
Someone actually has to implement the web browser / "tech stack". WebAssembly, despite its name, is an interpreted language. It is not magic. Interpreted languages need an interpreter (written in C).
Because, as the article states, the OS interface is C. In order to do anything useful at all in your code, you need to call through the C API of your operating system.
To replace this, you would need another compiled language. It can't be "any other lang" - it needs to be something compiled down to machine code that can be directly executed on your CPU. So you could write a new OS that presents an interface in Rust, or Zig, but never WASM.
I don't think it has to be C, but it can't be any arbitrary language. It has to compile to machine code, and I wouldn't want to implement a VM in a garbage-collected language because you need more fine-grained control over memory than those can provide.
That last requirement rules out most languages with any significant community besides C/C++, Rust, Zig, and a few other even lower profile C replacements.