Somewhat unrelated question, but I think one of the most difficult things about learning C for coders who are used to scripting languages is getting your head around how the various scalar data types like short, int, long, ... (and the unsigned variant of each) are represented, how they relate to each other, and how they relate to the platform.
I am wondering if this complexity exists due to historical reasons. In other words, if you were to invent C today, could you just define int as always being 32 bits and long as 64 bits, and provide much saner, well-defined rules on how the various data types relate to each other, without losing anything of what makes C a popular low-level language?
>if you were to invent C today, could you just define int as always being 32 bits and long as 64 bits, and provide much saner, well-defined rules on how the various data types relate to each other, without losing anything of what makes C a popular low-level language?
You'd lose something because those decisions would be impractical for 8-bit and 16-bit targets (which still exist in the world of embedded programming).
If you’re writing code for those targets, why not just use the smaller data types when you don’t need bigger ones? That way it will work efficiently on both small and large platforms and the behaviour will be consistent.
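For example (a rough sketch with made-up names), you can pick explicitly sized types from <stdint.h> so the same code behaves identically on an 8-bit MCU and a 64-bit desktop:

    #include <stdint.h>

    /* Hypothetical peripheral code: explicit widths instead of plain int,
       so behaviour is the same on an 8-bit MCU and a 64-bit desktop. */
    static uint16_t adc_reading;   /* 0..65535 is all this will ever hold */
    static int32_t  accumulator;   /* wide enough for the sums we expect  */

    void add_sample(uint16_t sample) {
        adc_reading = sample;
        accumulator += sample;     /* sample is widened before the add */
    }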
Fair point. I guess the issue is with library code that uses int. But you aren't typically going to use lots of general-purpose library code if you're targeting a microcontroller.
If that library code doesn’t use the smaller types, then it probably isn’t designed for those platforms, which means it won’t have been tested with the smaller int widths and they will likely cause lots of bugs. The one advantage I do see to a platform-sized int is that it can avoid extra sign-extension instructions on some architectures when working with less than the native size, but there’s int_fast16_t for that.
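Roughly what that looks like (a minimal sketch): int_fast16_t from <stdint.h> promises at least 16 bits but lets the compiler pick whatever width is fastest on the target, so a desktop build may quietly use 32 or 64 bits.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* At least 16 bits, but whatever width is fastest on this target */
        int_fast16_t sum = 0;
        for (int_fast16_t i = 0; i < 100; i++)
            sum += i;
        printf("sum = %" PRIdFAST16 ", actual width = %zu bits\n",
               sum, sizeof(int_fast16_t) * 8);
        return 0;
    }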
Would it be possible to create two versions of the language: microC for embedded C programs where these decisions matter, and standard C for PCs/servers, which basically all use the same representation for these scalars?
My main point is that a C programmer today is forced to learn dozens of rules just to cater for niche platforms that they will probably never target. If you separated those two use cases, you could get a much neater C that targets modern 64-bit architectures with all the power of traditional C but a bit less portability.
In fact, the default integer promotions, together with the requirement that unsigned int hold values up to at least 2^16 - 1, are the main reason C code for 8-bitters is somewhat... stylized, to put it mildly.
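A tiny sketch of the kind of thing that forces that style: both operands below are uint8_t, but the promotions drag the arithmetic up to int width, and you have to cast defensively to stay out of signed-overflow territory.

    #include <stdint.h>

    /* Both operands are uint8_t, but each is promoted to (signed) int before
       the multiply, so the arithmetic happens at int width even on an 8-bit
       core. Worse, 255 * 255 = 65025 overflows a 16-bit int, which is
       undefined behaviour, hence the defensive detour through unsigned. */
    uint8_t scale(uint8_t a, uint8_t b) {
        return (uint8_t)(((unsigned int)a * b) >> 4);
    }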
I think that's actually an interesting example of how C sees itself as a high level language (from the perspective of the early 70s), even though we now categorize it as low-level. You'd expect the default integer type in a high level language to be able to store values that are large enough for an interesting range of practical tasks; 16 bits is arguably the bare minimum to meet that requirement.
IIRC a retrospective by one of the designers listed the default promotions as a careless mistake, because the original—offhand—reasoning was that the registers are 16-bit anyway, so they might as well widen everything when it’s free (on the only machine the language was implemented on at that moment).
The int was supposed to be the native word size: 16-bit on the 286 and earlier, 32-bit on the 386 and later, and 64-bit on x64. Except, of course, int has been 32 bits on x86 for so long (and x86 has been the single most important ISA for just as long), and short has been 16 bits for even longer, that moving to a 64-bit int and a 32-bit short (which is what x64 is naturally suited for) was just impossible. So it didn't happen, and we're stuck with the LP64 (on Linux) and LLP64 (on Windows) data models.
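If you want to see which data model your own toolchain uses, a quick check like this works (typical output: int=4, long=8 under LP64 on Linux/macOS; int=4, long=4 under LLP64 on 64-bit Windows):

    #include <stdio.h>

    int main(void) {
        printf("short     : %zu bytes\n", sizeof(short));
        printf("int       : %zu bytes\n", sizeof(int));
        printf("long      : %zu bytes\n", sizeof(long));
        printf("long long : %zu bytes\n", sizeof(long long));
        printf("void *    : %zu bytes\n", sizeof(void *));
        return 0;
    }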
The simple version is that there are two use cases - the world where you want the size of types to match the target (e.g. int) and the world where sizes are defined by the coder (uint32_t). You want to handle both of those.
That's a nice theory and is what we've got, but it falls down in a few places.
The first is that the "int" world has got a bit munged - some platforms make some slightly strange choices for long and short and so you can't always rely on it (although int is usually pretty sensible).
The other is that unsigned maths wraps around silently, so you generally need to know the exact size at coding time to make sure that wraparound never actually happens.
Together, these mean that you're generally just better off using uint32_t (etc.) all over the place, and you get more predictable results.
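To illustrate the silent wraparound point (a minimal sketch): the result is perfectly well defined, modulo 2^N, but nothing tells you at runtime that it happened, which is why knowing N when you write the code matters.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t small = UINT16_MAX;   /* 65535 */
        uint32_t wide  = UINT32_MAX;   /* 4294967295 */
        small += 1;                    /* wraps to 0, no trap, no warning */
        wide  += 1;                    /* wraps to 0, no trap, no warning */
        printf("uint16_t wrapped to %" PRIu16 ", uint32_t wrapped to %" PRIu32 "\n",
               small, wide);
        return 0;
    }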
I learnt C about a decade ago (after using scripting languages for 10 years prior) and just stuck with the uintN_t types, no second thoughts about how big a uint32_t is.