It does say bit-fields of size zero are allowed (that forces the next bit-field to start in a new allocation unit), and also, curiously, says that, in C, “whether int bit-fields that are not explicitly signed or unsigned are signed or unsigned is implementation-defined.”
True. And that still doesn't mean the behaviour is undefined; it just means the construct is not useful except on two's-complement implementations. Which, even before C23, was all of them: the C23 change to require two's complement wasn't really meant to invalidate existing implementations, it was meant to reflect the reality that there were no other implementations worth considering. See https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm; the only non-two's-complement implementation still in use that was found was very much a legacy-only thing kept for backwards compatibility, whose users were not expected to want a modern C compiler anyway.
Even better, adding constructor and destructor for RAII in C. GCC already has this via attributes: __constructor__ and __destructor__ run code before and after main(), and __cleanup__ gives scope-based cleanup inside other functions too. Let's add it to C directly?
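For anyone who hasn't seen those extensions, here is a minimal sketch of what they look like in GCC/Clang today (free_charp, before_main and after_main are just illustrative names):

    #include <stdio.h>
    #include <stdlib.h>

    /* cleanup handler: the compiler passes a pointer to the annotated variable */
    static void free_charp(char **p)
    {
        free(*p);
    }

    __attribute__((constructor))
    static void before_main(void) { puts("runs before main()"); }

    __attribute__((destructor))
    static void after_main(void) { puts("runs after main()"); }

    int main(void)
    {
        /* freed automatically when buf goes out of scope */
        __attribute__((cleanup(free_charp))) char *buf = malloc(64);
        (void)buf;
        return 0;
    }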
I'm in WG14, and I made a passionate argument against defer. C is all about simplicity and clarity over convenience. Defer is an invisible jump; it's much better to use a goto, which clearly denotes a change in flow. In C everything should be as clear and as explicit as possible. Nothing should happen without the user explicitly saying so. This means more typing, and that's fine; there are plenty of languages that offer loads of syntax sugar if that's what users want, but only C offers clarity.
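For contrast, the explicit-goto cleanup idiom being defended looks roughly like this (a generic sketch, not code from any WG14 paper):

    #include <stdio.h>
    #include <stdlib.h>

    /* every change of flow is a visible goto to a named cleanup label */
    int process(const char *path)
    {
        int rc = -1;

        FILE *f = fopen(path, "rb");
        if (!f)
            goto done;

        char *buf = malloc(4096);
        if (!buf)
            goto close_file;

        /* ... do the actual work with f and buf ... */
        rc = 0;

        free(buf);
    close_file:
        fclose(f);
    done:
        return rc;
    }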
Thank you so much for your work! Your comment made me happy and warm inside, as if there's still some order in this crazy world. As if there's still a few sane and robust things that we can attach ourselves to, knowing that they will not break.
How can we support/encourage you? (in case you need it)
Did you get rid of longjmp() too? longjmp() is far more perilous for similar reasons and purposes.
Defer is useful for code clarity in many situations IMHO, and it's easy enough for a project or code QA tool to ban its use for those that don't like it.
If C is simple and clear, why is it fraught with pitfalls? While C code can be nice and clear to read, that doesn't mean there's less mental burden on the developer to keep track of everything going on.
I would argue that there is less mental burden, if you stay within some simple limits. C has a lot of complexity if you push its limits, mainly because it is so old and the limits have been pushed in ways they haven't in other languages. C has to run on very strange platforms, has many, many implementations, and has so much code depending on it. This makes it very hard to maintain. If two large C compilers do things slightly differently and very important software depends on those behaviours, then it's very hard to make a "clean" fix without breaking lots of software.
The argument against "Why don't they just fix X?" usually comes down to: it would break millions of lines of code, make every C tutorial/book obsolete and force millions of C programmers to learn new things, not to mention that if we broke the ABI, we would break almost every other language, since they depend on C's stable ABI. Breaking C would literally cost tens if not hundreds of billions of dollars for the industry.
Look at the move between Python 2 and 3. The cost of breaking backwards compatibility has probably been astronomical, and there is way less Python code being maintained than C code.
C operates on a scale that is almost unfathomable. A 1% performance degradation in C will have a measurable impact on the world's energy use and CO2 emissions, so the little things really matter.
C programmers claim some things that are simply not true:
1) simplicity - C is not a simple language; it is a language lacking advanced features. That does not make it simple to use, though it might make it easier to write an incomplete, non-performant C compiler than a compiler for better languages
2) performance - naive/straightforward/clear C implementations are not the most performant, and most C programs are not highly optimised for performance. C compilers have had so much optimisation work done on them that they can actually generate decently performant code, but if better languages received the same level of effort that C has, they would be able to achieve better performance than C with smaller and safer code
3) clarity - C is not clear to read once structs and pointers are used, and especially once it is optimised for performance...and then there is the preprocessor...which is a whole different language that is required to make any non-hello-world program even possible
If I have to provide examples for any of the above you are either not a C programmer or you are an "advanced" C programmer that doesn't have to deal with large code bases and instead deals with small pieces and someone else takes care of the rest.
I mean, just try to figure out what long int is on Linux without compiling a program and tell me that C is a simple and clear language...and if you are still clinging to that lie, then tell me what a struct looks like in memory so I can interface with it from another language...or even another C compiler...because C is simple and clear, right?
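To make that complaint concrete: both answers depend on the target ABI rather than on the language itself, so the only reliable way to find out is to ask the compiler, e.g. with something like this (struct s is just an illustration):

    #include <stdio.h>
    #include <stddef.h>

    struct s {
        char  c;
        long  l;   /* typically 8 bytes on Linux x86-64 (LP64), 4 on Windows (LLP64) */
        short h;
    };

    int main(void)
    {
        printf("sizeof(long) = %zu\n", sizeof(long));
        printf("sizeof(struct s) = %zu, offsetof(l) = %zu, offsetof(h) = %zu\n",
               sizeof(struct s), offsetof(struct s, l), offsetof(struct s, h));
        return 0;
    }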
Not the OP, but you are conflating two very different ideas under a singular definition.
C is "simple" in the same way DNA is "simple". There is a small set of very straightforward rules, but that set provides immense flexibility. But it in no way means that any resulting object will not be complicated. Perhaps a better word than "simple" would be "non-complex", making the distinction between "complex" vs "complicated" systems.
By contrast, the features you refer to would increase the "complexness" of the language (I won't even dare use the grammatically correct word "complexity" here, lest we deviate into yet another trap of conflating definitions).
Which, I would agree with OP is, for better or worse, probably far outside C's goals as a language.
This has nothing to do with C, and everything with its intended problem domain.
You can get into pointer and memory errors in C++, Rust and Ada. All of them low-level system languages. Sure, those errors might be harder to produce, but not impossible, and definitely easy enough to still trip you up.
I have programmed in all of those languages except Rust (just don't like it). At least in C you pretty much know WHY (not necessarily where in the code) things went south, without consulting a thousand-page specification or having to remember the myriad of language-feature interactions that could have triggered those problems.
Moreover, C being small, it's a good on/off language. Try doing a code review for a C++/Rust/Ada code base which uses features heavily after not having touched the language for a year. I bet it is not as easy as C.
You know, some things in life are just hard. And low-level programming is one of those things. C is only honest about this.
> Nothing should happen without the user explicitly saying so.
Umm, why is typing something like defer (and documenting what it does) different than typing something like goto (and documenting what it does)?
Frankly, not being comfortable with adding safe and efficient abstractions to a language will result in the death of the language (both for spoken languages and also for programming languages).
> Umm, why is typing something like defer (and documenting what it does) different than typing something like goto (and documenting what it does)?
Because goto explicitly declares a change of control flow, whereas defer causes implicit control flow in code that does not explicitly declare it. That's exactly what the OP was trying to avoid, and it's a meaningful difference regardless of whether you care about it.
Because the control flow happens implicitly at the end of the block, not explicitly as with goto, loop, if-statements, etc. A deferred statement can also have arbitrary time and space complexity, so it's not like a variable going out of scope at the end of a block which is constant time. It furthermore requires invocation of all deferred statements while unwinding the stack upon calling exit(). See the details here:
> Because the control flow happens implicitly at the end of the block
Why is this different than any loop in C? or function without a return? Wouldn’t it be more explicit and very minimal overhead to instead always use a goto at the end of a looping block?
Even though C allows while loops (as a mistake - all control flow should be explicitly defined at the end of the block), why aren't you - as a best practice - using an explicit goto and if block instead of while / for?
> Why is this different than any loop in C? or function without a return?
Because it's non-local control flow. Did you just ignore my whole comment, where I pointed out cases like exit() unwinding the stack, invoking deferred block in each caller up the call chain back to the root? Does for/while do that?
I really did read it, I just really believe that your biases are clouding your critical thought about this.
> invoking deferred block in each caller up the call chain back to the root
Yes, if the caller writes N defers in the same function or across M functions, then the code will run N defers.
Why is this any different than nesting N loops, in the same function or across M functions? It’ll still run N end of block control flow statements without them being “explicit”.
At the end of the day “explicit” typically is really just a replacement for saying “familiar with the previous documentation, don’t change things”.
Anyways, that’s all I can put into this conversation. Please just try to consider that you’re conflating familiarity with simplicity and leverage System 2 to really ask yourself if what you’re favoring is valuable to the group or just the NIMBY.
I honestly can't summarize the extra runtime complexity any more clearly than I already have. The original poster was very clear about keeping the runtime complexity down for various well-motivated reasons. They are correct that if you want more runtime support for high-level features, then C is perhaps not the language for you.
> even better, adding constructor and destructor for RAII
Constructors are a nuisance, and destructors, while great, require significant additional semantics to avoid double destruction.
C++ uses unconditional destructors thanks to ubiquitous copying and non-destructive moves, but that requires all objects to always be in a destructible state.
Rust instead relies on lexical drop flags inserted by the compiler if objects are conditionally moved.
They're more convenient and composable than defer/cleanup/scope, but they also have more language impact.
Obviously the trade off is defer pushes that issue onto the developer.
I suspect `defer` is the mainstream right tradeoff between the implicit nightmarish semantics of C++ and the formalism of Rust.
I think the Rust view of C++ will turn out to be an over-reaction. The issue wasn't a failure to make the implicit explicitly provable, it was simply to make it explicit at all.
> I suspect `defer` is the mainstream right tradeoff
It’s not. It’s a conveniently simple language addition which foists all the issues and edge cases onto the language user. Defer is a limited subset which is more verbose and more error-prone.
However like “context managers” (e.g. try blocks, using, with, unwind-protect, bracket) it also works acceptably in languages where ownership concepts are non-existent or non-encoded, whereas destructors / drops require strict object lifetimes.
> The issue wasn't a failure to make the implicit explicitly provable, it was simply to make it explicit at all.
What, pray tell, would be the use of making borrows “explicit” then ignoring them entirely? Useless busy work?
There are three "dialects" of rust: borrow-checked, unsafe, and copy-everything.
The existence of these three "dialects" goes to my point that the borrow checker isn't The Ideal solution to the problem of high-performance, safe, static computing.
Some of this data may be difficult to collect, because some project leads evaluate Rust, discover that the entire ecosystem is built on top of libraries that endlessly thrash a global allocator with synchronization primitives in it, and move on.
A sane defer-like mechanism would be fine with me, but please no destructors or (worse) constructors; they are much too inflexible because they bind 'behaviour' to types. Data should be 'stupid' and not come with its own hardwired behaviour, otherwise this becomes a design-rabbit-hole that eventually ends up in C++.
While I agree that destructors are unsuitable for C, because for them to be useful you'd have to throw out and redo the entire standard library, so they can't be usefully retrofitted into the language, however
> they are much too inflexible because they bind 'behaviour' to types
1. that is the baseline you want: data is not an amorphous and meaningless blob; the default for a file or connection is that you close it, the default for allocated memory is to release it, etc…
2. that also allows much more easily composing such types and semantics, without having to do so by hand
3. and it is much more resilient to future evolution: if an item goes from not having a destructor to having one… you probably don't care, but if it goes from not having a cleanup function (or having a no-op one which you could get away with forgetting to call) to having a non-trivial cleanup function, you now have a bug lurking
4. destructors also make conditional cleanups… just work. The repetitive and fiddly mess is handled by the computer, which is kind of the point of computers, rather than you having to remember every time that the resource must be cleaned up on the error path but not on the non-error path; defers easily trigger double-free situations. This also, again, makes code more resilient to future evolution (and the associated mistakes)
5. furthermore destructors trivially allow emulating defers, just create a type which does nothing and calls a hook on drop
> Data should be 'stupid' and not come with its own hardwired behaviour
That’s certainly a great take if you want the job security of having to fix security issues forever.
> ...the default for a file or connection is that you close it...
Something like a file or connection is already not just 'plain old data' though.
In my mind, 'data' are the pixels in a texture (while the texture itself is an 'object'), or the vertices and indices in a 3D mesh (while the 'mesh' is an object), or the data that's read from or written to files (but not the 'file object' itself).
C is all about data manipulation (the 'pixels', 'vertices' and 'indices'), less about managing 'objects'.
For objects, constructors and destructors may be useful, but there are plenty cases in C++ where they are not. Textures and meshes in a 3D API are actually a good example of where destructors are quite useless: when the 'owner' of the CPU-side texture object is done with the object, that doesn't mean the GPU is done with the data that's 'owned' by the CPU-side object. Traditional destructors can be used, of course, by delaying the destruction of the CPU-side texture object until the GPU is done with the data, but why keep the texture object around when only the pixel data is needed by the GPU, not the actual texture object?
A 'defer mechanism' is just the right sweet spot for a C like language IMHO.
> Something like a file or connection is already not just 'plain old data' though.
OK? But now you need to… have two different mechanisms, when one can do both?
> In my mind, 'data' are the pixels in a texture (while the texture itself is an 'object'), or the vertices and indices in a 3D mesh (while the 'mesh' is an object), or the data that's read from or written to files (but not the 'file object' itself).
This “data” does not have a destructor, and thus is not a concern.
> C is all about data manipulation (the 'pixels', 'vertices' and 'indices'), less about managing 'objects'.
I think that would be news to every C program I’ve written.
> there are plenty cases in C++ where they are not
In which case you can just not have them, and not care.
> A 'defer mechanism' is just the right sweet spot for a C like language IMHO.
If by “a C like language” you mean “a language that refuses to take any complexity off of the user’s back”, then sure.
That link goes directly to the relevant section in the manual.
It includes a cc wrapper called `cedrocc` so you don’t need intermediate files, and it works hard to produce clean code for the generated parts. The rest is not modified at all. The goal is to be useful even if you only use it once to generate that repetitive code.
The pre-processor `cedro` depends only on the C standard library, and `cedrocc` depends on POSIX.
Uh, not to be That Guy, but what did you expect? Just compare K&R [1] and Stroustrup (any edition) [2] next to each other, and you will get a pretty strong hint of C being a smaller language. That's kind of the point, or it used to feel like it was anyhow.
Being a small language is not an excuse. Scheme is a small language, still has dynamic-wind (since R5RS, so no spring chicken). Smalltalk is a tiny language, but has BlockClosure#ensure:.
You can have basic safety and QOL features without making the language a monstrous beast. Hell, you can cut old garbage like K&R declarations or digraphs to make room. You can even remove iso646 and most of string.h as a gimme.
I mean, it's not that I didn't expect changes or didn't expect the language to be smaller - it's just that there are definitely things that I would want to bring with me over to C. One thing I really like is RAII - that I can ensure that things are cleaned up in a known, once-defined fashion and I don't have to worry about it everywhere I use a given object. I also generally like using early returns, which is somewhat more complicated with C, as I may need to have more cleanup code around. It can be somewhat mitigated by coding in a more functional style and passing the necessary parameters into a function, so I can have a different function just doing allocation and deallocation. But still, it's more verbose.
`defer` would to some degree solve that issue.
Similarly, I've been missing nullptr, just for the expressiveness. I like that C23 now includes it :)
Depending on your definition of “using”, you may not actually be able to, at least for now. If CPPReference's table[1] is to be believed, there are still important features that are either missing from GCC and Clang or implemented “partially”, which may mean different things for different features and compilers.
In my personal opinion, unless you're doing the project just for fun, it seems better to stick to C11/C17, at least for the next few years.
Default values for structs, only taking effect when static or used with designated initializers, would be good enough imo, since you can always pass a struct to a function.
So you could do foo({.bar=1}) and the rest would be default initialized.
#include <stdio.h>
#include <stdlib.h>

struct params {
    int a;
    void *p;
};

void
f (struct params p)
{
    printf ("p.a = %d, p.p = %p\n", p.a, p.p);
}

int
main (void)
{
    f ((struct params){.a = 42});
    f ((struct params){.p = f});
    f ((struct params){});

    exit (EXIT_SUCCESS);
}
Even though these are stack allocated, the missing fields are initialized to 0/NULL. The last case (no parameters) is new in C23. In C99 you had to use (struct params){0} to initialize it. https://en.cppreference.com/w/c/language/compound_literal
Yes, but they're default-initialized to 0 in this case, whereas it would be nice if there was some way they could be initialized to default, potentially non-zero values.
No, that's just creating an item 'blub' of struct 'bla_t'; it doesn't help with default values that the compiler could fill in when using designated init instead of zeroes. To borrow from your example:
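Presumably something along these lines, using the same fantasy-syntax convention as elsewhere in the thread (the defaults are written directly into the struct declaration; bla_t and its fields are taken from the example being discussed):

    /* NOT REAL CODE, FANTASY SYNTAX: default values in the struct declaration */
    struct bla_t {
        int a = 23;
        const char *hello = "Hello World!";
    };

    struct bla_t blub = { .a = 42 };  /* .hello not mentioned */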
The missing designated init item blub.hello would now be initialized to "Hello World!" by the compiler by looking up the default value in the struct declaration. Currently, missing designated init items are set to zero. And it would work in any other place where a bla_t is created:
struct bla_t blob = {};
This would initialize blob to its default state of blob.a = 23 and blob.hello = "Hello World!".
That's a fair point of course, but when you pass 'blub' into a library function, this function needs to fill in the zeroes with default values anyway, and that's just as opaque, and it causes extra runtime overhead which wouldn't be there if the compiler already knows the default values.
...because C99 allows designated initializers to show up multiple times, but that's all a bit too much macro magic for my taste, I'd really prefer the defaults in the struct declaration.
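For anyone following along, the macro trick being referred to presumably looks something like this; later designated initializers override earlier ones, which is valid C99 but draws a -Woverride-init warning from GCC (BLA_DEFAULTS is a made-up name):

    /* defaults collected in a macro, expanded before the user's overrides */
    #define BLA_DEFAULTS  .a = 23, .hello = "Hello World!"

    struct bla_t {
        int a;
        const char *hello;
    };

    /* .a = 42 overrides the default; .hello stays "Hello World!" */
    struct bla_t blub = { BLA_DEFAULTS, .a = 42 };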
The point is that foo() now needs to check every struct item for being zero and use the default value instead. If the default values would be listed in the struct declaration, the compiler could fill those in when tmp is created, without any additional runtime overhead.
It would also free up zero as being an actual value instead of standing for 'default value'.
If you want your structs to be zero initialized, then you just wouldn't declare default values in the struct declaration, and everything would work as before.
Also, once you initialize structs with values somewhere in the code (for instance with designated init), there's a high chance that the compiler will put a copy into the data section anyway, which is then memcpy'ed into the runtime struct (it depends on the compiler and compile options).
Wouldn't that be just as hard as for any other function? The name of a function (a bit like the name of an array) evaluates to basically a function pointer value, there is little difference between the two calls here:
#include <stdio.h>

int foo(int a, int b)
{
    return a + b;
}

int main(void)
{
    printf("Direct: %d\n", foo(1, 2));

    int (*ptr)(int, int) = foo;
    printf("Indirect: %d\n", ptr(1, 2));

    return 0;
}
of course default arguments, if added, would have to be part of the function pointer type as well, making the above:
int foo(int a = 1, int b = 2) // NOT REAL CODE, FANTASY SYNTAX
{
    return a + b;
}

int main(void)
{
    printf("Direct default: %d\n", foo());

    int (*ptr)(int a = 1, int b = 2) = foo; // NOT REAL CODE, FANTASY SYNTAX
    printf("Indirect default: %d\n", ptr());

    return 0;
}
Unnamed function arguments would look silly (`int (*ptr)(int = 1, int = 2)`?), but I would be radical then and only support default argument values for named arguments, probably.
Edit: fixed a typo in the code, changed in-code comment.
I see the point. Out of curiosity, I thought up some approaches:
- disallow making function pointers of functions with default params
- require explicitly passing all params when used as a function pointer
- generate a trampoline function that supplies the default params and use that as the function pointer (sketched below)
- somehow include the default param values as part of the function pointer type
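The third option can already be written by hand in today's C, which gives a feel for what a compiler would have to generate (foo_defaults is a made-up name):

    int foo(int a, int b)
    {
        return a + b;
    }

    /* hand-written trampoline standing in for "int foo(int a = 1, int b = 2)" */
    static int foo_defaults(void)
    {
        return foo(1, 2);
    }

    int main(void)
    {
        int (*ptr)(void) = foo_defaults;  /* the pointer type only sees the no-argument signature */
        return ptr();
    }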
> Where the compiler puts on stack the default value, whenever the function call doesn't include it.
So if a dynamically linked library uses default values, and you make use of them, and the dynamically linked library decides to change its default values (e.g. a crypto library switches to more secure defaults), you don’t get the update until you recompile your own code.
That doesn’t sound great.
Isn’t it C# which uses this strategy, of embedding the defaults in the caller?
PS: I don’t think C is defined in terms of stack, and modern calling conventions use registers for at least the first few arguments.
That's IIRC how C++ also does this, so... yeah, it's not great but lots of things are not great about C and/or C++ and the standard advice is "yeah, don't do that if that hurts".
This only makes sense when it comes with named parameters which can be provided in any order, not the half-assed default-parameter implementation from C++.
Unix/Posix time doesn’t include leap seconds (it’s 86400 seconds per day, always, per definition [0]), whereas UTC, taken as a count of actual seconds, does include leap seconds. So you are right that converting between them requires additional data. However, few applications care about converting between Unix time and that notion of UTC, but instead care about converting between Unix time and calendar dates and times of day and time zones, which is easier with Unix time than with a UTC seconds count. So I think you’re mixing things up here.
It would be more accurate to say that Unix time "accounts for" leap seconds by not accounting for them at all, but rather switching into temporal displacement ambiguous repeat timestamp la la land for an entire second before returning to an unambiguous encoding of UTC.
Leap seconds are not counted in Unix time. Otherwise days with leap seconds would be counted as having 86401 seconds instead of 86400 seconds. This is not the case. In the example you cite, the Unix time resets to 915148800 after the leap second (1999-01-01T00:00:00), so the leap second isn't counted. Unix time differs from the true number of seconds since the epoch by the number of leap seconds (plus the UTC-TAI time difference between 1970 and 1972, before leap seconds were introduced).
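For concreteness, the usual illustration of that leap second looks like this (the exact behaviour during the leap second varies by system; many repeat or freeze the value):

    UTC                     Unix time
    1998-12-31T23:59:59     915148799
    1998-12-31T23:59:60     915148800   (leap second)
    1999-01-01T00:00:00     915148800   (value repeats)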
Most likely. They could add leap minutes, hours, or days instead. But they'll probably set UTC as a constant offset from TAI, and let the few uses that care about accurate solar time use UT1.
If you want something tied to Earth's rotation, UT1 is the way to go. If you don't care, TAI is good. UTC is in a weird compromise position, where it (traditionally) tried to be within 1s of UT1 but otherwise tick at the same rate as TAI. That compromise turned out not to be what anyone needs, so future uses will probably pick between UT1 and TAI/GPS/Unix/some other fixed offset from TAI.
The next version of C should redefine C as a strict subset of C++. That is, at any given moment, a particular revision of C should be a subset of a particular revision of C++. Each new version of C++ would then cause a revision of C to be released.
If anything, C++ could have decided to become a strict superset of C - they wouldn't even have to change too much and could do it without giving up backwards compat.
C would have to change significantly to become a proper subset of C++, and pretty much no C program would still compile (because of int *c = malloc(sizeof(int));).
Even if you're OK with breaking every call to malloc(), as the other poster said, plenty of C code has variables named things that clash with C++ reserved words, such as new. And while it's possible for programming languages to simply not have reserved words, I think C++ implementation developers and programmers would rather avoid the kinds of things clever C++ programmers would come up with were those guide rails to be taken off.
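Both problems fit into a couple of lines that compile as C but not as C++ (a deliberately contrived sketch):

    #include <stdlib.h>

    int main(void)
    {
        int *new = malloc(sizeof *new);  /* C: fine. C++: 'new' is a keyword, and void*
                                            doesn't implicitly convert to int* */
        free(new);
        return 0;
    }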
> Identifiers can be PL/I keywords or programmer-defined names. Because PL/I can determine from the context if an identifier is a keyword, you can use any identifier as a programmer-defined name. There are no reserved words in PL/I. However, using some keywords, for example, IF or THEN, as variable names might make a program needlessly hard to understand.
Please let's not bring all the design warts of C++ into C (instead only port the actually good ideas over, after they've been proven to be actually good ideas).
It would make more sense if C++ would reverse direction and become a superset of C (like ObjC chose to do from the start), but for C++ it's much too late now.
Can't be done: it would break backwards compatibility. E.g., `struct foo` and `foo` are the same type in C++ automatically (as if `typedef struct foo foo`), but not in C.
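A small illustration of that difference (the C++-only line is left commented out so the snippet stays valid C):

    struct foo { int x; };

    struct foo a;             /* valid in both C and C++ */
    /* foo b; */              /* C++ only: the struct tag is automatically a type name; an error in C */
    typedef struct foo foo;   /* the C idiom that C++ effectively gives you for free */
    foo c;                    /* now valid in C too */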
I write C every now and then and couldn't care less about C++. So the question is why? What would be the benefit? Do you really think that everyone who uses C uses C++ as well?
If you had told me that you were making use of templates, I would have believed it.
By the way, the C projects I used to work on 20 years ago took an hour to compile, and it wasn't worse than that thanks to ClearMake's sharing of object files.
In any case, even if templates make C++ slower to compile, I would rather have slower builds than more opportunities to keep security researchers busy.
I wonder why they’re disallowing _BitInt(1). It would be a signed integer type with the possible values -1 and 0.