
The disadvantage is that a "goto fail" can easily happen with this approach. And it actually has happened in the wild.

Well, unless you're on Windows :D Even on Windows XP Home Edition I could open a million file handles with no problems.

Seriously, why is the default ulimit on file descriptors on Linux a measly 1024?


Some system calls like select() cannot handle file descriptors numbered 1024 or higher (https://man7.org/linux/man-pages/man2/select.2.html), so it probably (?) makes sense as a default. Although I don't really think that in 2k26 it makes sense to have such a low limit on desktops, that is true.
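
For context, here's a minimal sketch of that select() limitation (POSIX/glibc assumptions; 1024 is the usual glibc FD_SETSIZE, which the code only prints rather than verifies):

  #include <stdio.h>
  #include <sys/select.h>

  int main(void) {
      /* An fd_set cannot represent descriptors numbered FD_SETSIZE (1024
       * with glibc) or higher; passing such a descriptor to FD_SET() is
       * undefined behavior. */
      printf("FD_SETSIZE = %d\n", FD_SETSIZE);

      fd_set readfds;
      FD_ZERO(&readfds);
      /* FD_SET(1024, &readfds);   out of range: UB */
      /* Programs that may juggle more than ~1000 descriptors should use
       * poll() or epoll instead of select(). */
      return 0;
  }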


Eh, generics kinda do introduce a subtyping relation already. It's just that HM's Gen rule, that e : σ implies e : ∀α.σ (provided α is not free in the typing context), is restrictive enough that this subtyping relation can be checked (and inferred) using just unification, which is quite an amazing result.
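
For reference, the generalization rule being alluded to, in its standard presentation with the side condition spelled out (plain LaTeX, nothing implementation-specific):

  % Hindley-Milner generalization (Gen) rule
  \frac{\Gamma \vdash e : \sigma \qquad \alpha \notin \mathrm{free}(\Gamma)}
       {\Gamma \vdash e : \forall\alpha.\,\sigma}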

Some people just itch to use something custom and then to have to think about it. Which can bring amazing results, sure, but it can also bring spectacular disasters, especially when we're talking about crypto.

The article is less about crypto and more about improving UUIDs (and IDs in general) with small block ciphers. It's a low-impact mechanism to avoid leaking data that UUIDs by design do leak. It also doesn't require a good source of entropy.
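
To illustrate the shape of the idea (this is not the article's construction, just a toy, cryptographically unvetted Feistel permutation over 64-bit IDs; the round function and keys are made up):

  #include <stdint.h>
  #include <stdio.h>

  /* A balanced Feistel network is a bijection no matter what the round
   * function is, so every sequential ID maps to a unique scrambled ID. */
  static uint32_t round_fn(uint32_t half, uint32_t key) {
      uint32_t x = (half ^ key) * 0x9E3779B1u;  /* simple mixing, not a real PRF */
      return x ^ (x >> 16);
  }

  static uint64_t scramble_id(uint64_t id, const uint32_t keys[4]) {
      uint32_t left = (uint32_t)(id >> 32), right = (uint32_t)id;
      for (int i = 0; i < 4; i++) {
          uint32_t tmp = left ^ round_fn(right, keys[i]);
          left = right;
          right = tmp;
      }
      return ((uint64_t)left << 32) | right;
  }

  int main(void) {
      const uint32_t keys[4] = {0xA5A5A5A5u, 0x5A5A5A5Au, 0xDEADBEEFu, 0xCAFEBABEu};
      for (uint64_t id = 1; id <= 3; id++)   /* sequential IDs come out unrelated */
          printf("%llu -> %016llx\n", (unsigned long long)id,
                 (unsigned long long)scramble_id(id, keys));
      return 0;
  }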

...a long-lived HTTPS connection that manages to transfer >700 GiB of traffic, with no disconnects, and presumably has re-keying disabled? An interesting theoretical setup, I guess.

In the end it all boils to a very simple argument. The C programmers want the C compilers to behave one way, the C implementers want the C compilers to behave the other way. Since the power structure is what it is — the C implementers are the ones who write the C standard and are the ones who actually get to implement the C compilers — the C compilers do, and will, behave the way the C implementers want them to.

In this situation the C programmers can either a) accept that they're programming in a language that exists as it exists, not as they'd like it to exist; b) angrily deny a); or c) switch to some other system-level language with defined semantics.


Given what most C compilers are written in, are C programmers also C implementers?

I suspect it also depends on who exactly the compiler writers are; the GCC and LLVM guys seem to have more theoreticians/academics and thus think of the language more abstractly, leading to UB being treated as truly inexplicable and divorced from the actual environment, while MSVC and ICC are more on the practical side and their interpretation of it is, as the standard says, "in a documented manner characteristic of the environment". IMHO the "spirit of C" and the more commonsense approach is definitely the latter, and K&R themselves have always leaned in that direction. This is very much a "letter of the law vs. spirit of the law" argument. The fact that these two different sides have produced compilers with nearly the same performance characteristics shows, IMHO, that the claim that exploiting UB is mandatory for performance is a debunked myth.


I doubt it, but that's just a hunch. Is there data out there regarding compiler/language maintainer/standards committee members' contributions to other projects (beyond "so and so person works on $compiler and $application, both written in C"-type anecdotes)?

If not, then, like ... sure, C compiler maintainers are people who program in C, but they're not "C programmers" as it was intended (people who develop non-compiler software in C).

My hunch is that that statement is overwhelmingly true if measured by influence of a given C compiler/implementation stack (because GCC/LLVM/MSVC take up a huge slice of the market, and their maintainers are in many cases paid specialists who don't do significant work on other projects), but untrue if measured by count of people who have worked on C compilers (because there are a huge number of small-market-share/niche compilers out there, often maintained by groups who develop those compilers for a specific, often closed-source, platform/SoC/whatever).


Another alternative is for the programmer to write their own C compiler and be free of these politics. Maybe I am biased since I am working on exactly such a project, but I have been seeing more and more in-progress compiler implementations of C or C-like languages over the past couple of years.

The proposals for Boring C or a "Friendly Dialect of C" or whatever have been around for a while. None went beyond the early design stages because, it turns out, no two experienced C programmers could agree on which parts of C are reasonable/unreasonable (and should be kept/left out); see [0] for a first-hand account.

[0] https://blog.regehr.org/archives/1287

> In contrast, we want old code to just keep working, with latent bugs remaining latent.

Well, just keep compiling it with the old compilers. "But we'd like to use new compilers for some 'free' gains!" Well, sucks, you can't. "But we have to use new compilers because the old ones just plain don't work on the newer systems!" Well, that sucks, and this here is why "technical debt" is called "debt": you've managed to put off paying it until now, and the repo team is here, knocking at your door.


I can't upvote this enough.

I mostly work in compiled languages now, but started in interpreted/runtime languages.

When I made that switch, it was baffling to me that the compiled-language folks don't do compatibility-breaking changes more often during big language/compiler revision updates.

Compiled code isn't like runtime code--you can build it (in many cases bit-deterministically!) on any compiler version and it stays built! There's no risk of a toolchain upgrade preventing your software from running, just compiling.

After having gone through the browser compatibility trenches and the Python 2->3 wars, I have no idea why your proposal isn't implemented more often: old compiler/language versions get critical/bugfix updates where practical, new versions get new features and aggressively deprecate old ones. For example: "you want some combination of {the latest optimizations, loongarch support, C++-style attributes, #embed directives, auto vector zero-init}? Great! Those are only available on the new revision of the compiler where -Werror is the default and only behavior. Don't want those? The old version will still get bugfixes."

Don't get me wrong, backwards compatibility is golden...when it comes to making software run. But I think it's a mistake that back compat is taken even further when it comes to compilers, rather than the reverse. I get that there are immense volumes of C/C++ out there, but I don't get why new features/semantics/optimizations aren't rolled out more aggressively (well, I do--maintainers of some of those immense volumes are on language steering committees and don't want to spin up projects to modernize their codebases--but I'm mad about it).

"Just use an old compiler" seems like such a gimme--especially in the modern era of containers etc. where making old toolchains available is easier than ever. I get that it feels bad and accumulates paper cuts, but it is so much easier to deploy compiled code written on an old revision on a new system than it is to deploy interpreted/managed code.

(There are a few cases where compilers need to be careful there--thinking about e.g. ELF format extensions and how to compile code with consideration for more aggressive linker optimizations that might be developed in the future--but they're the minority.)


There are C codebases many decades old still being actively maintained and used. I don't think the same is true for Python on the same scale. It's easy to remodel when you are on the top of abstraction layer, but you don't want to mess around with the foundational infrastructure unnecessarily.

Absolutely. But there’s so much more liberty in C land in that you can stay on an old compiler/language version for such codebases.

I know it’s not pleasant per se, but the level of support needed (easier now with docker and better toolchain version management utils than were the norm previously) surely doesn’t merit compilers carrying around the volume of legacy cruft and breaking-change aversion they do, no?


And please provide feedback to WG14. Also please give feedback and file bugs for GCC / clang. There are users of C in the committee and we need your support. Also keeping C implementable for small teams is something that is at risk.

Myself and other developers I know have tried giving feedback for gcc. On the whole, going outside and shouting at clouds is more productive.

I felt the same. There are too few contributors for GCC. At some point I started to fix the bugs that I had filed myself. Still, it is important that the user make themselves heard.

I think it's a circular problem: the gcc developers are very insular and respond to outside input with anything from ignoring it to getting into long lawyeristic arguments about why, if you squint at the text just right, their way is the only right way, which strongly discourages outside contributions. There's only so many hours in the day, and arguing till you're blue in the face that silently mutating a piece of code into unexpected different code that always segfaults when run, based on a truly tortured interpretation of two sentences of text, gets old fast. The gcc devs would make great lawyers for bypassing things like environmental law: they'd find some tortuous interpretation of an environmental protection law that let them dump refinery waste into a national park and then gleefully do it because their particular interpretation of the law didn't prohibit it.

Contrast this with Linus' famous "we do not break userspace" rant which is the polar opposite of the gcc devs "we love to break your code to show how much cleverererer than you we are". Just for reference the exact quote, https://lkml.org/lkml/2012/12/23/75, is:

  And you *still* haven't learnt the first rule of kernel maintenance?  If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs. How hard can this be to understand?  ... WE DO NOT BREAK USERSPACE!
Ah, Happy Fun Linus. Can you imagine the gcc devs ever saying "if we break your code it's a problem with gcc" or "we never blame the user"?

This really seems to be a gcc-specific problem. It doesn't affect other compilers like MSVC, Diab, IAR, Green Hills; it's only gcc and, to a lesser extent, clang. Admittedly this is from a rather small sample, but the big difference between those two sets that jumps out is that the first one is commercial with responsibilities to customers and the second one isn't.


In my experience it is worse with clang, which uses UB even more aggressively than GCC to optimize (and Chris Lattner in his famous blog post very much justified this line of thinking), and I have seen similar things with MSVC. I do not know about the others.

I think that GCC changed a bit in recent years, but I am also not sure that an optimizing compiler can have the same policy as the kernel. For the kernel, it is about keeping APIs stable, which is realistic, but an optimizing compiler inherently relies on some semantic interpretation of the program code, and if there is a mismatch that causes something to break, it is often difficult to fix. It is also that many issues were not caused by someone suddenly deciding "let's now exploit this UB we haven't exploited before", but because the compiler always relied on it and an improved optimization now affects more, or different, programs. This creates a difficult situation, because it is not clear how to fix it if you don't want to roll back an improvement you spent a lot of time on and others paid for. Don't get me wrong, I agree they went too far in the past in exploiting UB, but I do think this is less of a problem looking forward, and there is also generally more concern about the impact on safety and security now.


Good point, yeah. I really want to like clang because it's not gcc but they have been following the gcc path a lot in recent years. I haven't actually seen it with MSVC, but I'm still on an old pre-bloat version of Visual Studio so maybe they've got worse in recent versions too.

I think a lot of the UB though isn't "let's exploit UB", it's "we didn't even know we had UB in the code". An example is twos-complement arithmetic, which the C language has finally acknowledged more than half a century after the last non-twos-complement machine was built (was the CDC 6600 the last one's-complement machine? Were most of the gcc devs even born when that was released?). So everyone on earth has been under the crazy notion that their computer uses twos-complement maths, while the gcc (and clang) devs know that signed overflow is actually UB, which allows them to do whatever they want with your code when they encounter it.
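
The canonical illustration, hedged: exact behavior depends on compiler and flags, but this is the kind of folding mainstream optimizers are known to do:

  #include <limits.h>
  #include <stdio.h>

  /* Signed overflow is UB, so the optimizer may assume it never happens and
   * fold this comparison to "always true", even though on a two's-complement
   * machine INT_MAX + 1 would wrap to INT_MIN. */
  int always_greater(int x) {
      return x + 1 > x;
  }

  int main(void) {
      printf("%d\n", always_greater(INT_MAX));  /* typically prints 1 at -O2 */
      return 0;
  }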


How about we agree on the ABI and everyone can have their own C compiler. Everyone C's the world through their own lenses.

We're not too far away from that. At the very least, Claude can provide feedback and help decide which compiler options to use, as per developer preference.

> behave the way the C implementers want them to

If you don't please your users, you won't have any users.


It's ironic that I have to tell you of all people this, but many users of C (or at least, backends of compilers targeted by C) do actually want the compiler to aggressively optimize around UB.

I'm well aware of that. We've had many, many discussions of that in the D forums.

Consider that most programmers have long since fled for other languages.

If you're self hosting your compiler on C, you are your own user.

Which users?

And yet, C++.

> And yet, C++.

By any metric, C++ is one of the most successful programming languages devised by mankind, if not the most successful.

What point were you trying to make?


That it doesn't please lots of its users, I imagine. I, personally, certainly never enjoyed it, but sometimes you don't have a realistic alternative and have to use C++ (or C). In which case your pleasure or displeasure doesn't really matter; you just use that one tool with very sharp edges in the most unexpected (and ridiculously exposed) places with as much care as you can, then bandage your wounds and move on.

that it has millions of users while pleasing approximately none of them

True! But C++ is popular almost entirely because of when (in history/what alternatives existed at the time) and where (on what platforms) it first became available, and how much adoption momentum was created during that era.

I think claiming that C++ is successful because of the unintuitive-behavior-causing compiler behaviors/parts of the spec is an extraordinary claim--if that's what you mean, then I disagree. TFA discusses that many of the most pernicious UB-causing optimizations yield paltry performance gains.


If I may pontificate a bit, I was a major contributor to the success of C++.

Back in the 80s, I was looking for a way to enhance my C compiler. I looked at Objective-C and C++. There was a newsgroup for each, and each had about the same amount of traffic. I had to pick one.

Objective-C required a license to implement it. I asked AT&T if I needed a license to implement C++, and could I call it C++. AT&T's lawyer laughed and said feel free to do whatever you want.

So that decided it for me. At the time, C++ did not exist on the PC other than the awkward, nearly unusable cfront (which translated C++ to C). At the time, 90% of programming was done on the PC.

I implemented it. It was the first native C++ compiler for the PC. (It is arguable that it was the first native C++ compiler, depending on whether a gcc beta is considered a release.)

The usage of it exploded. The newsgroup traffic for C++ zoomed upwards, and Objective-C interest fell away. C++ built critical mass because of Zortech C++.

Borland dropped their plans for an OOP language and went for Turbo C++. Microsoft also had a secret OOP C language called C*, which was also abandoned in favor of implementing C++.

And the rest is history!

P.S. cfront on the PC was unusable because it was 1) incredibly slow and 2) did not support near/far pointers which was required for the mixed PC memory models.

P.P.S. Bjarne Stroustrup never mentioned any of this in his book "The Design and Evolution of C++".


Wow, that's a very torturous reading of a specific line in a standard. And it doesn't really matter what Yodaiken thinks this line means, because the standard is written by C implementers for (mostly) C implementers. So if C compiler writers think this line means they can use UB for optimization purposes, then that's what it means.

Yeah, I know it breaks the common illusion among the C programmers that they're "close to the bare metal", but illusions should be dispersed, not indulged. The C programmers program for the abstract C machine which is then mediated by the C compilers into machine code the way the implementers of C compilers publicly documented.


Yeah, this is basically Sovereign Citizen-tier argumentation: through some magic of definitions and historical readings and arguing about commas, I prove that actually everyone is incorrect. That's not how programming languages work! If everyone for 10+ years has been developing compilers with some definition of undefined behavior, and all modern compilers use undefined behavior in order to drive optimization passes which depend on those invariants, there is no possible way to argue that they're wrong and you know the One True C Programming Language interpretation instead.

Moreover, compiler authors don't just go out maliciously trying to ruin programs by finding more and more torturous undefined behavior for fun: the vast majority of undefined behaviors in C are things that, if a compiler couldn't assume they were upheld by the programmer, would inhibit trivial optimizations that the programmer also expects the compiler to be able to do.


I find that where the argument gets lost is when undefined behavior is assumed to be exactly that: an invariant.

That is to say, I find "could not happen" the most bizarre reading to make when optimizing around undefined behavior; "whatever the machine does" makes sense, as does "we don't know". But "could not happen"??? If it could not happen, the spec would have said "could not happen"; instead, the spec does not know what will happen and so punts on the outcome, knowing full well that it will happen all the time.

The problem is that there is no optimization to make around "whatever the hardware does" or "we have no clue", so the incentive is to choose the worst possible reading: "undefined behavior is incorrect code, and therefore a correct program will never have it".


Some behaviors are left unspecified instead of undefined, which allows each implementation to choose whatever behavior is convenient, such as, as you put it, whatever the hardware does. IIRC this is the case in C for modulo with both negative operands.

I would imagine that the standard writers choose one or the other depending on whether the behavior is useful for optimizations. There's also the matter that if a behavior is currently undefined, it's easy to later on make it unspecified or specified, while if a behavior is unspecified it's more difficult to make it undefined, because you don't know how much code is depending on that behavior.


But even integer overflow is undefined.

It's practically impossible to find a program without UB.


I think this is not really true. Or rather, it depends on the UB you are talking about. There is UB which is simply UB because it is out of scope for the C standard, and there is UB such as signed integer overflow that can cause issues. It is realistic to deal with the latter, e.g. by converting it to traps with compiler flags.
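
A minimal sketch of that approach (the flags below do exist in both gcc and clang, though exact behavior and diagnostics vary by version):

  /* Build with either of:
   *   cc -ftrapv trap_demo.c                               (abort on signed overflow)
   *   cc -fsanitize=signed-integer-overflow trap_demo.c    (UBSan runtime report)
   */
  #include <limits.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
      (void)argv;
      int x = INT_MAX - 1 + argc;   /* == INT_MAX when run with no arguments */
      printf("%d\n", x + 1);        /* signed overflow: UB, caught under the flags above */
      return 0;
  }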

> I think this is not really true. Or rather, it depends on the UB you are talking about.

I mean, if you're going to argue that a compiler can do anything with any UB, then by all means make that argument.

Otherwise, then no, I don't think it's reasonable for a compiler to cause an infinite loop inside a function simply because that function itself doesn't return a value.


When you say "cause", do you mean insert on purpose, or do you mean cause by accident? I could see the latter happening, for example because the compiler doesn't generate a ret if the non-void function doesn't return anything, so control flow falls through to whatever code happens to be next in memory. I'm not aware of any compiler that does that, but it's something I could see happening, and the developers would have no reason to "fix" it, because it's perfectly up to spec.

According to the author of the second link I gave (here it is again):

https://www.quora.com/What-is-the-most-subtle-bug-you-have-h...

The problem was that the loop itself was altered, rather than that the function returned and then that somehow caused an infinite loop.

> I'm not aware of any compiler that does that, but it's something I could see happening, and the developers would have no reason to "fix" it, because it's perfectly up to spec.

This is where we disagree.


I am not sure what statement you are responding to. I am certainly not arguing that. I disagree with your claim that "it is practically impossible to find a program without UB".

A study found that, for a particular subset of UB (code that had legal, detectable behavior changes at differing optimization levels), 40% of Debian Wheezy packages exhibited this UB.

https://people.csail.mit.edu/nickolai/papers/wang-stack.pdf

I submit that that's a small fraction of UB, and that much of it would exist at any optimization level.


I know, but this still leaves 60% of programs without such UB, which is far from "it is practically impossible to find a program without UB". Also, this was a study from 2013 and many of the bugs it found were fixed. Also, GCC got UBSan in 2013 (so after this study).

That's "UB that was detected in this study". Since gcc will silently break code when it detects UB and you can't tell until you hit that specific case, the 40% is a lower bound. In practice it could be anything up to the full 100%.

In theory. But most C programs do not rely on UB. What is the basis for your claim?

Uhh... mathematics and logic? Since there's no perfect UB detector, one that detects UB in 40% of programs can only be presenting a lower bound. And I don't know why you think C programs rely on UB, they have it present without the programmer knowing about it.

It follows from mathematics and logic that "larger than 40%" could be 100%, but it does not follow that this is likely or reasonable to assume.

Aliasing being the classic example. If code generation for every pointer dereference has to assume that it’s potentially aliasing any other value in scope, things get slow in a hurry.
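
A small sketch of why (the function names are made up; the point is only the reload that possible aliasing forces):

  /* With plain pointers the compiler must assume a and b might point at the
   * same int, so *b has to be reloaded after the first store. With restrict
   * it may keep *b in a register and fold the two additions. */
  void add_twice(int *a, const int *b) {
      *a += *b;
      *a += *b;   /* reload of *b: the first store may have changed it */
  }

  void add_twice_noalias(int *restrict a, const int *restrict b) {
      *a += *b;
      *a += *b;   /* *b may be cached: a and b are promised not to alias */
  }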

> Wow, that's a very torturous reading of a specific line in a standard.

It's actually a much more torturous reading to say "if any line in the program contains undefined behavior (such as the example given in the standard, integer overflow), then it's OK for the compiler to treat the entire program as garbage and create any behavior whatsoever in the executable."

Which is exactly the claim that he was addressing.


Compiler writers are free to make whatever intentional choices they want and document them. UB is especially nasty compared to other kinds of bugs because implementors can't/refuse to commit to any specific behavior, not because they've chosen the wrong behaviors.

> Compiler writers are free to make whatever intentional choices they want and document them.

Sure, but it's unlikely it's an intentional choice to cause an infinite loop simply because your boolean function didn't return a boolean.


Which is why Windows UI is littered with language like "number of rows: {n}".

Makes it easier to parse by automatic tools too

You know, I regularly lose or forget my baseball caps (at least once per Summer, and usually I go through 2 or 3). I wish there was a nationally-mandated register of headwear, with obligatory chipping at the points of sale. Not even entirely joking.

On a more serious note, it's interesting that some property never gets any ownership marks on it; some gets them customarily, but only out of convenience, with no legal obligation to do so; and for some property they are legally mandated by the state, yet owners largely find them cumbersome.


For maybe 100 years, we’ve lived in an era of diminished hat importance. I, for one, don’t want to be caught hatless around any sharp-tongued re-enactors.

Isn't public disclosure of military secrets a criminal offence? Ah well.

Not if it’s an official government negotiating ploy.

But it's not a Dutch secret, right? It's the USA's, right? All in all, it kinda makes me suspect that the statement is simply untrue.

If it’s anything like the code in passenger vehicles or airplanes, it is:

- spaghetti code that’s difficult or impossible to formally exercise fully in unit, comprehensive, or proof-centric testing

- delivered as compiled binaries for industrial-chip architectures by e.g. Renesas that have extremely hardened hardware and resilience

- annoying but feasible to reverse engineer in Ghidra

- designed to prioritize repairability over firmware signature enforcement

- has an undocumented but wire-sniffable protocol for firmware updates

So I am of a mind to take their statement at face value, because it’s vanishingly unlikely that the U.S. disallows field patching of a warplane due to lacking a crypto private key, much less bothers to spend money on crypto-attestation style locks. This is USgov military-industrial, not Bay Area marketer tech à la Google; competent security practices in deployed hardware are not likely to be the norm, especially not when every plane includes armed guards free of charge to the contract.

If I were a competent defense partner with the USgov, I would have already commissioned and completed a full decompilation, because duh. That the Dutch are saying this openly is charming but not particularly surprising. Presumably there's a US backdoor in the IFF module, for instance, and while it's fine to leave it in place, it's better than fine to patch a warning alert in so that you know when it's exercised. This is basic defense programming 101 stuff here, right? .. right?


> has an undocumented but wire-sniffable protocol for firmware updates

- Has an undocumented blob execution feature used for testing of the unit after it was sealed and glued.

- Has a documented secondary bootloader (remote code execution by design) due to historical reasons.


Just how the Dutch would manage to find that out would be a big deal.

Espionage would be the name of that witch.


Didn't you hear? American laws apply to everyone now. /s

Americans are fierce about ramming their laws down other people's throats, but when the EU says that Parmesan cheese can only come from Italy, they immediately throw a hissy fit.

