> ...another pass might notice the INT_MAX+1 and replace it by unreachable since UB code is “by definition” unreachable...
I would love to see the concrete example where this actually makes the program faster, to give me a better understanding of what I'm up against if I prefer to make integer field arithmetic well-defined.
AFAIK, the main reason these loops are so frustrating in C is that no one wanted to bite the bullet and insist that fast loops all use size_t instead of int. Otherwise, on systems where int isn't the native width (which is, today, most of them), you have to emulate the wrap by masking the induction variable on every iteration; and so the compilers started inferring stuff, and the language spec got damaged to let them work around it instead.
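For concreteness, here is a minimal sketch of the kind of loop I mean (the function names are mine, not from any particular codebase): on a typical 64-bit target, the signed version lets the compiler assume i never wraps (overflow would be UB) and keep it in a full-width register, while the unsigned version has a well-defined wrap at 2^32 that the compiler must preserve.

    #include <stddef.h>

    /* 32-bit signed index: overflow is UB, so the compiler may widen i
       to a native 64-bit register and optimize freely. */
    float sum_int(const float *a, long n) {
        float s = 0.0f;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    /* 32-bit unsigned index: wraparound at 2^32 is defined behavior
       that the compiler has to preserve, constraining the loop. */
    float sum_uint(const float *a, long n) {
        float s = 0.0f;
        for (unsigned i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    /* The native-width size_t sidesteps the problem entirely. */
    float sum_size(const float *a, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; ++i)
            s += a[i];
        return s;
    }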
For what it's worth, Rust has made all integer arithmetic well-defined unless you explicitly opt in to the alternative on a case-by-case basis. Opting into the alternative is not something I've seen people have to do for performance.
Specifically, in release builds it's defined to be two's complement (wrapping) arithmetic. In debug builds it's defined to panic (in non-Rust speak: basically, throw an exception) on overflow, and the language theoretically reserves the right to change the behavior in release builds to do the same, but that hasn't been done because the panic route does incur a non-negligible performance penalty.
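If it helps to see what the debug-mode check amounts to, here is a rough C rendition (my sketch of the idea, not what rustc actually emits), using the GCC/Clang __builtin_add_overflow intrinsic; the cost is an extra flags test and a cold branch per addition:

    #include <stdio.h>
    #include <stdlib.h>

    /* Rough analogue of a debug-mode Rust i32 addition: do the add,
       and "panic" (here: abort) if it overflowed. */
    int add_checked(int a, int b) {
        int out;
        if (__builtin_add_overflow(a, b, &out)) {
            fprintf(stderr, "attempt to add with overflow\n");
            abort();
        }
        return out;
    }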
I agree with you that I don't see people opting in for performance, but it's also the case that C relies on this behavior a lot more than Rust does, since it doesn't have iterators. If more Rust programmers wrote "raw" for loops, maybe the overhead would be more significant. I don't know if there's any way to quantify this.
Also, it's not done by default because it's believed to incur the penalty. We didn't do any significant analysis here, mostly relying on the widespread belief. If we wanted to go against the grain, we'd have to demonstrate that the belief is incorrect, and that's difficult and time-consuming, and there are other things to spend time on.
Google recently demonstrated pretty convincingly that bounds checks on array accesses in their C and C++ code have a very minor penalty that's absolutely worth it. I wonder if maybe we'll see something similar with this semantic in the future. I think it's lower priority because the "maybe this, maybe that" behavior isn't a safety issue on its own, since it's not undefined behavior.
I would imagine it wouldn't be too hard to collect this information, because you can just change the setting on a project and report back whether the result is no significant difference, broken (indicating your Rust is wrong and you should fix it), a little slower, or unacceptably slower.
Actually, that information might help inform people choosing this setting for a new project. I think many assume that Wrapping is the conservative choice, but the Strict / panic semantic is actually going to be better for at least some of those people if they can afford it, so if they knew that, say, 60% of projects find it works out, they might try it and benefit.
As a side effect, a handful of people get to find out that their project has a nasty bug where it actually relies on wrapping for routine functionality but does not correctly use a wrapping type.
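For a concrete instance of "relies on wrapping for routine functionality", think of hashing. Here is a 64-bit FNV-1a sketched in C, where the multiply wraps mod 2^64 by design (the Rust version would want wrapping_mul or the Wrapping<u64> type, precisely so that overflow checks don't fire on it):

    #include <stddef.h>
    #include <stdint.h>

    /* 64-bit FNV-1a: the multiply is intended to wrap mod 2^64, which
       is well-defined for the unsigned type. */
    uint64_t fnv1a(const unsigned char *p, size_t len) {
        uint64_t h = 14695981039346656037ull;  /* FNV offset basis */
        for (size_t i = 0; i < len; ++i) {
            h ^= p[i];
            h *= 1099511628211ull;             /* FNV prime */
        }
        return h;
    }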
Also, for what it's worth, it's easy to change if you're building with cargo (which basically everyone does):
[profile.release]
overflow-checks = true
There have been a few projects where I've set this for one reason or another (e.g. because I'm dealing with intentionally wrapping arithmetic and I want the program to shout if I make a mistake).
Just in case you are saying this to defend why you are using "int" instead of "size_t" (as, otherwise, I am not sure of your point, and I hope you might clarify it): you would then want to use "ssize_t"... the issue I was referring to is not about signed vs. unsigned; it is about using a type which is smaller than a native full-width machine number type (such as a 32-bit int on a 64-bit machine), as you then have to use suboptimal instructions or even add more code to emulate the wrapping semantics.
I understood your comment to be about well-definedness of wraparound and my point is that I do not want it to be well-defined for signed integers because it would take a tool away from me for catching bugs. This is true independent of the size of the type. For similar reasons I use int and not ssize_t because I want an error in my lifetime if there is a bug that causes my loop not to terminate. (And yes, sometimes int would be too small.)
Otherwise, there are many examples where signed integers lead to better code; here is one: https://godbolt.org/z/szGd1864a
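(For readers without the link handy, the example is presumably along these lines — reconstructed from the reply below rather than from the link itself, so treat it as a best guess:)

    /* Best-guess reconstruction of the linked example: with signed k
       and <, overflow of n + 4 would be UB, so the compiler may assume
       it never happens and collapse the loop to a closed form. */
    int f(int n)
    {
        int sum = 0;
        for (int k = n; k < n + 4; ++k)
            sum += k;
        return sum;
    }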
I am sorry, I am missing something here... are you saying that you are using a smaller type "int" so that it wraps sooner? AFAIK, the compiler is going to use the fact that wrap is undefined to promote that int to a native type and make it not wrap until the same place ssize_t would!
OK, and yeah: based on your godbolt example, I do feel like you still don't understand all the ramifications, as you are relying on a bunch of undefined behavior to make your loop sort of work just because you want to avoid working with field arithmetic. Fixing the code is just as fast:
int f(unsigned long n)
{
    int sum = 0;
    /* unsigned wrap is well-defined, and != (rather than <) makes the
       trip count exactly 4 even if n + 4 wraps around zero */
    for (unsigned long k = n; k != n + 4; ++k)
        sum += k;
    return sum;
}
f(unsigned long):
lea eax, [6+rdi*4]
ret
(Of course, this is still a bit awkward, as this function should also clearly return an unsigned long. I am just trying to show you the minimal patch to the loop to demonstrate why you aren't buying anything: the sum n + (n+1) + (n+2) + (n+3) is just 4n + 6, which is exactly what that lea computes. There is no benefit to this quirk of the compiler: please fix the code.)
(edit: BTW, I just looked at your about page, and the fact that you are or were on the C WG is demoralizing, as I'd have thought this kind of confusion would be reason to rid the spec of this behavior... but C developers really seem to love doing signed inequalities on field elements :/.)
But using int doesn't wrap sooner or cause any errors at all, because it is automatically promoted to the native size due to this all being undefined behavior. Your example even directly demonstrates that, as the code had no chance of erroring in any way: it just returned the sum directly... the sum even overflows the int it is being stored into; that's how crazy the behavior of the code you wrote is.
And, like, in addition to undermining your argument, that is also distressing, right? The code you wrote can be interpreted in a ton of different ways, and the only reason the fast one makes any sense is if you squint hard enough to lose the details of the sizes and behaviors of all of the types that were declared... if you actually had to make the wrapping or errors you wish to see happen, you'd have to have the slow code.
In contrast, the code using a native-sized unsigned type and != means only one thing, and that one thing is the fast code you said you wanted... that makes it the correct code. It isn't optimized to that code due to undefined behavior: mathematically, that lea multiplication is identical, and if you simulated all of the steps--including the wrap (and, if we fixed the output type, the truncation)--you'd always get that result.
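(If you want to convince yourself of that, here is a quick self-check I put together, with uint64_t so the widths are explicit: for every n, including values that wrap mid-loop, stepping through the iterations agrees with the closed form 4n + 6, because both are the same element of Z/2^64.)

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
        /* Values near the wrap point as well as ordinary ones. */
        uint64_t tests[] = { 0, 1, 12345, UINT64_MAX - 2, UINT64_MAX };
        for (int i = 0; i < 5; ++i) {
            uint64_t n = tests[i], sum = 0;
            for (uint64_t k = n; k != n + 4; ++k)
                sum += k;                 /* wraps mod 2^64, by definition */
            assert(sum == 4 * n + 6);     /* what the lea computes, mod 2^64 */
        }
        return 0;
    }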
(And the reason I keep using the term "field" is because these types are elements of a modular finite field, and once you see them for what they are it becomes a lot easier to write correct code using them. There are even some modern languages where the native algebraic type is an element of a prime field, at which point confusing it for some kind of integer is extremely problematic.)
Let's start with something simple: Those types are not fields but quotient rings. If you have learned enough math to understand this, please come back.