Some typical points to consider:
1) Never store a length as an int, unsigned, long, etc. Use size_t (or ssize_t) instead.
2) Never store a pointer as an int, unsigned, long, etc. Use uintptr_t (or intptr_t). ptrdiff_t is the type to use for pointer differencing.
3) When comparing or casting to a signed type, watch out for sign extension. For example, (short) -1 (0xffff) becomes 2^32-1 (0xffffffff) when cast to uint32_t. Remember that char is usually a signed type.
4) Some legacy library functions return int, which is a signed 32-bit value on both Windows and Linux (remember point 3). Watch out for these cases: for example, don't use ftell (its long return type can't represent large file offsets on every platform), and always check the return value of s[n]printf. A short sketch below illustrates these points.
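To make those concrete, here's a minimal sketch (illustrative only; the function and buffer names are made up):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    void examples(const char *name, char *dst, size_t dst_len)
    {
        /* 1) Lengths are size_t, not int/long. */
        size_t name_len = strlen(name);

        /* 2) A pointer stored as an integer goes in uintptr_t; the
           difference of two pointers is a ptrdiff_t. */
        uintptr_t addr = (uintptr_t)dst;
        ptrdiff_t span = (dst + dst_len) - dst;

        /* 3) Sign extension: a short holding -1 becomes 0xffffffff
           (2^32 - 1) when converted to uint32_t. */
        short s = -1;
        uint32_t u = (uint32_t)s;   /* u == 0xffffffffu */

        /* 4) snprintf returns an int: negative on error, and >= dst_len
           when the output was truncated, so check it before treating it
           as a length. */
        int n = snprintf(dst, dst_len, "len=%zu", name_len);
        if (n < 0 || (size_t)n >= dst_len) {
            /* handle error or truncation instead of trusting n */
        }

        (void)addr; (void)span; (void)u;
    }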
It's worth noting that ssize_t is a POSIX type, and is only guaranteed to have the range [-1, SSIZE_MAX] (basically a size type with one error value) [1]. If you need negative sizes, then ptrdiff_t is perhaps better.
Obvious (well, relatively), but alas non-trivial to fix in a codebase of even moderate size. (5k+ LoC, let's say)
Implicit integral conversions are one of the worst things ever foisted upon the programming world... and that's saying something (given the existence of NULL). Well, that and the whole size_t == unsigned <whatever> thing[1].
[1] I can certainly agree that it was useful when it was introduced, because it actually doubled your addressable storage in a practical way, but does any existing machine come even close to addressing 64 bits? For context: 60 bits of addressability is ~1000 PB.
The only problem is that there's lots of software written before those types even existed, and even today people might learn a bad style from old code. Part of the problem is that compilers hardly compel one to use a good coding style.
> For example, (short) -1 (0xffff) becomes 2^32-1 (0xffffffff) when cast to uint32_t.
Ideally the compiler would insert a check during a conversion from a negative signed integer to an unsigned integer, and set a flag or throw some sort of exception. Hopefully future languages (e.g. Rust) learn from this.
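C won't do that for you by default, but you can approximate it with an explicit check before converting; a minimal sketch (the helper name is my own invention):

    #include <stdbool.h>
    #include <stdint.h>

    /* Refuse the conversion instead of letting a negative value
       silently wrap to a huge unsigned number. */
    static bool to_u32_checked(long v, uint32_t *out)
    {
        if (v < 0 || (unsigned long)v > UINT32_MAX)
            return false;           /* caller must handle the failure */
        *out = (uint32_t)v;
        return true;
    }

(If I remember right, clang's -fsanitize=implicit-conversion checks can also flag this kind of sign change at runtime.)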
"The provided string length arguments were not properly checked and due to arithmetic in the functions, passing in the length 0xffffffff (2^32-1 or UINT_MAX or even just -1) would end up causing an allocation of zero bytes of heap memory that curl would attempt to write gigabytes of data into."
I don't know if it fixes it per se, but it does make it harder to mess up. For example, it does not automatically convert between sizes (usize) and unsigned integers (u64).
32-bit is relatively nice because ints, longs, and pointers are all the same size; that was also one of the main motivators in the 16-to-32 transition era, since the relative size of an int and a pointer did not change. The same thing didn't happen with the x86 transition to 64-bit, because it would effectively double the memory usage of every program even if it never needed such widths.
The "bitness" of a platform has always seemed to be a bit of a vague term to me; sometimes it refers to data width, sometimes it refers to address width, and sometimes a combination of both and neither. If we refer to data width, then the 8087 was already "80-bit" (FP) or "64-bit" (integers) and the latest CPUs with AVX512 are "512-bit". If we refer to address width, then the 8086/8088 are "20-bit" machines, the 80286 "24-bit", and P6 "36-bit".
I think a lot of this confusion and the vulnerabilities thus introduced are precisely because the relative sizes of the different types have changed, and exactly how they changed is not consistent across platforms nor clearly known to most programmers.
Well we could have just done ILP64, where all three are the same size, and that would have avoided a lot of 32 bit truncation bugs. The memory size argument never really made sense to me; there just aren't that many ints in the average program. I wonder how much of the push towards LP64 (ints are 32 bits, pointers and longs are 64) was to match Java.
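(For reference, int/long/pointer widths: ILP32 is 32/32/32, LP64 is 32/64/64, ILP64 is 64/64/64, and Windows' LLP64 is 32/32/64 with long long as the 64-bit integer type.)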
I think this is one place where Swift (among other modern languages) has it right. Int is pointer sized and the other integer types are explicitly sized in their name ("Int64" not "long").
> I wonder how much of the push towards LP64 (ints are 32 bits, pointers and longs are 64) was to match Java.
It was to avoid breakage when porting existing programs.
> I think this is one place where Swift (among other modern languages) has it right.
We used to do ILP64 in Rust and we got a huge number of complaints; there was work by a significant segment of the community to fork the language over it. The memory usage argument is still very important to a lot of people.
>The memory size argument never really made sense to me; there just aren't that many ints in the average program.
If you have a structure of just 5000 ints (which I imagine is fairly common), the difference between ILP64 and LP64 is the difference between your data fitting into the L1 cache and some of it being pushed out into the L2 cache. In a tight loop that can be a big difference.
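Back-of-the-envelope: 5000 four-byte ints is about 20 KB, which fits in a typical 32 KB L1 data cache; at eight bytes each it's about 40 KB, which doesn't (assuming the common 32 KB L1d size, which of course varies by CPU).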
While it doesn't look like it's true anymore (http://www.oracle.com/technetwork/java/hotspotfaq-138619.htm...), I sure seem to remember the Hotspot JVM having an option for 32 bit pointers for ... code, I think it was. Since few apps would need more than 4 GiB of code. And as you note, as the FAQ notes, this doesn't change much for most programming.
If your heap is under 32 GB, Java uses pointer arithmetic to fit pointers into 32 bits. "UseCompressedOops" is your Google term. The reduction in cache misses offsets a lot of the overhead.
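Conceptually it's something like this sketch (not HotSpot's actual code; the 8-byte object alignment is what buys the extra 3 bits of reach):

    #include <stdint.h>

    /* Objects are 8-byte aligned, so a 32-bit "compressed oop" shifted
       left by 3 can address 32 GB relative to the heap base. */
    static inline void *decode_oop(uint32_t compressed, uintptr_t heap_base)
    {
        return (void *)(heap_base + ((uintptr_t)compressed << 3));
    }

    static inline uint32_t encode_oop(const void *p, uintptr_t heap_base)
    {
        return (uint32_t)(((uintptr_t)p - heap_base) >> 3);
    }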
That's not an option in practice, since the whole point of going 64 bit is to get larger address spaces. You could define a special purpose ABI with 32-bit pointers and 64-bit integer types (and people have done so). But doing it for the platform's standard ABI would be just commercial suicide.
> If we refer to address width, then the 8086/8088 are "20-bit" machines, the 80286 "24-bit", and P6 "36-bit".
Also MOS 6502/6510 would have been "16-bit" by that standard.
Some ARMv7 designs would be "40-bit" (like Cortex A7 and A15, probably others too). Or alternatively "128-bit", if considering NEON SIMD width! You could have also said they're 16/32/64/128-bit, because some designs had 16/32/64/128 bits wide memory bus. Yet all ARMv7 designs were considered 32-bit.
And the 68000 would be a 24-bit CPU, and boy did that cause trouble when Apple et al. moved to later versions where those top 8 address bits were significant, especially since the 68000 was introduced when DRAM was really dear: the barely usable first Macintosh had only 128 KiB, i.e. a program had essentially less address space to use than a split I&D PDP-11, which gave you 64 KiB of each, with 8 KiB of data space reserved for the stack. Less, in fact, because its macroarchitecture is 32-bit, "wasting" bits and bytes here and there if your DRAM is that constrained.
I've read that catching stuff like this is a major reason that OpenBSD maintains (e.g.) SPARC64 support. Due to differences in MMU architectures, addressing modes, alignment requirements, etc., this kind of problem will often end up throwing an exception rather than silently doing almost-the-right-thing-usually.
Yep, here's a brief overview of that from 2007 https://youtu.be/NJ9Jml0GBPk?t=1937 . SPARC64 is a favorite of this developer because it's some Alice in Wonderland stuff where up is down and left is right compared to other architectures. So it exposes a number of bugs that others don't.
Bugs exposed by moving to different architectures are fascinating. They always originate from assumptions that just happen to work on the architecture they were designed on. But when you go to a different arch they go from being sort of harmlessly incorrect to crash-and-burn incorrect. Or, infinitely worse, to a sort-of-works-but-not-in-the-way-you-expect situation.
Might seem stupid, but something like that can play a big role in memorability. I don't think we'd all remember "Smashing the Stack for Fun and Profit" by name if it didn't have such a delicious title.
"Double [the] trouble" is a somewhat common idiomatic phrase. It rhymes "double" with "trouble", and the word "double" has a pleasant ring to it in its own right.