If you think directly calling Windows syscall is crazy, some applications parse binary code from ntdll.dll to figure out what the syscall numbers actually are, since they change across different Windows versions :)
It's a bit different. WoW64 is the technology used to run 32-bit applications on 64-bit Windows. Wine has support WoW64 for a long time. The difference is the "old" wine WoW64 mode required 32-bit libraries on the host side, whereas this "new" WoW64 mode doesn't.
they employs many devs of Wine (including the project lead, Alexandre Julliard). but technically Wine is still an independent open-source project, and has many contributors from outside Codeweavers.
That's a cool find. I wonder if LLVM also does the other way around operation, where it pattern matches handwritten CAS loops and transform them into native ARM64 instructions.
That's a very good question. A proper compiler engineer would know, but I will do my best to find something and report back.
Edit: I could not find any pass with a pattern matching to replace CAS loops. The closest thing I could find is this pass: https://github.com/llvm/llvm-project/blob/06fb26c3a4ede66755... I reckon one could write a similar pass to recognize CAS idioms, but its usefulness would be probably rather limited and not worth the effort/risks.
The term of art for this technique is "idiom recognition" and it's proper ancient, like, APL compilers did have some idiom recognition 50+ years ago.
An example you'll see in say a modern C compiler is that if you write the obvious loop to calculate how many bits are set in an int, the actual machine code on a brand new CPU should be a single population count instruction, C provides neither intrinsics (like Rust) not a dedicated "popcount" feature, so you can't write that but it's obviously what you want here and yup an optimising C compiler will do that.
However, LLVM is dealing with an IR generated by other compiler folk so I think it probably has less use for idiom recognition. Clang would do the recognition and lower to the same LLVM IR as Rust does for its intrinsic population count core::intrinsics::ctpop so the LLVM backend doesn't need to spot this. I might be wrong, but I think that's how it works.
> An example you'll see in say a modern C compiler is that if you write the obvious loop to calculate how many bits are set in an int, the actual machine code on a brand new CPU should be a single population count instruction, C provides neither intrinsics (like Rust) not a dedicated "popcount" feature, so you can't write that but it's obviously what you want here and yup an optimising C compiler will do that.
C compilers definitely have intrinsics for this, for GCC for instance it is `__builtin_popcount`.
And apparently it has even standard language support for it since C23, it's `stdc_count_ones` [1] and in C++ you have `std::popcount` [2]
The existence of platform specific hacks is not interesting. In reality what happens is that software which has at any point cared about being portable doesn't use them.
But yes stdc_count_ones is indeed the intrinsic you'd want here, and only a few years after I stopped writing C, so thanks for mentioning that.
std::popcount is C++ but it's also kinda miserable that it took until C++ 20 and yet they still only landed the unsigned integer types, even though C++ 20 also insists the signed integers have two's complement representation, so the signed integers do have these desirable properties in fact but you can't use that.
> In reality what happens is that software which has at any point cared about being portable doesn't use them.
I don't think this generalization is actually true. Fast portable software compiles conditionally based on the target platform, picking the fast platform-specific intrinsic, and falls back to a slow but guaranteed portable software implementation. This pattern is widespread in numerical linear algebra, media codecs, data compressors, encryption, graphics, etc.
Maybe we are just quibbling over semantics but the compiler intrinsic here is '__builtin_popcount'. 'stdc_count_ones' is a standard library element that presumably will be implemented using the intrinsic.
And FWIW all major C/C++ have for a long time have had a an intrinsic for this. In clang it even has the same name, Visual Studio it's something like just '_popcount'. So it has long been easy to roll your own macro that works everywhere.
Yes, just semantics. But I don't think I can agree that because you could have ensured this works portably people actually did. That's not been my experience.
Yesterday I watched that "Sea of Thieves" C++ 14 to C++ 20 upgrade story on Youtube, that feels much more like what I've seen - code that shouldn't have worked but it did, kept alive by people whose priority is a working game.
(amomax is the atomic fetch-max instruction. lr and sc are load-reserved and store-conditional instructions; sc is like a regular store except it only succeeds if the address was not modified since the previous lr that accessed it. IOW the assembly is basically one-to-one with the C source.)
CodeWeavers, which I am very lucky to be part of, is also an (almost) 30-year-old small company! We make Wine, Proton, and CrossOver, you might've heard of them. And I think we are hiring as well, please have a look!
True but memory requirements grow with sequence length. For recurrent models the memory requirement is constant. This is why I qualified with "low memory".
I (kind of) know multiple languages and they all have different word orders. I find it interesting that my brain is able to switch from expecting information to be received in one order to another. Each word order feels normal to me in its respective language, mix them and they will feel weird. It's like my brain is able to process information in different orders, but there are feature flags to enable them based on the language used.