
I wonder where we’d be if the idea of CPU-independent bytecode had ever really taken off - for example, the TIMI bytecode of IBM System/38 and AS/400, TenDRA TDF (aka OSF ANDF), or WebAssembly. You could have an AOT compiler in the system firmware which the OS invokes: when a program is installed, the OS uses it to convert the program to the actual machine code, about which the OS might know nothing and which could vary incompatibly from CPU to CPU, even among different CPU models in the same family. (Maybe a JIT mode too.)

I guess JVM/CIL are somewhat similar, but at a much higher level - I’m not talking about garbage collection or type safety.

In some ways that is true of TIMI too - it is designed to support a capability-based operating system, and hence has some rather high-level instructions, although still not as high-level as JVM/CIL. It was generally used as a compilation target for non-garbage-collected languages such as RPG, COBOL, C/C++, PL/I, Fortran, BASIC, Pascal, etc., and hence lacks a garbage collector.



> I wonder where we’d be if the idea of CPU-independent bytecode had ever really taken off [...]. You could have an AOT compiler in the system firmware which the OS invokes [...].

You could say modern CPUs are kind of like tracing JITs. On one hand, a normal tracing JIT has much more memory to save its work than a CPU’s trace cache, but on the other, the superscalar reordering and renaming stuff is even more aggressive than a trace recorder about looking at how the code actually executes and deriving assumptions from that instead of attempting to prove them statically.

Why not AOT instead? In part because they can’t, of course—a tracing JIT requires about the least amount of heavyweight compiler tech out of all the possibilities, which is an advantage if you’re trying to fit the compiler into silicon. (That’s not to say a tracing JIT is easy—the cost of a simple compiler is that you need to make it hella fast for the result to be any good.)

But in part I suspect it’s because a standard assembly-level bytecode kind of sucks to compile ahead of time. About the most useful assumptions such a compiler can make are which things don’t interfere with each other, usually memory operations, or perhaps which writes can be forwarded to reads. A tracing JIT can see some of this, a superscalar even more so; an AOT or function-at-a-time JIT, in the absence of any aliasing information, or even of knowing when one object ends and another begins (boo WebAssembly), can’t.
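
(To make the aliasing point concrete, here's a toy C fragment - entirely made up, not taken from any real bytecode - where an ahead-of-time compiler that only sees flat pointer operations has to assume the worst:)

    /* Without aliasing information the compiler must assume p and q may
       point at the same word, so the store to *q kills what it knew about
       *p and the final load cannot be forwarded from the first store. */
    int no_forwarding(int *p, int *q) {
        *p = 1;
        *q = 2;
        return *p;          /* must reload: could be 1 or 2 */
    }

    /* With real aliasing information (here faked via C's restrict) the
       load folds to the constant 1. A flat assembly-level bytecode carries
       no such hint; a trace recorder or the CPU itself only discovers it
       dynamically. */
    int with_forwarding(int *restrict p, int *restrict q) {
        *p = 1;
        *q = 2;
        return *p;          /* provably 1 */
    }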

Ironically, memory segmentation as in the Intel 432 or 286 (or the IBM dinosaurs) feels like it could help with that (or are we calling this idea “capability-based” once again?). Does anyone who isn’t just a speculating dilettante (unlike me) think that’s a reasonable thought?

(Wait, is a selector table just a Smalltalk-style object table with a fake moustache?)

Of course, even then we’d still have the problem that VLIW microcode wide enough to require no decoding and engage the entirety of a modern CPU’s physical register file and execution units would be cripplingly slow to fetch from DRAM, and the “legacy” ISAs partly serve as a compression format.


Transmeta Crusoe? They chose x86 machine code as their CPU-independent bytecode.


In demos the Transmeta processors were shown to support multiple instruction sets - per https://en.wikipedia.org/wiki/Transmeta#Code_Morphing_Softwa... , they demoed pico-Java, and there were also rumors of PowerPC compatibility.

Although you're probably right - none of those options made it into a shipping product, only x86.


I guess today's equivalent VLIW chip would be Tachyum Prodigy, though I'm not super confident about that: https://www.tachyum.com/products/#products-prodigy


Nvidia's Denver 2 cores work this way. They shipped on an Android tablet about 10 years ago. Not sure what happened to them after that.


Mill uses something similar as well. That being said, I have <1% confidence in Mill ever moving past the slideware stage, so...


"slideware": Hat tip. I never saw that term before. I usually see vaporware, e.g., Duke Nukem Forever.


> I wonder where we’d be if the idea of CPU-independent bytecode had ever really taken off [...] convert it to the actual machine code, about which the OS might know nothing, and could vary incompatibly from CPU to CPU, even among different CPU models in the same family

There are various examples I can think of.

Nvidia's CUDA platform compiles C++-like source code to PTX code, which is GPU-independent. At run time, PTX is compiled and specialized for the specific GPU model you are running on. I can imagine that PTX is compiled differently depending on the number of registers in the GPU as well as its instruction set capabilities. https://en.wikipedia.org/wiki/Parallel_Thread_Execution
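
(A rough sketch of that run-time step using the CUDA driver API - error handling omitted and names mine, so treat it as illustrative only:)

    #include <cuda.h>

    /* The driver JIT-compiles the architecture-neutral PTX text into
       machine code for whatever GPU is actually installed. */
    CUfunction load_kernel(const char *ptx_text, const char *name) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuModuleLoadData(&mod, ptx_text);    /* PTX in, GPU-specific code out */
        cuModuleGetFunction(&fn, mod, name);
        return fn;
    }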

Mainstream virtual machine languages like Java, .NET, and JavaScript are obvious examples.


Given that everything is just microcode anyway, it would be really interesting if someone (e.g. Intel) took their design and only switched out the instruction decode to decode ARM (or whatever) instead.

Sure it wouldn’t be perfect since the chip is optimized based on x86-64 workloads, and they’d never publish it anyway. Plus it may only be simulated instead of spending the money on manufacturing the one-offs.

But boy would it be interesting to see how it performed in various dimensions, just as an exercise.


> Given that everything is just microcode anyway, it would be really interesting if someone (e.g. Intel) took their design and only switched out the instruction decode to decode ARM (or whatever) instead.

You probably mean uops, but that thought has also crossed my mind in the past --- a multi-ISA CPU. They could add the decoders for other ISAs, along with extra GDT descriptor types for "ARM mode", "RISC-V mode", etc. segments like they did with V86. It's not a new idea either: the NEC V30 (https://en.wikipedia.org/wiki/NEC_V30#ISA_extensions) could execute both x86 and 8080 code, and of course ARM has cores with the triple-mode ARM32/Thumb/AArch64 ISAs.


Yes I did, thanks. It’s also kind of reminiscent of the Transmeta Crusoe.

The problem I think multi-ISA would run into is the “master of none” issue. Intel can tune for how x86-64 works, Apple and Samsung for ARM.

But if one chip runs it all, it can’t tune for anything too specific.

It must not be worth it. I wonder if Apple would have done something like that for the M series to let it keep running Intel software. They must have tried to figure out if it was worth it, right? I know they added a few instructions or an addressing mode or something to help. But they must have determined it wasn’t worth it and that it could be done well enough in software.


Exactly - they added a flag to enable total store ordering to help x86 instructions map cleanly to ARM instructions.

https://twitter.com/ErrataRob/status/1331735383193903104

Considering how fast the Apple M series can emulate x86, it's clearly not worth adding much more hardware than what they have now.


Back when we were digging into microcode we found a mention of this as a PoC/toy example [1]. Sadly we never found more than an overview; we would have liked to know more about it, especially how the update was accepted.

[1] https://troopers.de/events/troopers16/655_the_chimaera_proce... by https://twitter.com/cynicalsecurity


Seems irrelevant though - the internet exists. I'll never have a problem getting the code I need, provided it exists - i.e. if all I have to do to support ARM is use the ARM compiler, then I'll support ARM.

Docker with ARM-specific Linux distributions solves this, as do things like Golang with its "just set an environment variable and don't even worry about needing a cross-compiler" toolchain.


> Seems irrelevant though - the internet exists. I'll never have a problem getting the code I need, provided it exists

That assumes all code is open source, or else proprietary code shipped with source. That's not the world we live in. Most businesses run at least some closed source on-premise software. Open source is great at providing solutions to problems most people have. But when you start looking at specialised software which is highly industry-specific, suddenly open source starts to look a lot more patchy, and a closed source solution is often the only realistic option.

For example, at many engineering firms (whatever type of engineering they may be doing), you will find heaps of closed source software being used every day. For much of it, there simply is no open source solution available – or if there is, it is missing major features, or is clunky/buggy/poorly-designed, and the amount of extra cost in adopting it will be a lot more than just continuing to pay for the closed source alternative.


I agree with this - but my point is that I think the difficulty of compiling to alternative architectures is more of an impediment. If it's easy, then companies will just do it, give or take "we don't want to support that platform".


> Seems irrelevant though - the internet exists.

... but it's not always useful. MSI motherboards doing "secure boot" can't check for key revocation until after they've booted :( Sometimes you just have to rely on what you've got.

https://arstechnica.com/information-technology/2023/05/leak-...


We kind of have it, except it does not seem to have taken off and has been relegated to a second-class citizen at best.

I am talking about LLVM Bitcode[0] – when the binary product (an executable or .o/.a files) is shipped in the LLVM IR representation and then is «AOT»'d into the final product (a final executable) that can, for instance, take advantage of the latest ISA features (armv9, Zen23, POWER18 or new RISC-V extensions) with zero effort on the end user's part. For a while, Apple even encouraged iOS devs to upload their apps to the App Store in the Bitcode format. That all but ceased to exist, for non-obvious reasons, about a year or two ago. Technically, if Apple chose to transition onto an alternative ISA again (say, RISC-V), at least iOS apps would not require recompilation and would get statically converted to the new ISA at download time.

Imagine a world where there would be a single Linux distribution for a given architecture shipped in the Bitcode format (sans the small arch specific boot area and the AOT engine), for instance.

[0] https://lowlevelbits.org/bitcode-demystified/

[1] https://www.highcaffeinecontent.com/blog/20190518-Translatin...


LLVM has multiple issues that make LLVM-Bitcode unsuitable as a modern ANDF.

It is a moving target. SPIR (OpenCL/Vulkan) used to be based on LLVM-IR, but each version had to be locked to one specific version of LLVM and that wasn't viable in the long run. So SPIR-V got its own IR, and hasn't looked back.

There are many subtle differences between architectures and their ABIs. In some ways LLVM-IR is too low-level, so the compiler has to lower to a specific ABI even before emitting LLVM-IR code.
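
(A tiny example of that, hedged since the exact IR depends on clang version and target: even passing a small struct by value already bakes ABI decisions into the bitcode.)

    /* The same C function gets a different LLVM-IR signature per target,
       so the emitted .bc is already target-specific:
         x86-64 SysV: the struct is typically coerced into a single
                      SSE-class argument (e.g. <2 x float> / double)
         AArch64:     it is a homogeneous float aggregate, passed as
                      [2 x float] in two FP registers */
    struct point { float x, y; };

    float mag2(struct point p) {
        return p.x * p.x + p.y * p.y;
    }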

LLVM-IR was made for a C/C++ compiler, and still retains many C-isms. What is undefined behaviour in C is often undefined behaviour in LLVM-IR, and therefore bugs have different effects on different hardware. A large software vendor would therefore still need to keep a farm of different machines to test its code on, and that is the total opposite of what one would want to accomplish.
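
(A classic concrete case - my example, not from the spec: an oversized shift is UB in C and poison in LLVM-IR, and the hardware fills in the blank differently.)

    /* x << n with n >= 32 is undefined for a 32-bit int. On x86 the
       hardware masks the count to 5 bits, so a shift by 33 acts like a
       shift by 1; classic 32-bit ARM uses the bottom byte of the count
       as-is, so the result is 0. Same bitcode, different behaviour -
       and the optimizer is free to do something else entirely. */
    unsigned shift_ub(unsigned x, unsigned n) {
        return x << n;
    }
    /* shift_ub(1, 33): commonly 2 on x86, 0 on ARM32. */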

A truly hardware-agnostic platform would need to both have its own virtual CPU with defined semantics, and its own ABI, so as to provide an abstraction around the specifics of each hardware platform. But if it has its own ABI, then it wouldn't be 100% interoperable with existing Linux libraries on each hardware platform either.


I don't know about the JVM, but CIL doesn't have a runtime notion of type safety - it is checked by the compiler/verifier and isn't enforced at run time.


Hotspot verifies the bytecode when it's loaded, but after that, it's completely unsafe too. This verification can currently be disabled, but the flag is deprecated for removal at some point.


(Turns to phone booth, ripping off tie) "Sounds like a job for super Forth!"


Seriously - something similar to the threaded interpreted code of Forth might be a nice way to implement a low-level bytecode on a modern machine.
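
(For the curious, a minimal sketch of the idea - direct-threaded code using GCC/Clang's computed-goto extension, names made up:)

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int stack[16], *sp = stack;
        /* the "compiled" program: push 3, push 4, add, print, halt */
        static void *prog[] = { &&lit, (void *)3, &&lit, (void *)4,
                                &&add, &&print, &&halt };
        void **ip = prog;
        goto **ip++;                         /* start the thread */

    lit:   *sp++ = (int)(intptr_t)*ip++; goto **ip++;
    add:   sp--; sp[-1] += sp[0];        goto **ip++;
    print: printf("%d\n", sp[-1]);       goto **ip++;
    halt:  return 0;
    }

(Each primitive ends by jumping straight to the next one - no central dispatch loop - which is essentially what a classic Forth inner interpreter does.)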


x86 is already CPU-independent bytecode.



