This is not a completely hashed-out thought. But I'll share it and see what others think.
My impression is that the simplest way to improve energy efficiency is to simplify hardware. Silicon is spent isolating software, etc. Time is spent copying data from kernel space to user space. Shift the burden of correctness to compilers, and use proof-carrying code to convince OSes a binary is safe. Let hardware continue managing what it's good at (e.g., out-of-order execution). But I want a single address space with absolutely no virtualization.
Some may ask "isn't this dangerous? what if there are bugs in the verification process?" But isn't this the same as a bug in the hardware you're relying on for safety? Why is the hardware easier to get right? Isn't it cheaper to patch a software bug than a hardware bug?
A good reason why memory virtualization has not been "disrupted" yet seems to be fragmentation. Almost all low-level code relies on the fact that process memory is contiguous, that it can be extended arbitrarily, and that data addresses cannot change (see Rust's `Pin` trait). This is an illusion ensured by the MMU (aside from security).
A "software replacement for MMU" would thus need to solve fragmentation of the address space. This is something you would solve using a "heavier" runtime (e.g. every process/object needs to be able to relocate). But this may very well end up being slower than a normal MMU, just without the safety of the MMU.
> This is an illusion ensured by the MMU (aside from security).
Even in places where DMA is fully warranted, an IOMMU gets shoehorned in. I don't think there's any running away from the costs to be paid for security (not least for power-efficiency reasons).
But in this case the job of the hardware is to prevent the software from doing things, and it pays a constant overhead to do so, whereas static verification integrated into a compiler would be a one-time cost.
Arbitrarily complex programs make even defining what is and isn't a bug arbitrarily complex.
Did you want the computer to switch off at a random button press? Did you want two processes to swap half their memory? Maybe, maybe not.
A second problem to consider is that verification is arbitrarily harder than simply running a program -- often to the extent of being impossible, even for sensible and useful functionality. This is why programs that get verified either don't allocate or only do bounded allocation. But unbounded allocation is useful.
It is possible to push proven or sandboxed parts across the kernel boundary. Maybe we should increase those opportunities?
Also, separate address spaces simplify separate threads -- since they do not need to keep updating a single shared address space. So L1 and L2 caches should definitely give address separation. Page tables are one way to maintain that illusion for the shared resource of main memory... probably a good thing.
That's not to say there isn't a lot of space to explore your idea. It is probably an idea worth following
One final thought: verification is complex because computers are complex. Simplifying how processes interact at the hardware level shifts the burden of verification from arbitrarily long-running, arbitrarily complex, and ever-changing software to verifying fixed and predefined limitations on functionality. That second one has got to be the easier of the two to verify.
I like this idea, and given today's technology it feels like something that could be accomplished and rolled out in the next 30 years.
If the compiler (like Rust's) can prove that out-of-bounds memory is never accessed, the hardware/kernel/etc. don't need to check at all anymore.
And your proof technology isn't even that scary: just compile the code yourself. If you trust the compiler and the compiler doesn't complain, you can assume the resulting binary is correct. And if a bug/0day is found, just patch and recompile.
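As a small illustration of the kind of proof this is talking about (real Rust behavior, just a toy example):

```rust
fn sum(xs: &[u32; 8]) -> u32 {
    // A constant out-of-bounds index into a fixed-size array is rejected at
    // compile time (the deny-by-default `unconditional_panic` lint), so no
    // runtime check is ever emitted for it:
    // let bad = xs[9]; // error: this operation will panic at runtime

    // Iterating instead of indexing lets the compiler prove every access is
    // in bounds, so the generated code contains no bounds checks at all.
    xs.iter().sum()
}

fn main() {
    let data = [3u32; 8];
    println!("{}", sum(&data));
}
```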
The reality is that we do want to run code developed, compiled, and delivered by entities we don't fully trust and who don't want to provide us the code or the ability to compile it ourselves. And we also want to run code that dynamically generates other code as it runs - e.g. JIT compilers, embedded scripting languages, JavaScript in browsers, etc.
Removing these checks from the hardware is possible only if you can do without them 100% of the time; if you can only trust 99% of the binaries you execute, that's not enough: you still need this 'enforced sandboxing' functionality.
Perhaps instead of distributing program executables, we could distribute program intermediate representations and then lazily invoke the OS's trusted compiler to do the final translation to binary. Someone suggested a Vale-based OS along these lines; it was an interesting notion.
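A rough sketch of what that load path could look like -- every name here (`trusted_compile`, `Loader`, the name-keyed cache) is invented for illustration, since no such OS interface exists:

```rust
use std::collections::HashMap;

/// Stand-in for the OS's single trusted compiler/verifier. A real one would
/// verify the IR (types, bounds, no pointer forging) and emit native code;
/// this stub just copies the bytes so the sketch runs.
fn trusted_compile(ir: &[u8]) -> Result<Vec<u8>, String> {
    if ir.is_empty() {
        return Err("empty module".into());
    }
    Ok(ir.to_vec()) // pretend this is machine code
}

struct Loader {
    /// Cache of already-compiled modules. In practice you'd key this by a
    /// content hash; a plain name keeps the sketch short.
    cache: HashMap<String, Vec<u8>>,
}

impl Loader {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Lazily compile on first load; later loads hit the cache, so the
    /// verification/compilation cost is paid once per module, not per run.
    fn load(&mut self, name: &str, ir: &[u8]) -> Result<&[u8], String> {
        if !self.cache.contains_key(name) {
            let native = trusted_compile(ir)?;
            self.cache.insert(name.to_string(), native);
        }
        Ok(self.cache[name].as_slice())
    }
}

fn main() {
    let mut loader = Loader::new();
    let module = loader.load("hello", b"...portable IR bytes...").unwrap();
    println!("loaded {} bytes of (pretend) native code", module.len());
}
```

The appeal is that the OS never has to trust native code it didn't produce itself, and the verification cost is paid at install or first load rather than on every run.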
I do not believe such OSes can ever be secure, given how often vulnerabilities are found in web browsers' JS engines alone. Besides, AFAIK the only effective mitigation against all Spectre variants is using separate address spaces.
My understanding is that's more or less what Microsoft was looking at in their Midori operating system. They weren't explicitly looking to get rid of the CPU's protection rings, but they ran everything in ring 0 and relied on their .NET verification for protection.
eBPF does this, but its power is very limited and it has significant issues with isolation in a multi-tenant environment (like in a true multi-user OS). Beyond this one experiment, proof-carrying code is never going to happen on a larger scale: holier-than-thou kernel developers are deathly allergic to anything threatening their hardcore-C-hacker-supremacy, and application developers are now using Go, a language so stupid and backwards it's analogous to sprinting full speed in the opposite direction of safety and correctness.