Seems to me like one of the highest and best uses of this right now would be adding verifiable builds to … literally anything. You no longer need a verifiable-build-capable compiler or language — you can just run the compile and packaging step through a deterministic QEMU session.
Does this sound right? I’m trying to figure out where uncontrollable randomness would come in during a compile phase, and coming up blank.
Thanks for the link, I’m aware of reproducible-builds.org.
Both of your causes seem trivially fixable here: the QEMU build could start from a standard system clock time, and an ‘unsorted’ file listing made inside a deterministic OS environment will come back in the same order every run, no?
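To make that concrete, this is roughly the invocation I have in mind. The -rtc, -icount, and record/replay options are real QEMU flags, but the image name, the pinned date, and the exact blkreplay wiring are my assumptions, so treat it as a sketch rather than a working recipe:

```python
import subprocess

# Sketch: record a build-VM session so it can be replayed deterministically.
# Assumes a prepared disk image "build.qcow2" whose guest runs the compile.
RECORD_CMD = [
    "qemu-system-x86_64",
    # Pin the guest's starting wall-clock time so timestamps don't drift
    # between runs (any fixed date works).
    "-rtc", "base=2020-01-01T00:00:00,clock=vm",
    # Deterministic instruction counting plus record mode; the event log
    # goes into replay.bin and can be replayed later with rr=replay.
    "-icount", "shift=auto,rr=record,rrfile=replay.bin",
    # Disk access goes through the blkreplay driver in record/replay mode,
    # and 'snapshot' keeps the base image unmodified between runs.
    "-drive", "file=build.qcow2,if=none,snapshot,id=img-direct",
    "-drive", "driver=blkreplay,if=none,image=img-direct,id=img-blkreplay",
    "-device", "ide-hd,drive=img-blkreplay",
    "-nographic",
]

if __name__ == "__main__":
    subprocess.run(RECORD_CMD, check=True)
```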
By comparison, the rb.org site says you need to start by stripping all of that out of the build process itself, for the reasons you refer to.
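For contrast, that rb.org-style fix is to normalize the nondeterministic inputs in the packaging step rather than freezing the whole machine. A toy sketch in Python (the paths and the fallback epoch value are placeholders):

```python
import gzip
import os
import tarfile

# Fall back to a fixed date if SOURCE_DATE_EPOCH isn't set (placeholder value).
SOURCE_DATE_EPOCH = int(os.environ.get("SOURCE_DATE_EPOCH", "1577836800"))

def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    info.mtime = SOURCE_DATE_EPOCH          # clamp file timestamps
    info.uid = info.gid = 0                 # drop build-machine ownership
    info.uname = info.gname = "root"
    return info

def deterministic_tar(src_dir: str, out_path: str) -> None:
    # Collect files in a sorted, stable order instead of readdir order.
    entries = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()
        entries.extend(os.path.join(root, name) for name in sorted(files))
    # mtime=0 keeps the gzip header itself free of the current time.
    with gzip.GzipFile(out_path, "wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for path in entries:
                tar.add(path, arcname=os.path.relpath(path, src_dir),
                        filter=normalize)

if __name__ == "__main__":
    deterministic_tar("build-output", "release.tar.gz")
```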
I like your thinking. Deterministic replay with QEMU is supplemental to the larger goal of reproducible builds. The communities concerned with reproducible software not only expect cohesive, human-readable code that builds deterministically into bit-for-bit reproducible binaries; their originally stated goals require it.
Deterministic replay with QEMU is a "power tool" in the larger picture of these efforts.
Sounds to me like that wouldn't be quite as good as true reproducible builds (which can run on anybody's computer), because auditing the entire emulated hardware starting state and event log is a harder problem than auditing only your code. Including a virtual machine image in your build effectively makes the VM part of your codebase in terms of users needing to trust it, so verifying the build result means not just engineering but also forensics.
So it'd be good for cases where you otherwise wouldn't be able to provide any verifiability. But for software, it's still not as good as eliminating non-determinism completely.
This is a good point: it does add a lot of weight to the build process. I might try it anyway, though. If the ‘weight’ is a lightweight Linux build environment referenced by ISO hash, it might not be that bad (TM) in practice. And, generally, trading bandwidth (cheap) for human refactoring (crazy expensive) is a trade that’s often nice to make.
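For what it's worth, the ‘weight’ could come down to a single digest check before the build even starts. A sketch (the filename and digest are placeholders, not a real release):

```python
import hashlib

# Refuse to start the build unless the environment ISO matches the digest
# pinned alongside the source. Both values below are made-up placeholders.
PINNED_ISO = "build-env.iso"
PINNED_SHA256 = "replace-with-the-published-sha256-hex-digest"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_build_env() -> None:
    actual = sha256_of(PINNED_ISO)
    if actual != PINNED_SHA256:
        raise SystemExit(f"build env mismatch: got {actual}, want {PINNED_SHA256}")

if __name__ == "__main__":
    verify_build_env()
    # ...then hand the verified ISO to the deterministic QEMU session from upthread.
```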