Seems to me like one of the highest and best uses of this right now would be adding verifiable builds to … literally anything. You no longer need a verifiable-build-capable compiler or language — you can just run the compile and packaging step through a deterministic QEMU session.
Does this sound right? I’m trying to figure out where uncontrollable randomness would come in during a compile phase, and coming up blank.
Thanks for the link, I’m aware of reproducible-builds.org.
Both of your causes seem trivially fixable here: the QEMU build could start from a standard system clock time, and an ‘unsorted’ file listing made inside a deterministic OS environment will come back in the same order every run, no?
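To make that concrete, this is roughly the invocation I have in mind. The -rtc, -icount, and record/replay options are real QEMU flags, but the image name, the pinned date, and the exact blkreplay wiring are my assumptions, so treat it as a sketch rather than a working recipe:

```python
import subprocess

# Sketch: record a build-VM session so it can be replayed deterministically.
# Assumes a prepared disk image "build.qcow2" whose guest runs the compile.
RECORD_CMD = [
    "qemu-system-x86_64",
    # Pin the guest's starting wall-clock time so timestamps don't drift
    # between runs (any fixed date works).
    "-rtc", "base=2020-01-01T00:00:00,clock=vm",
    # Deterministic instruction counting plus record mode; the event log
    # goes into replay.bin and can be replayed later with rr=replay.
    "-icount", "shift=auto,rr=record,rrfile=replay.bin",
    # Disk access goes through the blkreplay driver in record/replay mode,
    # and 'snapshot' keeps the base image unmodified between runs.
    "-drive", "file=build.qcow2,if=none,snapshot,id=img-direct",
    "-drive", "driver=blkreplay,if=none,image=img-direct,id=img-blkreplay",
    "-device", "ide-hd,drive=img-blkreplay",
    "-nographic",
]

if __name__ == "__main__":
    subprocess.run(RECORD_CMD, check=True)
```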
By comparison, the rb.org site says you need to start by stripping all of that out of the build process itself, for the reasons you refer to.
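For contrast, that rb.org-style fix is to normalize the nondeterministic inputs in the packaging step rather than freezing the whole machine. A toy sketch in Python (the paths and the fallback epoch value are placeholders):

```python
import gzip
import os
import tarfile

# Fall back to a fixed date if SOURCE_DATE_EPOCH isn't set (placeholder value).
SOURCE_DATE_EPOCH = int(os.environ.get("SOURCE_DATE_EPOCH", "1577836800"))

def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    info.mtime = SOURCE_DATE_EPOCH          # clamp file timestamps
    info.uid = info.gid = 0                 # drop build-machine ownership
    info.uname = info.gname = "root"
    return info

def deterministic_tar(src_dir: str, out_path: str) -> None:
    # Collect files in a sorted, stable order instead of readdir order.
    entries = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()
        entries.extend(os.path.join(root, name) for name in sorted(files))
    # mtime=0 keeps the gzip header itself free of the current time.
    with gzip.GzipFile(out_path, "wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for path in entries:
                tar.add(path, arcname=os.path.relpath(path, src_dir),
                        filter=normalize)

if __name__ == "__main__":
    deterministic_tar("build-output", "release.tar.gz")
```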
I like your thinking. Deterministic replay with QEMU is supplemental to the larger goal of reproducible builds. The communities concerned with reproducible software not only expect cohesive, human-readable code that builds deterministically into bit-for-bit reproducible binaries; their originally stated goals require it.
Deterministic replay with QEMU is a "power tool" in the larger picture of these efforts.
Sounds to me like that wouldn't be quite as good as true reproducible builds (which can run on anybody's computer), because auditing the entire emulated hardware starting state and event log is a harder problem than auditing only your code. Including a virtual machine image in your build effectively makes the VM part of your codebase in terms of users needing to trust it, so verifying the build result means not just engineering but also forensics.
So it'd be good for cases where you otherwise wouldn't be able to provide any verifiability. But for software, it's still not as good as eliminating non-determinism completely.
This is a good point: it does add a lot of weight to the build process. I might try it anyway, though. If the ‘weight’ is a lightweight Linux build environment referenced by ISO hash, it might not be that bad (TM) in practice. And, generally, trading bandwidth (cheap) for human refactoring (crazy expensive) is a trade that’s often nice to make.
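For what it's worth, the ‘weight’ could come down to a single digest check before the build even starts. A sketch (the filename and digest are placeholders, not a real release):

```python
import hashlib

# Refuse to start the build unless the environment ISO matches the digest
# pinned alongside the source. Both values below are made-up placeholders.
PINNED_ISO = "build-env.iso"
PINNED_SHA256 = "replace-with-the-published-sha256-hex-digest"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_build_env() -> None:
    actual = sha256_of(PINNED_ISO)
    if actual != PINNED_SHA256:
        raise SystemExit(f"build env mismatch: got {actual}, want {PINNED_SHA256}")

if __name__ == "__main__":
    verify_build_env()
    # ...then hand the verified ISO to the deterministic QEMU session from upthread.
```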