- Do the build, then hash the result
- Download and hash a pre-built binary, but I have to trust whoever I get the file from
- Ask someone for the hash, but I have to trust them
If the build is reproducible then I don't need to trust the second two options, since I can check them against the first. But in that case there's no point using the second two options, since I'm building it myself anyway.
If you take that quote in context, you'll see I'm talking about a (hypothetical) cache which uses the binaries' hashes as their IDs, i.e. to fetch a binary from the cache I need to know its hash.
In this scenario the first option is useless, since there's no point using a cache if we've already built the binaries ourselves.
The second two options in this scenario have another problem: how do we identify which binary we're talking about (either to download it, or to ask for its hash)? We need some alternative way of identifying binaries; for example, Nix uses a hash of their source, while Debian uses a package name + version number.
Yet if we're going to use some alternative method of identification, then we might as well cut out the middle man and have the cache use that method instead of the binaries' hashes!
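To make that concrete, here's a rough sketch of a cache keyed by the hash of the source rather than the hash of the binary. Everything in it is made up for illustration (`SourceKeyedCache`, `fake_build` and the embedded timestamp are my own stand-ins, not how Nix or any real cache is implemented); the point is only that lookups succeed even though two builds of the same source differ bit-for-bit.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class SourceKeyedCache:
    """Hypothetical binary cache whose keys are hashes of the *source*."""

    def __init__(self):
        self._store = {}

    def put(self, source: bytes, binary: bytes) -> str:
        key = sha256_hex(source)
        self._store[key] = binary
        return key

    def get(self, source: bytes):
        # Returns None on a cache miss.
        return self._store.get(sha256_hex(source))


def fake_build(source: bytes, timestamp: int) -> bytes:
    # Stand-in for a non-reproducible build: the output embeds a timestamp,
    # so two builds of the same source produce different bytes.
    return b"BIN:" + source + b":built-at-" + str(timestamp).encode()


source = b"int main(void) { return 0; }"

cache = SourceKeyedCache()
cache.put(source, fake_build(source, timestamp=1))  # built by the cache operator

local = fake_build(source, timestamp=2)  # what we'd get building it ourselves
cached = cache.get(source)

hit = cached is not None
same_bits = sha256_hex(cached) == sha256_hex(local)
print("cache hit:", hit)
print("identical to a local build:", same_bits)
```

The lookup never touches the binary's hash, so the cache gives the same speed-up whether or not the build is reproducible.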
The important point is that the parent was claiming that reproducible builds improve performance over non-reproducible builds because of caching. Yet nothing about such a cache requires that anything be reproducible! We can make a cache like this for non-reproducible builds too. Here are the three scenarios again:
- We're doing the build ourselves. Since we're not using the cache, it doesn't matter (from a performance perspective) whether our binary's hash matches the cached binary or not.
- We're downloading a pre-built binary. Since we must identify the desired binary using something other than its hash (e.g. the hash of its source), it doesn't matter what the binary's hash is, so it doesn't need to be reproducible. Pretty much all package managers work this way, and none of it requires reproducibility.
- We're asking someone for the hash, then fetching that from the cache. Again, we must identify what we're after using something other than the hash. The only thing we need for this scenario to work is that the hash we're given matches the one in the cache. That doesn't require reproducibility, it only requires knowing the hashes of whatever files happen to be in the cache. This is what we're doing whenever we download a Linux ISO and compare its hash to one given on the distro's Web site; no reproducibility needed.
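The third scenario can be sketched the same way. This is a toy model under my own assumptions: the package name `hello`, the version `2.12`, the `index` dict and the `fetch` helper are all hypothetical. The cache is keyed by binary hash, and the "someone" we ask is just an index recording the hashes of whatever files happen to be in the cache.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# One particular build, produced once by the cache operator.
# Nobody else needs to be able to rebuild these exact bytes.
binary = b"BIN:hello-2.12:built-at-1700000000"

# Cache keyed by the binary's own hash.
cache = {sha256_hex(binary): binary}

# Published index: (name, version) -> hash of the file that is in the cache.
index = {("hello", "2.12"): sha256_hex(binary)}


def fetch(name: str, version: str) -> bytes:
    expected = index[(name, version)]   # ask someone for the hash
    data = cache[expected]              # fetch from the cache by that hash
    if sha256_hex(data) != expected:    # same check as comparing an ISO's hash
        raise ValueError("hash mismatch")
    return data


fetched = fetch("hello", "2.12")
print("fetched and verified:", fetched == binary)
```

The check in `fetch` only tells us we got the file the index was talking about. It says nothing about whether rebuilding from source would give the same bytes, and it doesn't need to.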