Or, as mentioned in the post, why they don't do a shallow clone if they have to fetch it every time for whatever reason. Seems like a weird decision either way.
Yep, a shallow clone is enough to get the latest version. And you can filter out the trees to make the download even smaller if you only want the hash and not the contents (assuming the git server supports partial clone).
A clone with these options fetches essentially nothing but the commit history:
git clone --depth=1 --filter=tree:0 --no-checkout https://xxxx/repo.git
cd repo
git log
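And if all you want is the hash itself, something like this (run inside the clone above) prints just the latest commit:
git log -1 --format=%H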
People using Go modules should be using git tags, right? They should have at least one hash already that should be infinitely cacheable, the tag commit.
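For what it's worth, resolving a tag to its hash doesn't even need a clone; a one-liner along these lines works (the URL is the same placeholder as above, and v1.2.3 is a made-up tag):
git ls-remote https://xxxx/repo.git refs/tags/v1.2.3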
Of course, I have seen alleged examples of Go modules using tags like branches and force-pushing them regularly, but that kind of horror sends shivers down my spine. I don't understand why you'd build an ecosystem that supports that sort of nonsense and therefore has to be this paranoid, doing full repository clones just to cache tag contents. If anything, lock it down more: require tag signatures and throw errors if a signed tag ever changes. So much of what I read about the Go module ecosystem sounds to me like they want supply chain failures.
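To sketch what I mean by requiring signatures (this assumes the tag was created with git tag -s and that the signer's public key is in your keyring; v1.2.3 is a made-up tag):
git verify-tag v1.2.3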
Yeah, on third-party code hosting platforms :). And maaaaaybe in some short-lived cache somewhere. I mean, why spend on storage and complicate your life with state management, when you can keep re-requesting the same thing from the third-party source?
Joking, of course, but only a bit. There is some indication Google's proxy actually stores the clones; it just seems to mindlessly, unconditionally refresh them. Kind of like companies whose CI redownloads half of NPM on every build and cries loudly when GitHub goes down for a few hours - except at Google scale.
Holy cow Google! Wouldn't it behoove us to check if any changes occurred before downloading an entire repo?
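Roughly this kind of check, say (a sketch: the URL is a placeholder, the main branch name and the cached_hash variable are assumptions, and it presumes an existing cached clone with origin pointing at the repo):
remote_hash=$(git ls-remote https://xxxx/repo.git refs/heads/main | cut -f1)
# only re-fetch when the remote tip has actually moved
if [ "$remote_hash" != "$cached_hash" ]; then
  git fetch --depth=1 origin main
fi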