Even working at Google, my jaw still dropped at this:
> Google's codebase is shared by more than 25,000 Google software developers from dozens of offices in countries around the world. On a typical workday, they commit 16,000 changes to the codebase, and another 24,000 changes are committed by automated systems. Each day the repository serves billions of file read requests, with approximately 800,000 queries per second during peak traffic and an average of approximately 500,000 queries per second each workday. Most of this traffic originates from Google's distributed build-and-test systems.
Mostly I felt terribly unproductive — my changelist generation rate is way below average.
Can you please quote with ">" at the start of the paragraph instead of using a code snippet? This is unreadable on mobile; the text is 3x as wide as the (scrolling) viewport.
The number of engineers and commits is probably in the same ballpark at Amazon. With a DVCS, far fewer operations are done server-side, so I would imagine the read requests are an order of magnitude or two lower. They have tooling to get a global view across all repos; however, there are no cross-repo atomic operations (those can be managed in their build system in a relatively robust way).
What advantage does a company-wide monorepo provide?
The ability to have something like CitC (or this post's Git virtual filesystem) is certainly one big advantage -- no need to clone new packages; they're right there in your "local" source tree. Bazel (Blaze) is another, particularly when coupled with working at HEAD.
My experience with farms of git repos is that the lack of atomic operations over many tiny repos leads to things like version sets and having to periodically merge dependencies. I've worked on teams where that was inevitably neglected during hectic periods, resulting in painful merges of large numbers of changes. That problem simply doesn't exist with working at HEAD and high-quality presubmit test automation/admission control. The single repo also allows for single code reviews spanning multiple packages, which makes it MUCH simpler to rearrange code (Bazel again helps here, since a "package" is any directory with a BUILD file). Package creation is lighter weight for the same reason, and has fewer consequences for poor name choices, since rearrangement is easy and well supported by automated tools.
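To make the "package is any directory with a BUILD file" point concrete, here's a minimal sketch; the directory and target names are hypothetical, but the rules and visibility syntax are standard Bazel:

    # myproject/logging/BUILD  (hypothetical path and names)

    # Targets in this package are private unless they say otherwise.
    package(default_visibility = ["//visibility:private"])

    cc_library(
        name = "logger",
        srcs = ["logger.cc"],
        hdrs = ["logger.h"],
        # Fine-grained visibility: only the server package may depend on this.
        visibility = ["//myproject/server:__pkg__"],
    )

    # Targets within the same package can always depend on each other,
    # so the test needs no visibility grant.
    cc_test(
        name = "logger_test",
        srcs = ["logger_test.cc"],
        deps = [":logger"],
    )

Moving the whole package elsewhere in the tree is then mostly a matter of moving the directory and updating the labels that reference it -- exactly the kind of mechanical change automated tools handle well.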
Sharing one build system where a build command implicitly spans many packages also results in efficient caching of build artifacts and massively distributed builds (think a distributable and cacheable build action per executable command, rather than a brazil-build per package). Each unit test result can be cached, and only dependent tests are re-run as you tweak an in-progress change. This is fantastic for a local workflow (flaky tests can be tackled with --runs_per_test=1000, which with a distributed build system is often only marginally slower than a single test run). Also, you can query all affected tests for a given change with a single "local" bazel query command. The list goes on from here -- I keep thinking of new things to add (finer-grained dependencies, finer-grained visibility controls, etc.).
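As a rough illustration of that workflow (the target labels are hypothetical; the flag and the query functions are real Bazel):

    # Hammer a suspected-flaky test; with remote execution the runs
    # fan out in parallel, so 1000 runs isn't much slower than one.
    bazel test --runs_per_test=1000 //myproject/logging:logger_test

    # List every test in the repo that transitively depends on a target,
    # i.e. everything a change to //myproject/logging:logger could break.
    bazel query 'tests(rdeps(//..., //myproject/logging:logger))'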
It's not that you can't build most of this for distributed repos, but I'd argue it's harder and some things (like ease of code reorg) are nearly impossible.
Subjectively, having worked with both approaches at scale, Google's seems to result in much better code and repo hygiene.
Unfortunately, it's not available externally.