What I don't understand is how they accomplish larger collaborative changes. The paper says:
"Almost all development occurs at the 'head' of the repository, not on branches."
Googler Rachel Potvin made an even stronger statement in her presentation about "The Motivation for a Monolithic Codebase" [1]:
"Branching for development at Google is exceedingly rare [..]"
In the related ACM paper she published with Josh Levenberg there is the statement that:
"Development on branches is unusual and not well supported at Google, though branches are typically used for releases."
In my world, when we have to make a bigger change, we create a branch and only merge it into the trunk when it is good enough to be integrated. The branch enables us to work on that change together.
I don't understand how they do this at Google. As far as I understand, in their model they either have to
- give up on collaboration and always have just a single developer work on a change.
- share code by other means.
- check in unfinished work to the trunk for collaboration and constantly break trunk.
Unfinished work is not typically checked into master (and master is certainly not regularly broken).
What is more common is that very large changes are checked in as a series of individually compatible changes, and often broken up across the repository (there are of course tools to help with this). It's relatively rare for multiple developers to work on a single changelist; it's much more common to break the work into separate changelists.
Haven't worked there for some years now, so I'm a bit rusty on some of the details.
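As a rough, hypothetical illustration of the "series of individually compatible changes" pattern mentioned above (a Python sketch, not Google's actual code or tooling), a wide-reaching rename might land like this:

```python
# Hypothetical example: renaming a widely used helper without a feature branch.
# Each "change" below would be its own small changelist that builds and passes
# tests on its own, so head never breaks.

# Change 1: introduce the new name as a thin wrapper; nothing else moves yet.
def compute_total(items):
    """New, preferred name."""
    return compute_sum(items)

def compute_sum(items):
    """Old name; kept until all callers have migrated."""
    return sum(items)

# Changes 2..N: callers are migrated from compute_sum() to compute_total(),
# a few files per changelist, possibly split across teams and directories.

# Final change: once no callers remain, compute_sum() is deleted.

if __name__ == "__main__":
    print(compute_total([1, 2, 3]))  # 6; behavior stays identical throughout the migration
```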
It is amazing to see how we declare something too obvious or natural to question. Like trunk-based development: so obvious that questioning it makes one a fool :-)
Git and its model were the best thing a few years back. Now, since Google is doing all its development on the main trunk/master, it must be correct and more intelligent.
Wouldn't it be the case that they went with what they had at the time and continue to use it because everyone is used to it and it still works? I'm not sure Google analysed whether branching was bad and then chose trunk-based development.
I cannot understand how a company that has a well-defined process built around branches is doing it wrong, or how that is so suboptimal. I guess it is a matter of processes and culture. None of the great companies are great because their source control strategy (or code) was excellent.
We developers always over-analyse everything and come up with excellent logic, and some of us are more gifted with words than others.
You'll have to remember that all of those big companies have their tools and processes customized for their scale.
Example:
Instead of branching you would just create a `changelist` (a commit, a set of changes to files) and work on that.
You can show it to your colleagues. You can build and test it. You can send the id to anyone to have a look at it, or test it themselves.
You can have multiple changelists depending on each other, without being committed.
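To illustrate the concept only (this is a hypothetical Python sketch, not any real internal tool's API): a changelist can be thought of as a small, shareable unit of pending work that can depend on another pending changelist.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Changelist:
    """Hypothetical model of a pending change; real tooling is far richer."""
    cl_id: int
    description: str
    files: dict[str, str]          # path -> proposed new contents
    depends_on: int | None = None  # another pending CL this one builds on
    reviewers: list[str] = field(default_factory=list)

# Teammate A uploads the base change; teammate B stacks a follow-up on top of it.
cl_1001 = Changelist(1001, "Add parser for the new config format",
                     {"config/parser.py": "..."})
cl_1002 = Changelist(1002, "Use the new parser in the frontend",
                     {"frontend/app.py": "..."},
                     depends_on=cl_1001.cl_id)

# Anyone with the id can review, build, or test the pending change
# before either of them is submitted to head.
print(cl_1002.cl_id, "depends on", cl_1002.depends_on)
```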
You can use git forks or whatever for development. This philosophy just says that you only push to production environment from one standard head trunk.
"Google use Perforce for their trunk (with additional tooling), and many (but not all) developers use Git on their local workstation to gain local-branching with an inhouse developed bridge for interop with Perforce.
"Branches & Merge Pain
"TL;DR: the same
"They don’t have merge pain, because as a rule developers are not merging to/from branches. At least up to the central repo’s server they are not. On workstations, developers may be merging to/from local branches, and rebasing when the push something that’s “done” back to the central repo.
"Release engineers might cherry-pick defect fixes from time to time, but regular developers are not merging (you should not count to-working-copy merges)"
I agree that feature flags can be a solution sometimes.
The presentation and the paper I linked to in my question discuss this, but they also mention large-scale refactorings, and this is where I don't see how feature flags can help.
For example: How do they untangle a wad of code that is large enough that it takes longer than a few days and more than a single developer to get the code back into a state that is acceptable for trunk?
The changes required for this kind of refactoring can be all over the place, regardless of any organizational boundaries in your code. I can't see how changes of this nature can be put behind feature flags.
There’s a wealth of reading available to you if you look up “trunk-based development” as a keyword. Likewise with “continuous integration” (the actual practice, not the build tooling). Jez Humble for instance has written extensively on this.
You develop features behind compile-time or runtime flags and keep them off until the feature is ready to ship. This is what chromium.org does, so that might be a more accessible way to see it in practice.
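As a minimal sketch of the runtime-flag approach (hypothetical flag and functions, not Chromium's actual flag machinery):

```python
# New behavior lands on trunk but stays dark behind a default-off flag.
import argparse

def old_checkout_flow(cart):
    return sum(item["price"] for item in cart)

def new_checkout_flow(cart):
    # Still-in-development logic; safe to commit because nothing enables it yet.
    return sum(item["price"] * (1 - item.get("discount", 0)) for item in cart)

def checkout(cart, enable_new_flow):
    return new_checkout_flow(cart) if enable_new_flow else old_checkout_flow(cart)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-new-checkout", action="store_true",
                        help="Off by default; flip once the feature is ready to ship.")
    args = parser.parse_args()
    print(checkout([{"price": 10.0, "discount": 0.1}], args.enable_new_checkout))
```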
You don't have to check in unfinished work; you can break the work into parts and have people work on independent parts, every one of which moves progress forward incrementally.
Well, you could actually call that "unfinished" because in the beginning the code doesn't accomplish the task, but progressively it will become more useful.
You can when you can but you can't when you can't.
I 100% agree with you that we should work this way whenever possible and we should work hard to keep our code in a state that lets us cleanly divide work.
In my experience it is not always possible to split up work that way. Think of untangling dependencies of a larger part of the code as an example.
Sometimes you think you can't because you either haven't learned the right tricks, or because it takes more effort, and so you opt to fork off your work into a separate long-standing branch in order to optimize your development velocity, at the expense of possibly surprising costs during the merge (especially if other people make the same choice).
Other times it's genuinely necessary to make a long-standing branch. In those cases, you just do it. Trunk-based development should not be a dogma, just a different default choice.
I only scanned through it, but it seems similar to the de facto way of doing things before distributed version control systems became popular (in the late 2000s?).
I think you misunderstood what that link is arguing for. It's basically GitHub flow with tagging of what you release; it just goes into a bit more detail, discusses alternatives, and suffers a bit from too much information.
The idea is you have a constantly usable master, and your branches should be short-lived so you don't hit a brick wall trying to get reviews and merges on your massive change sets.
Ultimately it means you want to test and review your change before it goes into master, as opposed to creating "production", "staging" and "develop" branches, which largely just kick the can down the road and are a different way to solve the "what's deployed where" issue.
None of the other replies try to explain specifics of how this works, so let me illustrate an example of two teams collaborating to add Feature X to the monorepo without branching:
1) Team A checks in their code to provide Feature X. Their code is not used anywhere in the codebase yet, however full unit test coverage exists for the public API; this is required for code review.
2) Team B checks in their code to turn on Feature X in their product, gated under a command-line flag which by default uses the old behavior.
3) Team B checks in an integration test that flips the flag and makes sure everything works as planned (steps 2 and 3 are sketched in code after this list).
4) If Team B requires changes to Feature X to get expected behavior, they communicate those changes to Team A and someone from either team (using available human resources) makes the changes.
5) Team B checks in a small change to flip the flag by default.
6) Team B monitors their product. If things go awry, only the very latest change is reverted and repeat (4).
7) Once stability is achieved, Team B checks in a change to remove the flag.
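A minimal sketch of steps 2 and 3, assuming a hypothetical flag mechanism and product code (not Google's actual flag or test infrastructure):

```python
import unittest

FLAGS = {"use_feature_x": False}  # step 2: checked in with the old behavior as default

def feature_x(data):
    return data.upper()           # Team A's new code path

def legacy_handler(data):
    return data                   # existing behavior, untouched

def handle_request(data):
    if FLAGS["use_feature_x"]:
        return feature_x(data)
    return legacy_handler(data)

class FeatureXIntegrationTest(unittest.TestCase):
    def test_feature_x_enabled(self):
        # Step 3: flip the flag in the test only; the production default stays off
        # until step 5, when a tiny change flips it for real.
        FLAGS["use_feature_x"] = True
        try:
            self.assertEqual(handle_request("abc"), "ABC")
        finally:
            FLAGS["use_feature_x"] = False

if __name__ == "__main__":
    unittest.main()
```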
I've been at FB for a few years now (similar style), and this model of a monolithic repo, no branching, and simply submitting 'diffs' (changesets / patches), which get merged directly into master after the 'diff' is accepted and you land it, seems much easier to me.
Maybe it's just because I got used to it, but now whenever I have to touch it I find the branch-based development confusing.
It might be a practical thing. I've heard from a Googler (a couple of years back) that getting changes in can take ages, and by the time the change lands, there's a good chance that there are merge conflicts, and the cycle starts over. Branches would make this even more painful.
Depends on the size of the change. Small changes are preferred in most trunk-based-dev companies.
FYI:
There are (multiple) tools in Facebook and Google which are an abstraction on top of their VCS.
(e.g. ones that feel more like git, where you can work on a stream of changes that depend on each other without actually pushing anything to head)
Can you clarify why branching helps collaboration? In other words, why is it harder to commit to trunk when you have several developers working on a feature?
Branching enables me to share unfinished work with my collaborators. Sometimes I don't want to commit to trunk yet but still share code and collaborate on a part of the code base.
If they're using Perforce, one can "Shelve" a CL (changelist, similar to a commit in git) to make it available for others to unshelve. This can be used as a workaround, albeit a limited one, to share work-in-progress stuff.
That's what I meant in my question with "share code by other means". It works but in my opinion it is a large pain and I can't believe people at Google work by sending patches back and forth.
It's not like we (I am a Googler) email patch files around. Everything is integrated into the system. You create a CL (change list), it automatically gets a number. People can review it, test it, or fork it (make a new CL using your CL as a starting point) as much as they want, all from that CL number.
Think of a CL like a Pull Request that has (and can only have) a single commit.
It's visible in code review UI, has a description, has tests run on it, it can be merged by other people and it can be referenced from anywhere. Eventually it's merged into the head or dropped.
Technically yes, but people don't use it or think about it this way. Changelists are supposed to be small, a couple of hundred changed lines at most. You don't develop a complete feature of thousands of lines in a single CL; that would be insanely hard to review. What happens is that work gets split into small chunks, and each one is submitted separately, not to a feature branch, but straight to head.
This sounds identical to our workflow with git for all practical purposes. 1) New story gets a branch. 2) Branch gets squashed and rebased on most recent master before PR. PRs are generally under 1,000 lines changed. 3) PR is merged to master after code review.
It's missing a lot of what people expect in git branching, like history within the branch and arbitrary digraphs for forking and merging.
If every branch was always merged back into head before doing anything else, always had its commits flattened into one, and someone forking off of your branch was basically opening it up, copying the changes to your clipboard, and pasting them into a new branch with no attribution or history, then sure.
No, it's more like a patch. With a branch, you drag all the dependent changes along with you, while with a patch you have only the actual change, plus information about which CL -- or PR, in GitHub terms -- it depends on.
With branches, if someone updates the branch you depend on, your work is based on stale stuff, and it can get ugly. Just try to do it on GitHub :-)
Thank you, I appreciate your effort to help me understand this better and our exchange helped me to make progress.
One thing I infer from your answer is that it seems that there is an established process and dedicated tooling for working with patches at Google. I think a lot of my pain with patches stems more from the lack of process and lack of an agreement on formats and standards in my environment than from the use of patches per se.
Where I still see an advantage of branches is that they facilitate documentation of what has been done, by whom, and when. All of that documentation is in the same place and form as the documentation of changes on the trunk: it is all in commit messages, whereas patches are only documented somewhere else, possibly in the email or IM used to send the patch. Even if most of the branch documentation does not survive on trunk when we squash the final merge, it is still there and easy to find as long as the branch doesn't get deleted. When I want to look up why I applied a certain patch, I'll have to dig through my messages. I think that makes it harder to work with patches than with branches.
This allows you to share work without (in Git terms) pushing to master. Branches in Perforce-like systems tend to be more heavyweight and permanent (IIRC you have to branch an entire path of files, it is not the same as the Git concept of "branch" which is just a commit that points to another parent commit).
You can think of the system as enabling you, in Git terms, to create pull requests without the creation of an underlying branch.
A "patch" in Google/Facebook/Twitter is the same as a commit.
It has a (mostly) descriptive commit message, references to bug tickets and might contain links to documentation, screenshots and mocks.
You basically work on a "patch" (changelist), get feedback from others and send it out for review at the end.
Before you can submit (commit) it, you'll have to sync to "head" (to have the latest changes) and run all tests.
^ most of this happens automatically, and as most changelists ("patches") are small, this happens very fast and async in the background.
FWIW coreboot is an open-source project that uses a similar style, where you need to upload your change to the review tool (https://review.coreboot.org/, which is using gerrit https://www.gerritcodereview.com/) and people comment and LGTM in there and then it gets committed to the master branch once everything looks good.
Your changes aren't sitting on your machine. They are hosted on a server or in a git fork. After test and code review, you merge to head before deploying to production or other people make follow-on changes.
This should always be Plan A but in my experience it is not always possible. Think of untangling dependencies between a large number of components as an example.
Even git or hg branches are horrible. Once you have multiple people working on the same codebase and touching the same files it is pretty horrid to manage. I know several companies not using branches because the merge conflict resolution takes too much time.
The PDF explicitly calls out the time consuming part:
"Almost all development occurs at the “head” of the repository, not on branches. This helps identify integration problems early and minimizes the amount of merging work needed. It also makes it much easier and faster to push out security fixes."
Rebased branches in git are nice (for one developer only, unfortunately); there is some pain of course, but when the rebase is performed often (once a day) it doesn't consume much time. The real pain begins when some huge commit is pushed to HEAD, but even that is manageable.
Anyway, I feel sad that so much effort was put into really nice VCS concepts and almost no one uses them in enterprise development.
Except it's now highly scalable, which takes care of branching performance. There is nothing technically making it difficult to branch, AFAIK. In fact, Rapid (grape) used it pretty heavily to track rollouts, if I remember correctly.
They are grouped by linking them to issues in the issue tracker. All commits will then get a link to the issue and the issue gets a link back to the commit. This way you can easily track and read the full context of old changes.
If 2+ people are working on the same file, which might result in a conflict, you can either:
- handle the conflict as soon as you merge your branches sometime in the future
or
- handle it when trying to commit your change to head
The only difference is whether you handle the conflict now or in the future.
Git is actually pretty good at automatically resolving conflicts within files; unless you edit the same lines, it’s easy. If you do edit the same lines, merging is pretty straightforward.
This whole conversation the last day or two on HN has been kind of nuts. Like everybody agrees you shouldn’t put all your code in a single file, right? Why not? It would let everyone see all of the source code in one place! But it would be huge and hard to avoid conflicts. So we split things into files. Then “trees”, etc...
Basically it sounds like Google's monorepo is really a bunch of repos glued together, with changes in one triggering changes in others. The difference, it seems, is that Google does not get to benefit from the things open-source developers like about git. It's like Google developed custom versions of GitHub, CircleCI, and other tools and is marketing that as a better solution (just build several billion-dollar solutions to manage your monorepo!).
And even after all that, Google has a bunch of separate repos for important open-source or secret work.
Both contributors created a pull request and submitted it. In the description, they both state that the new value should be the one they put in. How would you resolve this issue in a timely fashion, making sure you do not take down a service accidentally and do not slow down development too much? I intentionally gave you a very simple example, but if you want we can go into rolling out new features, fixing security bugs, and a lot more where such issues arise. And no, git will never be able to solve these issues.
I don't think HN is going nuts (except for a few zealots); these problems come from the nature of software development in general. We have seen how Google solves them (monorepo, custom CI/CD, etc.) and there are other companies solving them in different ways (maybe with a branching model, using GitHub). People are just sharing their experience here, along with the solutions they perceive based on that experience and their level of understanding.
What does Google do? You're working on a line of code and the trunk changes. Your local copy no longer aligns with it. You have a conflict.
Someone's changes get committed first. That's a business decision, not a code-tooling one. The second PR has to adjust. Same on both mono- and poly-repo, just using different words.
At least branches let you have the choice, which cannot be said for branchless.
"Almost all development occurs at the 'head' of the repository, not on branches."
Googler Rachel Potvin made an even stronger statement in her presentation about "The Motivation for a Monolithic Codebase" [1]:
"Branching for development at Google is exceedingly rare [..]"
In the related ACM paper she published with Josh Levenberg there is the statement that:
"Development on branches is unusual and not well supported at Google, though branches are typically used for releases."
I my world when we have to make a bigger change we create a branch and only merge it into the trunk when it is good enough to be integrated. The branch enables us to work on that change together. I don't understand how they do this at google. As far as I understand in their model they either have to
- give up on collaboration and always have just a single developer work on a change.
- share code by other means.
- check in unfinished work to the trunk for collaboration and constantly break trunk.
[1] https://youtu.be/W71BTkUbdqE?t=904
[2] https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...