Linus on git rebase and merge (2009) (mail-archive.com)
226 points by Brajeshwar on July 30, 2013 | 52 comments


One of his points is that you should only pull rarely:

   And, in fact, preferably you don't pull my tree at ALL, since nothing 
   in my tree should be relevant to the development work _you_ do. 
   Sometimes you have to (in order to solve some particularly nasty 
   dependency issue), but it should be a very rare and special thing, and 
   you should think very hard about it.
That might work well for the kind of highly decoupled development he's dealing with, but I'm not sure it'll work for the kind of work I do with my colleagues. The first thing I do every morning is a git pull (often surrounded by git stash [pop] and sometimes with a --rebase appended if I have outgoing commits), because I might need the stuff a colleague worked on the past days, or we might work in the same files and generate lots of conflicts otherwise. Maybe I'm doing it wrong.
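For what it's worth, that morning routine can be sketched against a throwaway local repo like this (the repo layout, file names, and "colleague" are all invented for illustration):

```shell
set -e
dir=$(mktemp -d); cd "$dir"

git init -q --bare upstream.git
git clone -q upstream.git work; cd work
git config user.email you@example.com; git config user.name you
echo base > file.txt; git add file.txt; git commit -qm "base"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# A colleague pushes something overnight (simulated via a second clone).
cd "$dir"; git clone -q upstream.git colleague; cd colleague
git config user.email them@example.com; git config user.name them
echo theirs > colleague.txt; git add colleague.txt
git commit -qm "colleague's overnight work"
git push -q origin HEAD

# The morning routine: stash local edits, pull with rebase, restore edits.
cd "$dir/work"
echo wip >> file.txt              # an uncommitted local change
git stash -q
git pull -q --rebase origin "$branch"
git stash pop -q
```

After the pull, the colleague's file is present and the uncommitted local edit survives in the working tree.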


Linus' comment is valid in your case: your colleague's work is relevant to you, so you have a reason to pull. Otherwise, pulling on a schedule will give you little benefit besides the occasional breakage of your stuff.

If you now review Linus' comment, it makes sense that pulling should be done from a feature branch: one that has a complete, working solution to something you need.


An alternative that worked really well for me is "do not push broken code". Linus is saying "If you pull at a random time, you might get cruft in your local tree". Why is that? There should be no cruft upstream from you. There may be a conflict between your code and the remote, but that's yours to resolve: the remote works, and it's your code that doesn't work with it.

His suggestion that in order to keep your git history pristine you should just email patches until you can get to a working state seems bizarre. git is a tool that works really well for collaborating on something until it works. In a small project, who gives a crap if you see a commit that fixes the previous commit once in a blue moon. Yes, on a large project like the Linux kernel that might get annoying, but for 99% of people it's irrelevant.


I think his comment is relevant for people who treat their "private" git history very differently from the "public" git history. My "private" git history contains lots of minor commits for stuff like checkpointing, fixing stupid errors, etc. The commit messages sometimes aren't very informative. I clean things up when I reach a good working state, and the history that becomes public is much simpler, with fewer commits and an informative message on each commit. If I share my crappy "private" history with someone else, then I forfeit my ability to clean it up later. That doesn't matter if you keep a pretty clean history in the first place, but I definitely do not.
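A minimal sketch of that cleanup, using a soft reset to collapse checkpoint commits into one informative commit before publishing (same effect you'd get by squashing in `git rebase -i`, just non-interactive; the repo and messages are invented):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo; cd repo
git config user.email you@example.com; git config user.name you

echo v0 > app.txt; git add app.txt; git commit -qm "good public base"

# Messy "private" history: checkpoints with lazy messages.
echo v1 > app.txt; git commit -qam "wip"
echo v2 > app.txt; git commit -qam "checkpoint"
echo v3 > app.txt; git commit -qam "fix stupid error"

# Collapse everything since the public base into one commit
# with an informative message; the working tree is untouched.
git reset -q --soft HEAD~3
git commit -qm "feature: add the v3 behavior (squashed)"

git log --oneline    # now just two commits, both with useful messages
```

This is safe exactly because the three squashed commits were never shared with anyone.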


> An alternative that worked really well for me is "do not push broken code". Linus is saying "If you pull at a random time, you might get cruft in your local tree". Why is that?

It's not about pushing broken code; it's about having a very complex code base where you just can't foresee how each change will behave before it has been tested by a lot of people on a lot of different machines.

A Linux release is a pretty good point in time to get a quite well tested version, and therefore a good base for your changes and for testing how your changes - only your changes - behave.


Good point. However, that still means that this advice does not apply to most people in the wild as most people are not working on projects that large.


> His suggestion that in order to keep your git history pristine you should just email patches until you can get to a working state seems bizarre.

To me, it doesn't sound meaningfully different from walking over to a coworker and arguing over the code or its results for a while. The "emailing patches" thing seems more like a slow-by-necessity feedback loop due to the fact that Linux contributors are across the world and even IMing or real-time screen-sharing doesn't make sense.


The distributed nature of the Linux kernel is exactly where imposing some structure on how patches are distributed would make sense to me. Linus resisted doing anything but dealing with emailed patches for a really long time, until he was called out on it. His reasoning was that he could keep up with it and keep everything straight, and the argument against him was that others couldn't.

I know that keeping a clean history is very important, but keeping all the patches straight IMHO is even more important and git is exactly the tool for that job. Email is the wrong tool for that job aside from some very occasional "check this out, am I being stupid here?" type things.

Then again, I am not a kernel developer and have no idea what I am talking about. All I can contribute is that on small-ish projects (up to 10 people on a team), using git differently makes everyone's lives easier at the cost of the history looking slightly worse.


Continuous integration, or frequently pulling, has huge advantages:

* Breakage is localized to small merges. Instead of debugging the breakage of 1 month of work integrated with yours, you debug it a day of work at a time. Localized/differential debugging is much easier.

* Ditto with conflicts. Dealing with daily conflicts is much easier than dealing with the aggregate of a month's conflict. Beyond a certain threshold of difficulty, conflict resolution quality becomes very low. People just give up and start resolving conflicts semi-randomly.


I don't think you're doing anything particularly wrong, but for a big project like Linux you have to keep the changes as isolated as possible; otherwise it will be quite hard to identify the cause when there are problems.

So it's better to base your changes on the commit of a Linux release, which has been tested quite a lot and has some kind of defined state.


I don't think he's talking about "git pull" from a central repo, but instead a "git pull <url-to-linus'-private-repo>".

But I can also imagine it being useful for everybody to pull the code from the last release and work on it, before a single person pulls the code, merges, and commits it to the central repo.


Yeah, I think you're right. Linus isn't talking about pulling in general, just pulling from his work.


I don't think the point is that you should pull rarely. I think the key there is _my_ tree and he later refers to a branch named linus. I would suspect that he maintains his own development tree that gets pushed to the central repository and then merged to master when he feels it is ready. But prior to it getting merged, it is random and subject to his own rebasing.

I have a similar flow, I might maintain multiple feature branches but I will pull and rebase master very regularly to be sure I am working against the most recent code base. I will also rebase my work to squash commits to create a clean history prior to moving work from my feature branches to master. So if someone was to pull in one of my feature branches prior to it being merged or tagged, that tree would very likely change and then we have two versions of my work that would create a merge nightmare when we decided to get everything into master.


Do you work on topic branches?


I highly recommend a stretch where you do everything on topic branches; the advantages will become self-evident over time, and then you will know when and when not to use them.


Sometimes -- I usually don't pull into those unless strictly necessary.


I've been teaching myself git on side projects for several months. I've read through the git man-page tutorial at least twice, and I think I'm starting to get git. It's revolutionary. It will probably still be in use a half-century from now. It's almost too much power; I think some of the git commands should use ASCII art to paint a big picture illustrating the power of the command the user is about to invoke, double-checking with the user: do you really want to "xyz", rebase, etc.?


Remember, it's all immutable data structures. So whatever you do locally (i.e. apart from git push), the old version is still there - git reflog will show it.

Git eventually garbage collects old versions (i.e. not accessible from a branch), but by default only after a month.

Unfortunately... using git reflog to find the old version and undo a command itself has a learning curve. :\
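A small self-contained example of that recovery, in a throwaway repo: hard-reset away a commit, then use the reflog to get it back (file and message names are made up):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo; cd repo
git config user.email you@example.com; git config user.name you

echo one > f; git add f; git commit -qm "one"
echo two > f; git commit -qam "two"

# "Destroy" the last commit...
git reset -q --hard HEAD~1

# ...then find the previous state in the reflog and restore it.
# HEAD@{1} is where HEAD pointed just before the reset.
lost=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$lost"

cat f    # back to "two"
```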


This for me is the one feature of git that I could never give up. The confidence that, once a state has entered my history, I can always get it back is really helpful when playing with things.


I doubt git will be used half a century from now:

* Heuristic textual diffing is probably inferior to recorded actual diffs from our code editors

* Git merges are monolithic, and hide conflict resolution work within them in a difficult-to-review way. On the other hand, git cherry-picks do not track history properly. Future revision control will probably solve this problem better (e.g: darcs seems like it should work better, except darcs has some other [probably more important] disadvantages).



Yup: instead of pushing git core devs to adopt at least Mercurial's "phases" concept, or to think of another way to solve this problem, reposting "do not handle it this way" guidelines is preferred.


The main reason you don't want to rebase published code is to avoid inconveniencing other people. But that doesn't imply that code that's under heavy development and may need to be rebased should always be hidden from the world and kept totally private.

One workflow I've seen other projects use -- that I've also adopted for my own projects -- is to have a separate (non-private) branch called "pu" (pending updates) or "wip" (work in progress). With warnings in the developer docs that this branch may be rebased at any time.

In short, keep code that you're willing to rebase clearly delineated from the code you're not, tell other people that it may be rebased in the future, and let them decide whether the benefit of pulling the code today outweighs the drawback of potentially being inconvenienced by a rebase tomorrow.
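A sketch of that arrangement with an invented `wip` branch: the maintainer rewrites and force-pushes, and a reader who has no commits of their own on the branch recovers with a hard reset to the rewritten remote branch:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q --bare upstream.git

# Maintainer publishes a throwaway "wip" branch.
git clone -q upstream.git maint; cd maint
git config user.email maint@example.com; git config user.name maint
echo base > f; git add f; git commit -qm "base"
git checkout -qb wip
echo draft > g; git add g; git commit -qm "wip: rough draft"
git push -q origin wip

# A reader pulls it, knowing it may be rebased at any time.
cd "$dir"; git clone -q -b wip upstream.git reader

# Maintainer rewrites the wip commit and force-pushes.
cd "$dir/maint"
git commit -q --amend -m "wip: cleaner draft"
git push -q --force origin wip

# Reader recovers: fine here, since the reader has no local
# commits of their own on wip that a hard reset would discard.
cd "$dir/reader"
git fetch -q origin
git reset -q --hard origin/wip
```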


I'm not sure I understand the point about rebase destroying the history. Is he speaking specifically about `git rebase -i`?

I very often run `git rebase master` in my feature branches to avoid having many conflicts to resolve just before my pull request to master. Once merged into master, the commits I rebased did not seem to have changed. Am I missing something here?


In layman's terms: rebase wipes out your commits and replaces them with new ones. Notice how the hashes change when you do a `git rebase master`; it's not just re-ordering them. If those commits have been published or pulled by someone else, you are destroying history.
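You can see this in a throwaway repo: rebase a feature branch onto a moved master and compare the commit hash before and after. The patch content is unchanged, but the parent changed, and therefore so did the SHA1 (branch and file names below are invented):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo; cd repo
git config user.email you@example.com; git config user.name you

echo base > f; git add f; git commit -qm "base"
main=$(git symbolic-ref --short HEAD)

# Feature branch with one commit.
git checkout -qb feature
echo feature > g; git add g; git commit -qm "feature work"
before=$(git rev-parse HEAD)

# Meanwhile, master moves on.
git checkout -q "$main"
echo more > h; git add h; git commit -qm "more on master"

# Rebase the feature branch: same patch, brand-new commit object.
git checkout -q feature
git rebase -q "$main"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"   # different hash: the parent changed
```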


Ok, thanks to you and dan00.

I realize it has never been a problem for me because commits replayed when I rebase are always isolated in my feature branch and not pushed anywhere.

I suppose it would be a problem if I rebased from two different branches that were already public.

Also, I often work with a "dependent branch". When a change has to be made that does not fit the feature branch's "theme", I create a new feature branch from master, make my change, make a PR to master, and rebase the dependent branch onto the feature branch. It seems this does not cause any history rewriting either.


> I'm not sure to understand the point about rebase destroying the history. Does he speak specifically about `git rebase -i` ?

When rebasing, you're at least changing a parent commit and therefore changing all of your rebased commits, because their SHA1 is also based on their parents.

> I very often run `git rebase master` in my feature branches to avoid having many conflicts to resolve just before my pull request to master. Once merged in master, initial commits I rebased from master did not seem to have changed. Am I missing something, here ?

As long as you don't rebase a feature branch after you have pushed it into a remote repository or merged it into an already published branch, you're fine.


That kind of rebase is great, because you are applying your work on top of the master branch (i.e., making the history read as it should be, that your set of commits was added on top of master). That's why git says it's "replaying" your commits on top of it. That usually works because you and master have a common previous point of reference.

If the master branch went back and changed something before that common point of reference, things get more confusing. If you want to see this yourself, check out a separate branch, rebase interactively, and make a big edit in the past. It's much more painful to add on top of that successfully with the first branch (git will want to do a merge commit) because now the history has diverged between them. (And if you use CI, you might notice that when you rebase a wip topic branch locally, git tells you things like "You are 10 commits behind and 5 commits ahead".)


I did some brain dumps here the other year, hopefully you'll find this helpful:

http://gitdoctor.com/post/24762323164/rebase-demystified


You're not missing anything. Rebasing off master is the same as a fast-forward-only merge. It is -i that can actually change the history.


Rebasing and fast-forward merging are two very different things.

Whenever you rebase, you create _brand new commits_ based off another part of the project history, which bear a strong resemblance to the original commits but are nonetheless different.

Adding -i allows you to do further modification of commit contents (reordering, squashing, dropping, etc.), but a vanilla rebase is still changing history.

Linus addresses this early on in his mail: "People can (and probably should) rebase their _private_ trees (their own work). That's a _cleanup_. But never other peoples code. That's a 'destroy history'."


Are you saying that when I rebase off of upstream on my private branch, it changes the sha1 of the upstream commits?

That hasn't been my experience, as far as I can tell.

Edit: OK, I did a test and I can see that it is changing the sha1 for my commits on my private branch after I rebase from upstream master. So at least the way I'm doing it, I'm not changing public commit hashes.


The resulting code may be the same. The resulting repository and history most certainly are not.


Linus' email archive is to software engineering what Feynman's lectures are to QM: a very clear delivery of the right stuff without dumbing down.


Polite, yet still extremely useful. Is Linus getting old?


This is Linus 99% of the time. As with crime and natural disasters, the 1% makes the news.


Since we have no evidence of Linus being a criminal, I think I'll have to go with Force of Nature.


The email is from 29 Mar 2009


I think this mail is an anthem of "proper", paradox-free time travel - "never-ever-ever destroy other people's history". :)


How does this apply if I want to share my branch to accept changes from other developers (think pair programming, handovers of half-done features, ...)?


It's fun to read Linus' writing. It's rude and in your face. Git is exactly like him: rude and in your face. Does wonders!


or just don't use git... :)


i agree, on my team we manage code by just having project_name_YYMMDD_RELEASE in a big folder /code. Works fine for us!

/s


I think parent meant using Mercurial or other DVCS that doesn't have such silly thing as "history rewriting". Leave that to politicians :-)


Mercurial will eventually have history re-writing; I say this as an outsider who has watched Mercurial slowly catch up to git by adding features that git has already had for a while. They will have to add it sooner or later to remain relevant and competitive.

The article explains why it makes sense to rewrite history - it's so as not to push garbage out on the world. One of the big niceties of distributed, disconnected repositories is that you can muck about, try things out, make mistakes, correct them, clean things up, and then push that out to a public repo. The difference with git is that you don't have to make a separate patch, roll things back (possibly restarting with a clean repo from master), then apply the patch and make a "clean" commit - you simply rebase the commits in git.

When I'm maintaining code, I don't care about every little twiddle of bits that happened; I care about discrete, human-level, bug or feature related patches, and sometimes development in progress doesn't match up with that. Being able to rewrite history means being able to have not just maintainable code, but a coherent change history.


hg already has a rebase extension http://mercurial.selenic.com/wiki/RebaseExtension


And it uses "phases" to know when something is public and should no longer be rewritten.


I find the Mercurial way improves codebase hygiene: do your dirty stuff in a clone repository. If you allow rebase, people will mess up and create ugly histories. Mercurial's lack of rebasing is a plus: it's a tool-enforced discipline of working in cloned repository and only bringing clean, compilable, executable code back into the real repository.


As a firm believer in mechanism not policy, I find that Mercurial, much like Python, tries to force its users to the "one true way", which is assumed to be the "best" and "correct" way to do something, when in reality, it's just needlessly constraining. You want to enforce policy? That's entirely possible with git. It's just not turned on by default because not everyone needs it; it's a hammer looking for nails.

And it's perfectly possible to mess up and create ugly histories without rebase; that's exactly one of the prime use cases for rebase - cleaning up ugly histories. You want me to clone a clean repo from master and try applying pieces of my patch from scratch? Screw that noise! Even with the speed of git clone, I'd much rather pick, squash and rewrite commit messages in rebase.


After having been able to use and appreciate git rebase (and especially git rebase -i) I can say I'll never go to hg if they don't have history rewriting. I'm a big boy, I can make smart decisions, Mr. DVCS.


yeah, history rewriting is something i've never needed - but then i've never had problems branching and merging, which i've heard can be 'nightmarish'. i don't think it is; suck it up and get on with it and it takes 2 minutes, it just looks scary up front, even before dvcs. fyi i use git at work... it is 'okay', but it has not solved any problems for me over, e.g., svn with a bunch of scripts.


Hey, I am a git lover. I was trying to say it is straight in your face to keep you disciplined. Looks like it sounds different when I read it now.



