While the explanation is right in some sense, it misses a few points.
Branches are pointers to a commit and that pointer is refreshed when a new commit is created. One could say they are a wandering tag (without explaining a tag for now).
The actual chain of commits that represent what we see as branch comes from the commits themselves. Those commits point back to their parent commit.
And then one can see why no branch has any special meaning: It is a chain of related commits with a named entrypoint. Once you delete a branch (i.e. the named wandering pointer to a commit), you cannot identify a branch as such anymore. It is just a chain of related commits without a named label now. And nothing besides the name distinguished the branch from other commit chains before.
The master/dev/release branches are then a convention to keep an updated commit pointer on the chain of commits containing changes of interest.
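To make that concrete, here is a minimal sketch of the idea in Python. It is a toy model, not how git stores anything: the `commits` and `branches` dictionaries and the ids `c1`, `c2`, `c3` are made up for illustration.

```python
# Toy model, not git's real storage: commits know their parent, branches are just names.
commits = {}   # commit id -> parent id (None for the root commit)
branches = {}  # branch name -> id of the commit it currently points to

def commit(branch, new_id):
    """Create a commit on top of the branch's current tip and move the pointer."""
    commits[new_id] = branches.get(branch)
    branches[branch] = new_id

def history(commit_id):
    """Follow parent pointers back to the root; no branch name is needed."""
    chain = []
    while commit_id is not None:
        chain.append(commit_id)
        commit_id = commits[commit_id]
    return chain

commit("main", "c1")
commit("main", "c2")
branches["feature"] = branches["main"]   # creating a branch copies a pointer
commit("feature", "c3")

tip = branches["feature"]
del branches["feature"]                  # deleting a branch removes only the name
print(history(tip))                      # the chain is still there, just unnamed
print(branches)                          # only "main" is left as a named entry point
```

After the `del`, nothing in `commits` records that `c3` was ever "on feature"; the only thing that was lost is the label.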
This was the most useful piece of information that I have ever read about Git.
But what happens if you merge branch A into branch B? A and B will both contain the commits of A, but in B there may be commits of B between the commits that were merged. Do the same commits of A then have different parents depending on which branch they are on?
I keep repeating this every time someone talks about git and finds something weird or doesn't get branches, so I'm really glad your parent mentioned it as well and I know there's someone else out there that "gets" that:
In git it's all just labels/pointers
It's not useful at all to think about branches as the user sees them as "things" of their own. Branches don't "have" anything. Branches in that sense are just convenient labels.
Of course actual "branches" in the commit tree exist whether you label them or not, at least until `git` runs a garbage collection and gets rid of anything that no pointer ultimately leads to (a pointer that a human would understand, i.e. a branch or tag). And that's why we call these labels "branches" as well, but it's actually one word for two things here: the actual tree branch and the label that's called a branch.
And a branch and a tag are basically the same exact thing underneath, just a file in the `.git` directory somewhere that contains a commit hash. All the meaning and differentiation of branch or tag is just in the human brain and how we and our tools treat them. Such as if you look at a particular commit in your tool of choice, it will tell you which branch it's part of. To create a branch you can literally just create a thousand randomly named files in the right part of the `.git` directory containing the same commit hash and suddenly this commit "is on all those branches". That's what git does and why creating a branch in git is so super fast.
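A rough sketch of the "it's just a file with a hash in it" point, in Python. This builds a mock directory layout in a temp folder rather than touching a real repository, the commit hash is made up, and real git may also keep refs in a packed file instead of one file each.

```python
import tempfile
from pathlib import Path

# Hypothetical commit hash, used only for illustration.
COMMIT = "1234567890abcdef1234567890abcdef12345678"

def write_ref(git_dir, kind, name, commit_hash):
    """Write a loose ref: a small text file whose content is a commit hash."""
    ref = Path(git_dir) / "refs" / kind / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_hash + "\n")
    return ref

def read_refs(git_dir, kind):
    """Return {name: hash} for every loose ref of the given kind."""
    base = Path(git_dir) / "refs" / kind
    return {p.name: p.read_text().strip() for p in sorted(base.iterdir())}

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"          # a mock directory, not a real repository
    for i in range(1000):
        write_ref(git_dir, "heads", f"branch-{i}", COMMIT)   # a thousand "branches"
    write_ref(git_dir, "tags", "v1.0", COMMIT)               # a lightweight "tag"

    heads = read_refs(git_dir, "heads")
    tags = read_refs(git_dir, "tags")

print(len(heads), set(heads.values()) == {COMMIT})
print(tags)
```

The only difference between the branch files and the tag file here is the directory they sit in; everything else is in how tools treat them.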
To make things more complicated, the word "tag" is also overloaded. It can either be just a reference (in git lingo, a "ref") to a commit - just like a branch, only differing from it in how the tools treat it; but they can also be "annotated tags" which are pointing to a special tag object which contains some metadata and only then points to a specific commit (or other kind of object...) :)
> Do the same commits of A then have different parents depending on which branch they are on?
Absolutely not. Commits are immutable (representing whole repo state, not a diff), and branches are just (mutable) pointers to them.
As the sibling already noted, a merge commit is just a regular commit. It simply points to multiple parents, "merging" them. Aside from the whole machinery to resolve conflicts etc., that's pretty much all there is to it.
When your graph topology allows it, you can also merge branches without generating a new commit (so-called "fast-forward" merges): such a merge does nothing but rewrite the branch pointer. You can also create merge commits that point to more than two parents ("octopus" merges). Reconciling the commits' content can get quite complicated in such cases, but from the repo graph perspective it's nothing special.
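Seen purely from the graph side, the two kinds of merge could be sketched like this in Python. It is a toy model with made-up commit ids, and it ignores file contents and conflict resolution entirely.

```python
commits = {}    # commit id -> tuple of parent ids
branches = {}   # branch name -> commit id

def add_commit(cid, *parents):
    """Record a commit with the given parents and return its id."""
    commits[cid] = parents
    return cid

def is_ancestor(old, new):
    """True if `old` is reachable from `new` by following parent pointers."""
    stack, seen = [new], set()
    while stack:
        cid = stack.pop()
        if cid == old:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return False

def merge(target, source, merge_id):
    """Merge branch `source` into branch `target`; report what kind of merge happened."""
    ours, theirs = branches[target], branches[source]
    if is_ancestor(theirs, ours):
        return "already up to date"
    if is_ancestor(ours, theirs):
        branches[target] = theirs                           # fast-forward: only the pointer moves
        return "fast-forward"
    branches[target] = add_commit(merge_id, ours, theirs)   # new commit with two parents
    return "merge commit"

add_commit("c1")
branches["main"] = add_commit("c2", "c1")
branches["a"] = add_commit("a1", "c2")
branches["b"] = add_commit("b1", "c2")

first = merge("main", "a", "m1")    # main has nothing of its own: pointer just moves to a1
second = merge("main", "b", "m2")   # main and b have diverged: a merge commit is needed
print(first, second, commits["m2"], commits["a1"])
```

Note that `a1` still has exactly the parent it had before either merge. Only `main` moved, and only `m2` was added.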
> Commits are immutable (representing whole repo state, not a diff)
To make things more clear: Repo state here is the contents of all files, and some metadata including a pointer to the previous commit.
So a commit hash uniquely identifies not only a set of files but the unique history leading up to it! That's why some people like to call git the original blockchain (there's no proof of work involved of course so it can never be used for payments or anything like that, but the Merkle tree bit is similar enough).
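A simplified sketch of why the hash pins down the history, in Python. The `commit_id` function below is an assumption for illustration and does not reproduce git's actual object format; it only shows the principle that the parent's id is part of what gets hashed.

```python
import hashlib

def commit_id(files, parent, message):
    """Hash the snapshot, the parent id and the message (simplified, not git's real format)."""
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(f"{path}\0{files[path]}\0".encode())
    h.update(f"parent {parent}\n{message}".encode())
    return h.hexdigest()

snapshot = {"README": "hello\n"}

# Two different histories that end in the exact same file contents.
root_a = commit_id({}, None, "start A")
root_b = commit_id({}, None, "start B")
tip_a = commit_id(snapshot, root_a, "add README")
tip_b = commit_id(snapshot, root_b, "add README")

# Recomputing the same commit on the same history gives the same id.
tip_a_again = commit_id(snapshot, root_a, "add README")

print(tip_a == tip_a_again)   # same files, same history
print(tip_a == tip_b)         # same files, different history
```

Because each id depends on the parent's id, changing anything further back in the chain changes every id after it.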
If I check out the merge commit and then do a `git log -n 5`, which parent pointer is followed to show the previous commit logs? A or B or both / all if it is more than a 2-way merge?
In short: merge commits have multiple parent commits. So your tree tracing logic bifurcates at that point. The commits in the merged history are not altered by the merge commit; they each have a single parent commit (unless they are also merge commits).
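Roughly, the walk could look like the Python sketch below. It assumes the default behaviour of `git log` (follow every parent, list newest first by commit time) and, for contrast, the `--first-parent` variant. The commit ids and timestamps are made up.

```python
import heapq

# commit id -> (timestamp, parent ids); hypothetical history with one merge
commits = {
    "c1": (1, ()),
    "a1": (2, ("c1",)),
    "b1": (3, ("c1",)),
    "a2": (4, ("a1",)),
    "m":  (5, ("a2", "b1")),   # merge commit: first parent a2, second parent b1
}

def log(start, limit, first_parent=False):
    """List up to `limit` commits reachable from `start`, newest first."""
    heap = [(-commits[start][0], start)]   # max-heap on timestamp via negation
    seen = {start}
    out = []
    while heap and len(out) < limit:
        _, cid = heapq.heappop(heap)
        out.append(cid)
        parents = commits[cid][1]
        if first_parent:
            parents = parents[:1]          # ignore everything that was merged in
        for p in parents:
            if p not in seen:              # a shared ancestor is listed only once
                seen.add(p)
                heapq.heappush(heap, (-commits[p][0], p))
    return out

print(log("m", 5))
print(log("m", 5, first_parent=True))
```

So the answer to "A or B or both" is both: the two lines of history are interleaved by date, and the common ancestor `c1` shows up once.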
A key point with git is that every clone is effectively its own set of branches, even if they have the same name. The mechanisms you use for synchronizing your local branches with some remote branches are exactly the same as the mechanisms you use between your local named branches.
Git was actually designed initially for email based workflows where there was no central remote at all. Basically, that works by exporting patches and then applying them to your local branch. The branch name isn't even part of the patch.
A git patch is just a textualized form of the list of commits you created locally. You can apply them to any branch you like. As long as you and whoever applies the patch have a common ancestor commit, the patch may merge cleanly. It's good hygiene to ensure it does by, for example, rebasing/merging/squashing before you email somebody your patches. If that somebody is called Linus Torvalds, he's going to be pretty strict about things like commit messages and things not being a spaghetti ball of merges, reverts, forks, etc. Your mess, your problem. Linux development still works via mailing list. And forget about emailing him directly with a patch; you need to use the mailing lists like everybody else. And he works with a network of senior contributors that screen everything that comes in and that aggregate all the patches coming from upstream. So, he only gets involved at the end of the process.
Of course the rest of us use network protocols to sync our repositories. But the important distinction here is that this is a two step process. First you fetch content from remote. This is simply ensuring you have all the commit objects you need in your local git database. Any branches you have are simply text files with the commit content hash they point to as the content in .git/refs/heads. Remote branches are the same but live in your local .git/refs/remotes/<remotename>. Those branches might be named something like origin/main to make it clear that that is a local branch from the origin remote. And then you rebase/merge between your local and "remote" (i.e. also local) branch as needed. Pull is just short hand for doing both steps in one go. All merges are local. Same with rebases.
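The two-step nature of this can be sketched in Python as below. It is a toy model: the repo dictionaries, `server`, `laptop` and the commit ids are made up, and the merge step only handles the simple fast-forward case.

```python
def new_repo():
    """A repo is an object store plus a table of refs (toy model)."""
    return {"objects": {}, "refs": {}}   # objects: commit id -> parent id

def commit(repo, ref, cid):
    """Add a commit on top of `ref` and move `ref` to it."""
    repo["objects"][cid] = repo["refs"].get(ref)
    repo["refs"][ref] = cid

def fetch(local, remote, name="origin"):
    """Step 1: copy missing objects and update remote-tracking refs. Local branches are untouched."""
    for cid, parent in remote["objects"].items():
        local["objects"].setdefault(cid, parent)
    for ref, cid in remote["refs"].items():
        branch = ref[len("refs/heads/"):]
        local["refs"][f"refs/remotes/{name}/{branch}"] = cid

def fast_forward(repo, target, source):
    """Step 2: a purely local move of `target` to where `source` points (assumes no divergence)."""
    repo["refs"][target] = repo["refs"][source]

def pull(local, remote, branch="main"):
    """Shorthand for doing both steps in one go."""
    fetch(local, remote)
    fast_forward(local, f"refs/heads/{branch}", f"refs/remotes/origin/{branch}")

server = new_repo()
commit(server, "refs/heads/main", "c1")
laptop = new_repo()
pull(laptop, server)

commit(server, "refs/heads/main", "c2")
fetch(laptop, server)
after_fetch = dict(laptop["refs"])       # origin/main moved, local main did not

fast_forward(laptop, "refs/heads/main", "refs/remotes/origin/main")
print(after_fetch)
print(laptop["refs"])
```

After `fetch`, the laptop already has `c2` in its object store; the second step never talks to the server at all.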
Most of the conventions people project on git are kind of cultural and vary between people and companies. It's helpful to read up on the git internals in the Git book. GitHub is sort of an opinionated take on this that, back in the day, made people coming from centralized version control systems like Subversion feel at home by providing a central repository and allowing them to push their changes there or "share" branches there. That's not necessarily a great idea for bigger projects, and limiting write access is common on GitHub.
> A git patch is just a textualized form of the list of commits you created locally. You can apply them to any branch you like.
Not even branch. You can combine two unrelated repositories, and in theory you could cherry-pick commits from one of the original repositories to the other.
Of course, in practice this rarely works because the files mentioned in the commit don't exist in the other repository. But there's nothing in git's mechanisms that stops this: it's just a bunch of commits, which are actually just a bunch of file contents.
The whole "diff" or "patch" concept in git is just a way of doing data presentation: rather than showing you the actual commit, which is generally not helpful, it shows you the difference between the contents of the commit and the same files in the state referenced by the previous commit.
Git commits always have a parent (root commits aside). Applying git patches to a repository without the parent is not going to work. The repository must have the parent commit. You might force it to work without that but it's going to create conflicts, complicate merging, etc.
The reason is that a git patch is not merely a diff but an export of the actual commit objects and referred content (trees, blob diffs, hashes, etc.). Applying the patch recreates the exact commit objects on the other side. The end state is exactly the same as if you would have merged the commits from some branch. There is no difference.
`git diff` shows you a normal diff. It's not the same thing. You can indeed apply such diffs to your local work copy. But that's not the same thing as a git patch.
The commands you need are `git format-patch` (exports the patch) and `git am` (applies it as commits; `git apply` only applies the diff to your working copy without committing).
> Applying git patches to a repository without the parent is not going to work.
If that's the case (and I'm not saying it isn't - I really don't know), how does cherry-pick work?
I know I can cherry-pick commits in the same repo from entirely separate commit chains, where the only common ancestor is either my local HEAD or some other commit way below both of us.
Why would that not work for different repositories?
It still uses the notion of a common parent to merge a single commit. Works as long as there are no conflicts. As to why that would or would not work, I refer to the git internals section of the Git book. Great stuff. But they are not performing any dark magic here.
More generally, the branch name is not stored with commits; it has to be computed by walking back from the branch tips, and commits can have multiple possible branch names based on this. In other words, git does not preserve information about which branch was actually active when you made a particular commit. (Mercurial, by contrast, stores the branch that was active when a commit was made in the commit, so that information is preserved.) The article discusses some implications of this, although it doesn't phrase it quite the way I did above.
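A small Python sketch of that computation, which is roughly what `git branch --contains` has to do. The graph and branch names are made up for illustration.

```python
# commit id -> tuple of parent ids; note that no commit stores a branch name
commits = {
    "c1": (),
    "c2": ("c1",),
    "c3": ("c2",),   # made while "main" was checked out
    "f1": ("c2",),   # made while "feature" was checked out
}
branches = {"main": "c3", "feature": "f1"}

def reachable(tip):
    """All commits reachable from `tip` by following parent pointers."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen

def branches_containing(cid):
    """Every branch name whose tip can reach `cid`."""
    return sorted(name for name, tip in branches.items() if cid in reachable(tip))

print(branches_containing("c2"))   # reachable from both tips
print(branches_containing("f1"))   # reachable from one tip only
```

For `c2` there is simply no way to tell from this data which of the two branches was active when it was committed.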
> Branches are pointers to a commit and that pointer is refreshed when a new commit is created. One could say they are a wandering tag (without explaining a tag for now).
A good name for these "wandering tags" would be "heads", since it's what git calls them internally (for instance, when not packed, they're stored at the "refs/heads/" path in the repository). This also exposes a distinction between a "branch" and its "head", and that distinction can be useful.
Just to make this explicit: a branch is the chain of commits. The start of the chain is pointed to by a head. When you create a new commit on a branch, the head (of that branch) changes to point to the new commit.
No. There are leaves that aren't heads (for example, after you delete a branch, the old commits just lie around until someone deliberately cleans them up), and there are heads that aren't leaves (for example, if you branch a new feature branch from main: the feature branch branches off of main, so main is not a leaf anymore).
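Both cases show up in a tiny Python sketch. It assumes a made-up graph where `feature` already has one commit on top of `main`, and where `old1` belonged to a branch that has since been deleted.

```python
# commit id -> tuple of parent ids (hypothetical graph)
commits = {
    "c1": (),
    "c2": ("c1",),
    "f1": ("c2",),     # commit on the feature branch, on top of main's tip
    "old1": ("c1",),   # tip of a branch whose label was deleted
}
heads = {"main": "c2", "feature": "f1"}

def leaves():
    """Commits that no other commit names as a parent."""
    with_children = {p for parents in commits.values() for p in parents}
    return set(commits) - with_children

head_commits = set(heads.values())
leaves_without_head = leaves() - head_commits   # dangling, waiting for cleanup
heads_not_leaves = head_commits - leaves()      # heads with commits built on top

print(sorted(leaves_without_head))
print(sorted(heads_not_leaves))
```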
There’s no “branching a new commit” (except as syntactic sugar). Creating a branch creates a new head pointing to an arbitrary commit (even one that already has one or more heads). Committing creates a new commit and moves exactly one head forward to it.
aka Git is the simplest crappiest implementation that can work. Then they tacked a terrible UX onto it and shipped it. Chaos ensued and we as devs have spent almost the last 20 years fighting over trying to understand the chaos. We would have been way better off staying with SVN or Mercurial or Fossil (or pretty much any other VCS), but that ship has sailed and now we are stuck in the chaos.
Now nobody understands their VCS and nobody will ever understand it, as the explanations are either too detailed and miss the forest for the trees or too high level and skip over all the trees that kill you when you accidentally run into them.
I hope someone somewhere manages to convince us all to move to something sane.
SVN, branching that takes forever instead of a simple file with a commit hash in it? Are you serious?
Mercurial, when I had to use it for almost 2 years? The single thing I missed the most is the fact that "everything is just a label" (or pointer if you will).
Fossil I can't comment on with certainty, as I never really used it.
This is probably gonna get voted into oblivion, but most devs just don't get VCS, period (yes I'm a dev, so this is an inside perspective). Git or not. They didn't get it in CVS days, they didn't get it in SVN days and they didn't get various commercial ones either. Somehow most devs just don't grok version control trees.
But so far git is the single best VCS I have got to use. Everything that I actually need day to day follows from grokking the one simple rule: All those branches and tags are just labels/pointers/sticky notes and you can move them around at will and it's fast.
I do get that there are tricky situations with octopus merges and all that jazz and the Linux kernel and a few other open source projects are probably some of the trickier use cases to understand. For your run of the mill corporate situation? Keep `master`/`main`/`whateveryoucallit` history straight with one commit per feature/change by doing rebases and squashing and you will never have a single minute of misunderstanding what is going on. IFF you grok that "everything is just a label" and you've actually "created a branch" by just creating a file in `.git/refs/heads/` yourself!
And the second rule is: before you try anything, just: commit! And never close that terminal window. You might need that commit hash to reattach a label to it after you "destroyed" your branch (but git has not garbage collected yet and it's all actually still there). Or someone else still has your commits and you just get them from there and reattach a label. It's really so simple.
Look, we have spent the last ~ 20 years across HN and conferences and what not trying to teach git to people with basically nothing to show for it. Most developers still can't do much more than occasionally commit stuff. They still rm -rf their tree whenever something goes wrong.
Your entire hate for SVN was that branching takes a while (because it requires a round-trip to the server). SVN was easy to reason about: you didn't have entire conference talks trying to explain how SVN works so that people don't have to rm -rf their entire tree and re-check out every week.
And even you, who claim to understand git seem to not understand git reflog. I think that clearly sums up my argument perfectly fine :)
Git is NOT easy to reason about, and I've never seen a website or blog post or conference talk about git that wasn't factually inaccurate in some way, and yet they still continue to proliferate. If we don't come up with a better git, the next decade will still be spent trying to teach git to people that will never understand it.
That is my point. The "teaching git" for most developers is not actually about git at all. What people don't understand is version control. The whole concept of a tree of branches in your repository and how they get merged into each other or branched off from each other does not change from SVN to git or vice versa.
Maybe we have different experiences, but mine is that back in my SVN days, most devs were not able to reason about branching at all. And every time they needed to check something in another branch they would `cp -a` their SVN tree to another directory and people would have like four or five copies of each of their projects with various names hanging around.
My entire "hate" for SVN is not just branching. I can add more things if you like. For example, trying to merge or commit you have to be 100% certain that you won't have conflicts or that you can solve the conflicts in one go. I have seen many many people work on something, trying to commit it and getting hopelessly stuck during the conflict resolution and destroying their work. They then redid their entire piece of work (well the files that they effed up) and sometimes started creating said `cp -a` copies of the entire tree before committing. These are the same kinds of people that `rm -rf` their tree when they run into something in git that they don't understand.
In git an easy rule to never lose work (save for actual bugs) is to always commit first. I.e. before you try anything, just commit. Whatever goes wrong, you can always go back to your commit. I don't have conflicts often but sometimes in the middle of a conflict resolution I will just reset back to the original commit I made and try again with the knowledge I gained during the first round of resolution. Sometimes the solution is to actually just squash your own commits first and the conflict goes away, for example. But I don't know that in advance because I wasn't even expecting a conflict to begin with.
I used svn only a few years ago and don't remember branching taking very long. Maybe it was CVS that had to copy all the files.
I do remember merges being horrible in svn. Just branching off trunk, doing some work and merging back is fine, but if you try to merge from trunk to your branch to "catch up" you're in for some pain when you later want to merge to trunk.
Also svn treats adding and deleting files as different from just editing, more so than git does.
SVN branching as I remember it requires a round-trip to the server, so that could be why it took so long for some people, if their server was far away.
For years I was deeply annoyed by the terrible name “branch” for something that acts more like a bookmark (or “wandering tag” indeed!).
And then I learned that git branches are branches in exactly the same way that the first element of a linked list in C “is” the linked list. Git was made by C people and they’re used to referring to entire data structures by way of some root element.
I mean that doesn’t make me dislike the name any less but at least now I see where they were coming from.
> Git was made by C people and they’re used to referring to entire data structures by way of some root element.
FWIW this is actually backwards. The word "branch" was already in common use (to refer to the same basic idea) in SCM systems going back decades, and in almost all of those a "branch" was indeed a first class object with its own data that acted as a "container" for commits, both semantically and physically.
The fact that a "branch" is just a pointer is in fact a git innovation on top of the former idea.
When the entire structure of commits is called a tree, I find the name "branch" fitting. The branch is identified by its head commit, so the path from head to root is uniquely defined and that's the branch. (Disregarding merges for now.)
Leaning into the tree metaphor (and following the precedent of other version control systems), git should have used the term trunk instead of master or main.
Why? That would heavily imply that master/main is somehow technically different from all other branches (since a trunk is certainly not a branch), which to my knowledge is not true.
I see what you mean, but don't "master" and "main" also imply some special authority? Trunk-based development with branches forking off and merging back into the trunk (a DAG, unlike a real tree! :) is an extremely common workflow.
And consider that because every folder can have its own .git subfolder, we can have several parallel trunks/repositories at the same time, meaning we have a forest.
I haven't really seen specific guidelines as to when and why should anyone start a new repo. Are there some? Or is "mono-repo" the best solution, and when and for whom. Surely we need more than one git-repo in the whole world?!
In fact, Mercurial uses the term “bookmark” for its lightweight, git-like branching. Mercurial’s branches have slightly different semantics and can’t be deleted like bookmarks or git branches.
If you move the branch to point somewhere else, then it's better/more accurately said that you changed the name to refer to a different branch. We can think of branches as the chains of commits; it's the names we give them that 'wander' both as we commit and if we move them to a different branch. But merging (as it were!) the concepts of branches and names for their tips is convenient, and often equivalent/inconsequential.