Git Branches: Intuition and Reality

riperoni · on Nov 23, 2023

While the explanation is right in some sense, it misses a few points.

Branches are pointers to a commit, and that pointer is moved forward when a new commit is created on that branch. One could say they are a wandering tag (without explaining a tag for now).

The actual chain of commits that represents what we see as a branch comes from the commits themselves: each commit points back to its parent commit.

And then one can see why no branch has any special meaning: it is a chain of related commits with a named entry point. Once you delete a branch (i.e. the named wandering pointer to a commit), you can no longer identify that branch as such. It is now just a chain of related commits without a label. And even before, nothing besides the name distinguished the branch from other commit chains.

The master/dev/release branches are then a convention to keep an updated commit pointer on the chain of commits containing changes of interest.
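To make the pointer-plus-parent-links picture concrete, here is a minimal Python sketch. It is a toy model, not how Git stores anything: `Commit`, `commit_on` and `history` are hypothetical names, each commit has a single parent (merges are ignored), and a "branch" is just an entry in a dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: str
    parent: Optional["Commit"]  # single parent; merges are ignored in this toy model
    message: str


def commit_on(branches: dict, name: str, commit_id: str, message: str) -> Commit:
    """Create a commit whose parent is the branch tip, then move the pointer to it."""
    new = Commit(commit_id, branches[name], message)
    branches[name] = new  # the "wandering" part: only the name moves
    return new


def history(tip: Optional[Commit]) -> list:
    """Walk parent pointers back from a tip; this chain is what we perceive as the branch."""
    ids = []
    while tip is not None:
        ids.append(tip.id)
        tip = tip.parent
    return ids


branches = {"feature": Commit("c1", None, "initial")}
commit_on(branches, "feature", "c2", "add feature")
commit_on(branches, "feature", "c3", "fix feature")

tip = branches["feature"]
print(history(tip))  # ['c3', 'c2', 'c1']

del branches["feature"]  # deleting the branch removes only the name...
print(history(tip))      # ['c3', 'c2', 'c1'] ...the chain itself is unchanged
```

Nothing in `Commit` records a branch name, so once the dict entry is gone the chain is indistinguishable from any other.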

jansan · on Nov 23, 2023

This was the most useful piece of information that I have ever read about Git.

But what happens if you merge branch A into branch B? A and B will both contain the commits of A, but in B there may be commits of B between the commits that were merged. Do the same commits of A then have different parents depending on which branch they are on?

tharkun__ · on Nov 23, 2023

I keep repeating this every time someone talks about git and finds something weird or doesn't get branches, so I'm really glad your parent mentioned it as well and I know there's someone else out there that "gets" that:

    In git it's all just labels/pointers

It's not useful at all to think about branches as the user sees them as "things" of their own. Branches don't "have" anything. Branches in that sense are just convenient labels.

Of course, actual "branches" in the commit tree exist whether you label them or not, at least until `git` runs garbage collection and gets rid of anything that isn't reachable from some pointer (something a human would understand, i.e. a branch or tag). That's why we call these labels "branches" as well, but it's really one word for two things here: the actual branch of the tree and the label that's called a branch.
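A rough Python sketch of the "pointer ultimately leading to it" idea: collect everything reachable from the refs, and whatever is left over is what garbage collection could eventually delete. The repository data is made up, and real `git gc` is more conservative than this (reflog entries also count as starting points, and unreachable objects get a grace period before pruning).

```python
# Hypothetical repository: commit id -> list of parent ids.
parents = {
    "a": [], "b": ["a"], "c": ["b"],  # main: a <- b <- c
    "x": ["b"], "y": ["x"],           # an old topic branch that forked at b
}
refs = {"refs/heads/main": "c"}       # the topic branch's ref has been deleted


def reachable(refs: dict, parents: dict) -> set:
    """Every commit you can get to by starting at some ref and following parent links."""
    seen, stack = set(), list(refs.values())
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


live = reachable(refs, parents)
garbage = set(parents) - live
print(sorted(garbage))  # ['x', 'y']: still present, just no longer reachable from any label
```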

And a branch and a tag are basically the exact same thing underneath: just a file somewhere in the `.git` directory that contains a commit hash. All the meaning and differentiation between branch and tag is just in the human brain and in how we and our tools treat them. For example, if you look at a particular commit in your tool of choice, it will tell you which branches it's part of. To create a branch you can literally just create a thousand randomly named files in the right part of the `.git` directory containing the same commit hash, and suddenly this commit "is on all those branches". That's what git does, and it's why creating a branch in git is so fast.
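The "thousand files" remark can be acted out with plain files. This Python sketch only builds a fake `.git/refs/heads` directory inside a temporary folder (it is not a real repository, and the commit id is made up); in a real repo you would normally use `git branch` or `git update-ref`, and refs can also live in `.git/packed-refs` instead of loose files. It only finds branches whose tip *is* the commit; tools that tell you which branches a commit is "part of" also walk history back from each tip.

```python
import tempfile
from pathlib import Path

commit = "3f2c9a1e" + "0" * 32  # made-up 40-hex-digit commit id

with tempfile.TemporaryDirectory() as tmp:
    heads = Path(tmp, ".git", "refs", "heads")
    heads.mkdir(parents=True)

    # "Creating a branch" is just writing a file that contains a commit hash.
    for i in range(1000):
        (heads / f"branch-{i:04d}").write_text(commit + "\n")

    # Which branch tips point at this commit? Read every ref file and compare.
    on_branches = sorted(
        p.name for p in heads.iterdir() if p.read_text().strip() == commit
    )

print(len(on_branches))  # 1000
print(on_branches[:2])   # ['branch-0000', 'branch-0001']
```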

seba_dos1 · on Nov 23, 2023

To make things more complicated, the word "tag" is also overloaded. A tag can be just a reference (in git lingo, a "ref") to a commit, just like a branch and differing from it only in how the tools treat it; but it can also be an "annotated tag", which points to a special tag object that contains some metadata and only then points to a specific commit (or other kind of object...) :)
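A small Python model of the two kinds of tag, with made-up object ids and a plain dict standing in for the object database. A lightweight tag ref holds a commit id directly; an annotated tag ref holds the id of a tag object, which carries its own metadata and points onward. `peel` is a hypothetical helper that follows tag objects until it reaches something else, roughly what Git's `v1.0^{}` revision syntax does.

```python
# Toy object database: object id -> (object type, payload). Ids are made up.
objects = {
    "c0ffee": ("commit", {"tree": "t1", "parents": []}),
    "7a9bee": ("tag", {
        "object": "c0ffee",
        "type": "commit",
        "tag": "v1.0",
        "tagger": "Jane Doe <jane@example.com>",
        "message": "Release 1.0",
    }),
}

refs = {
    "refs/tags/v1.0-light": "c0ffee",  # lightweight tag: the ref names the commit itself
    "refs/tags/v1.0": "7a9bee",        # annotated tag: the ref names a tag object
}


def peel(oid: str) -> str:
    """Follow tag objects until reaching a non-tag object."""
    while objects[oid][0] == "tag":
        oid = objects[oid][1]["object"]
    return oid


print(peel(refs["refs/tags/v1.0-light"]))  # c0ffee
print(peel(refs["refs/tags/v1.0"]))        # c0ffee
```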

evntdrvn · on Nov 23, 2023

You’ll also see the first type referred to as “lightweight tags”, if that helps anyone :)

paulddraper · on Nov 24, 2023

Yeah they should have called them tags and annotations.

Or labels and annotations.

Or something else.

seba_dos1 · on Nov 23, 2023

> Do the same commits of A then have different parents depending on which branch they are on?

Absolutely not. Commits are immutable (representing whole repo state, not a diff), and branches are just (mutable) pointers to them.

As the sibling already noted, a merge commit is just a regular commit. It simply points to multiple parents, "merging" them. Aside from the whole machinery to resolve conflicts etc., that's pretty much all there is to it.

When your graph topology allows it, you can also merge branches without generating a new commit (so-called "fast-forward" merges); such a merge does nothing but rewrite the branch pointer. You can also create merge commits that point to more than two parents ("octopus" merges). Reconciling the commits' content can get quite complicated in such cases, but from the repo graph perspective it's nothing special.
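Purely at the graph level, the cases above fit in a few lines. This is a hypothetical Python sketch that ignores file contents and conflict resolution entirely: `merge` does nothing when everything is already included, fast-forwards when the current tip is an ancestor of the single branch being merged, and otherwise records a new commit whose parents are all the tips (two for an ordinary merge, three or more for an octopus).

```python
import itertools

parents = {"a": [], "b": ["a"], "c": ["b"]}       # hypothetical history: a <- b <- c
branches = {"main": "b", "topic": "c"}
new_ids = (f"m{i}" for i in itertools.count(1))  # made-up ids for merge commits


def is_ancestor(anc: str, desc: str) -> bool:
    """True if anc can be reached from desc by following parent links (or is desc)."""
    stack, seen = [desc], set()
    while stack:
        cid = stack.pop()
        if cid == anc:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return False


def merge(into: str, *others: str) -> str:
    tips = [branches[into]] + [branches[o] for o in others]
    if all(is_ancestor(t, tips[0]) for t in tips[1:]):
        return tips[0]                  # already up to date: nothing to do
    if len(others) == 1 and is_ancestor(tips[0], tips[1]):
        branches[into] = tips[1]        # fast-forward: only the pointer moves
    else:
        merge_id = next(new_ids)
        parents[merge_id] = tips        # 2 parents = normal merge, 3+ = octopus
        branches[into] = merge_id
    return branches[into]


ff = merge("main", "topic")             # main: b -> c, no new commit
parents.update({"d": ["c"], "e": ["c"], "f": ["c"]})
branches.update({"x": "d", "y": "e", "main": "f"})
octo = merge("main", "x", "y")          # diverged, so a 3-parent merge commit
print(ff, octo, parents[octo])          # c m1 ['f', 'd', 'e']
```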

xorcist · on Nov 23, 2023

> Commits are immutable (representing whole repo state, not a diff)

To make things more clear: Repo state here is the contents of all files, and some metadata including a pointer to the previous commit.

So a commit hash uniquely identifies not only a set of files but also the unique history leading up to it! That's why some people like to call git the original blockchain (there's no proof of work involved, of course, so it can never be used for payments or anything like that, but the Merkle tree part is similar enough).
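A simplified Python sketch of why the hash covers the whole history: each commit's text includes its parent's id, so changing any ancestor changes every descendant's id. The layout only loosely imitates Git's commit objects (real ones also have author and committer lines and are hashed together with an object header), and the tree ids are made up, so these are not real Git ids.

```python
import hashlib
from typing import Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a commit-like text that embeds the parent's id (simplified, not Git's exact format)."""
    body = f"tree {tree}\n"
    if parent is not None:
        body += f"parent {parent}\n"
    body += f"\n{message}\n"
    return hashlib.sha1(body.encode()).hexdigest()


def build(messages: list) -> str:
    """Create a linear chain of commits and return the tip's id."""
    parent = None
    for i, msg in enumerate(messages):
        parent = commit_id(f"tree-{i}", parent, msg)  # made-up tree ids
    return parent


tip1 = build(["init", "add parser", "fix bug"])
tip2 = build(["init (reworded)", "add parser", "fix bug"])  # only the root differs
print(tip1 == tip2)  # False: rewording the root changed the tip's id too
```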

paulddraper · on Nov 23, 2023

Merging branch A into branch B does two things:

1. Create a new merge commit with two parents: the commit pointed to by A and the commit pointed to by B.

2. Set branch B to point at the new merge commit.

This is a non-linear history; when comparing some commits there isn't a "before" or "after."

hackernews1134 · on Nov 24, 2023

If I checkout the merge commit and then do a ‘git log -n 5’, which parent pointer is followed to show the previous commit logs? A or B or Both / all if it is more than a 2-way merge?

paulddraper · on Nov 24, 2023

By default it's by commit date.

Other ordering is possible, e.g. --topo-order

baronswindle · on Nov 24, 2023

As I recall, it shows commits from both parents in order of commit date.
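Both answers can be illustrated with a simplified Python model of the default walk: start at the given commit, keep a queue ordered by commit timestamp, and whenever a commit is shown, queue *all* of its parents. The commits and timestamps are made up, and real `git log` has more subtleties; `--first-parent` restricts the walk to first parents, and `--topo-order` changes the ordering.

```python
import heapq

# Hypothetical commits: id -> (commit timestamp, parent ids). "m" merges b2 into a2.
commits = {
    "root": (100, []),
    "a1": (110, ["root"]), "a2": (140, ["a1"]),
    "b1": (120, ["root"]), "b2": (130, ["b1"]),
    "m": (150, ["a2", "b2"]),
}


def log(start: str, n: int) -> list:
    """Newest-first walk over all parents, ordered by commit timestamp."""
    heap = [(-commits[start][0], start)]  # negate timestamps: heapq is a min-heap
    seen = {start}
    shown = []
    while heap and len(shown) < n:
        _, cid = heapq.heappop(heap)
        shown.append(cid)
        for p in commits[cid][1]:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, (-commits[p][0], p))
    return shown


print(log("m", 5))  # ['m', 'a2', 'b2', 'b1', 'a1']: both parents' histories, interleaved
```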

emilecantin · on Nov 24, 2023

To complete the answer: the main difference if you merge branch B into branch A is that A is advanced to the new commit instead, and A's old tip becomes the merge commit's first parent.

loeg · on Nov 23, 2023

In short: merge commits have multiple parent commits. So your tree tracing logic bifurcates at that point. The commits in the merged history are not altered by the merge commit; they each have a single parent commit (unless they are also merge commits).

jillesvangurp · on Nov 23, 2023

A key point with git is that every clone effectively has its own set of branches, even if they have the same names. The mechanisms you use for synchronizing your local branches with some remote branches are exactly the same as the mechanisms you use between your local named branches.

Git was actually designed initially for email based workflows where there was no central remote at all. Basically, that works by exporting patches and then applying them to your local branch. The branch name isn't even part of the patch.

A git patch is just a textual form of the list of commits you created locally. You can apply them to any branch you like. As long as you and whoever applies the patch share a common ancestor commit, the patch may apply cleanly. It's good hygiene to make sure it does, for example by rebasing/merging/squashing before you email somebody your patches. If that somebody is called Linus Torvalds, he's going to be pretty strict about things like commit messages and about things not being a spaghetti ball of merges, reverts, forks, etc. Your mess, your problem. Linux development still works via mailing lists. And forget about emailing him directly with a patch; you need to use the mailing lists like everybody else. He works with a network of senior contributors who screen everything that comes in and aggregate all the patches coming from upstream, so he only gets involved at the end of the process.

Of course, the rest of us use network protocols to sync our repositories. But the important distinction here is that this is a two-step process. First you fetch content from the remote. This simply ensures you have all the commit objects you need in your local git database. Each local branch is simply a text file in .git/refs/heads whose content is the hash of the commit it points to. Remote branches are the same but live in your local .git/refs/remotes/<remotename>. Those branches are named something like origin/main, to make it clear that this is a local copy of a branch from the origin remote. Then you rebase/merge between your local and "remote" (i.e. also local) branch as needed. Pull is just shorthand for doing both steps in one go. All merges are local. Same with rebases.
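A dict-based Python sketch of the two steps, with made-up commit ids and no networking at all: `fetch` copies missing commits and updates only `refs/remotes/origin/*`, and a separate, purely local step moves the local branch. The second step here is fast-forward-only, roughly like `git merge --ff-only origin/main`; the function names are hypothetical.

```python
remote = {
    "objects": {"a": [], "b": ["a"], "c": ["b"]},  # commit id -> parent ids
    "refs": {"refs/heads/main": "c"},
}
local = {
    "objects": {"a": [], "b": ["a"]},
    "refs": {"refs/heads/main": "b", "refs/remotes/origin/main": "b"},
}


def fetch(local: dict, remote: dict, name: str = "origin") -> None:
    """Step 1: copy missing objects, then update refs/remotes/<name>/*. Local branches stay put."""
    for oid, ps in remote["objects"].items():
        local["objects"].setdefault(oid, ps)
    for ref, oid in remote["refs"].items():
        if ref.startswith("refs/heads/"):
            branch = ref[len("refs/heads/"):]
            local["refs"][f"refs/remotes/{name}/{branch}"] = oid


def is_ancestor(objects: dict, anc: str, desc: str) -> bool:
    stack, seen = [desc], set()
    while stack:
        cid = stack.pop()
        if cid == anc:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(objects[cid])
    return False


def merge_ff_only(local: dict, branch: str, tracking: str) -> None:
    """Step 2, entirely local: move the branch to the remote-tracking ref if it's a fast-forward."""
    head = f"refs/heads/{branch}"
    old, new = local["refs"][head], local["refs"][tracking]
    if not is_ancestor(local["objects"], old, new):
        raise ValueError("not a fast-forward; a real merge or rebase is needed")
    local["refs"][head] = new


fetch(local, remote)
after_fetch = dict(local["refs"])  # main is still "b"; origin/main is now "c"
merge_ff_only(local, "main", "refs/remotes/origin/main")
print(after_fetch["refs/heads/main"], local["refs"]["refs/heads/main"])  # b c
```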

Most of the conventions people project onto git are kind of cultural and vary between people and companies. It's helpful to read up on the git internals in the Git book. GitHub is sort of an opinionated take on this that, back in the day, made people coming from centralized version control systems like Subversion feel at home by providing a central repository and allowing them to push their changes or "share" branches there. Not necessarily a great idea for bigger projects, and limiting write access is common on GitHub.

PaulDavisThe1st · on Nov 23, 2023

> A git patch is just a textualized form of the list of commits you created locally. You can apply them to any branch you like.

Not even just branches. You can combine two unrelated repositories, and in theory you could cherry-pick commits from one of the original repositories into the other.

Of course, in practice this rarely works because the files mentioned in the commit don't exist in the other repository. But there's nothing in git's mechanisms that stops this: it's just a bunch of commits, which are actually just a bunch of file contents.

The whole "diff" or "patch" concept in git is just a way of presenting data: rather than showing you the actual commit, which is generally not helpful, it shows you the difference between the contents of the commit and the same files in the state referenced by the previous commit.
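A Python sketch of "diff as presentation": the commits below store full snapshots (made-up data), and the diff only comes into existence when a hypothetical `show` function compares a commit with its parent. Python's `difflib` is used as a stand-in; Git has its own diff implementation (Myers by default) and marks added files with `/dev/null`, so the output won't match Git byte for byte.

```python
import difflib

# Each commit stores a full snapshot: path -> file contents (hypothetical data).
snapshots = {
    "c1": {"hello.txt": "hello\nworld\n"},
    "c2": {"hello.txt": "hello\nthere\nworld\n", "new.txt": "fresh file\n"},
}
parent_of = {"c2": "c1"}


def show(commit: str) -> str:
    """Compute a unified diff on demand by comparing the snapshot with its parent's."""
    old = snapshots[parent_of[commit]]
    new = snapshots[commit]
    out = []
    for path in sorted(old.keys() | new.keys()):
        a = old.get(path, "").splitlines(keepends=True)
        b = new.get(path, "").splitlines(keepends=True)
        out.extend(difflib.unified_diff(a, b, f"a/{path}", f"b/{path}"))
    return "".join(out)


print(show("c2"))
```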

jillesvangurp · on Nov 24, 2023

Git commits always have a parent. Applying git patches to a repository without the parent is not going to work. The repository must have the parent commit. You might force it to work without that but it's going to create conflicts, complicate merging, etc.

The reason is that a git patch is not merely a diff but an export of the actual commit objects and referred content (trees, blob diffs, hashes, etc.). Applying the patch recreates the exact commit objects on the other side. The end state is exactly the same as if you would have merged the commits from some branch. There is no difference.

Git diff shows you a normal diff. It's not the same thing. You can indeed apply such diffs to your local working copy. But that's not the same thing as a git patch.

The commands you need are git format-patch (exports the patch) and git am (applies it as commits; git apply only applies the diff to your working tree).

cassianoleal · on Nov 24, 2023

> Applying git patches to a repository without the parent is not going to work.

If that's the case (and I'm not saying it isn't - I really don't know), how does cherry-pick work?

I know I can cherry-pick commits in the same repo from entirely separate commit chains, where the only common ancestor is either my local HEAD or some other commit way below both of us.

Why would that not work for different repositories?

jillesvangurp · on Nov 25, 2023

It still uses the notion of a common parent to merge a single commit. Works as long as there are no conflicts. As to why that would or would not work, I refer to the git internals section of the Git book. Great stuff. But they are not performing any dark magic here.

cassianoleal · on Nov 28, 2023

So are you saying that if I have completely disconnected commit trees in the same repo, git will refuse to cherry-pick between each other?
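One way to look at the cherry-pick question is as a three-way merge in which the *picked commit's own parent* serves as the base, HEAD is "ours" and the picked commit is "theirs". The Python sketch below does this at whole-file granularity over made-up path-to-content snapshots (Git merges line by line and then records a new commit); `three_way` and `cherry_pick` are hypothetical names. In this model nothing requires HEAD to share any history with the picked commit; only the file contents matter.

```python
from typing import Dict

Snapshot = Dict[str, str]  # path -> file contents; a missing path means "file absent"


def three_way(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Snapshot:
    """Whole-file three-way merge: take whichever side changed a file relative to base."""
    result, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            merged = o            # both sides agree (possibly both unchanged)
        elif o == b:
            merged = t            # only "theirs" touched this file
        elif t == b:
            merged = o            # only "ours" touched this file
        else:
            conflicts.append(path)
            continue
        if merged is not None:    # None means the file is deleted/absent
            result[path] = merged
    if conflicts:
        raise ValueError(f"conflict in {conflicts}")
    return result


def cherry_pick(head: Snapshot, picked: Snapshot, picked_parent: Snapshot) -> Snapshot:
    # The merge base is the picked commit's parent, not a common ancestor with HEAD.
    return three_way(base=picked_parent, ours=head, theirs=picked)


head = {"README": "hi\n", "app.py": "print(1)\n"}  # HEAD from an unrelated history
picked_parent = {"README": "hi\n"}
picked = {"README": "hi there\n"}                  # the picked commit only changed README
print(cherry_pick(head, picked, picked_parent))
# {'README': 'hi there\n', 'app.py': 'print(1)\n'}
```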

pdonis · on Nov 23, 2023

> The branch name isn't even part of the patch.

More generally, the branch name is not stored with commits; it has to be computed by walking back from the branch tips, and commits can have multiple possible branch names based on this. In other words, git does not preserve information about which branch was actually active when you made a particular commit. (Mercurial, by contrast, stores the branch that was active when a commit was made in the commit, so that information is preserved.) The article discusses some implications of this, although it doesn't phrase it quite the way I did above.
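A Python sketch of "computed by walking back from the branch tips": a commit belongs to every branch whose tip can reach it, so the answer can be several names, and it changes whenever a branch moves. The history is made up; `git branch --contains <commit>` answers the same question on a real repository.

```python
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}  # hypothetical history
branches = {"main": "c", "topic": "d", "old": "b"}


def ancestors(tip: str) -> set:
    """The tip plus everything reachable from it via parent links."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def branches_containing(commit: str) -> list:
    return sorted(name for name, tip in branches.items() if commit in ancestors(tip))


print(branches_containing("b"))  # ['main', 'old', 'topic']: one commit, three "branches"
print(branches_containing("d"))  # ['topic']
```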

loeg · on Nov 23, 2023

I think this is covered adequately (if less completely) in the "technically correct" definition section.

cesarb · on Nov 23, 2023

> Branches are pointers to a commit and that pointer is refreshed when a new commit is created. One could say they are a wandering tag (without explaining a tag for now).

A good name for these "wandering tags" would be "heads", since it's what git calls them internally (for instance, when not packed, they're stored at the "refs/heads/" path in the repository). This also exposes a distinction between a "branch" and its "head", and that distinction can be useful.

lisper · on Nov 23, 2023

Just to make this explicit: a branch in the chain of commits. The start of the chain is pointed to by a head. When you create a new commit on a branch, the head (of that branch) changes to point to the new commit.

lisper · on Nov 24, 2023

Arrggh. A branch IS a chain of commits.

galaxyLogic · on Nov 24, 2023

So would it be correct to say that 'heads' are the LEAVES of the commits-tree?

Or are there such leaves which are not considered 'heads'?

Lornedon · on Nov 24, 2023

No. There are leaves that aren't heads (for example, after you delete a branch, the old commits just lie around until someone deliberately cleans them up), and there are heads that aren't leaves (for example, if you branch a new feature branch off main and commit to it. The feature branch branches off of main, so main is not a leaf anymore)

galaxyLogic · on Nov 24, 2023

> The feature branch branches off of main, so main is not a leaf anymore)

But "main" remains a "head" even though the feature branch "continues the graph" from it?

So there must be a difference between "Adding a new commit" and "Branching a new commit". I think I got it now. Thanks

dasil003 · on Nov 25, 2023

There’s no “branching a new commit” (except as syntactic sugar). Creating a branch creates a new head pointing to an arbitrary commit (even one that already has one or more heads). Committing creates a new commit and moves exactly one head forward to it.
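The distinction can be modeled in a few lines of Python. In this hypothetical sketch, `HEAD` holds the name of a branch (as Git's `.git/HEAD` normally does; detached HEAD is left out), creating a branch only adds a pointer, and committing moves exactly the branch `HEAD` names. Commit ids are made up.

```python
import itertools

parents = {"c1": []}
# Like .git/HEAD, "HEAD" holds the *name* of a branch rather than a commit id.
refs = {"HEAD": "ref: refs/heads/main", "refs/heads/main": "c1"}
new_ids = (f"c{i}" for i in itertools.count(2))  # made-up commit ids


def current_branch() -> str:
    return refs["HEAD"][len("ref: "):]


def create_branch(name: str, at: str) -> None:
    refs[f"refs/heads/{name}"] = at  # a new head; no commit is created


def checkout(name: str) -> None:
    refs["HEAD"] = f"ref: refs/heads/{name}"


def commit() -> str:
    branch = current_branch()
    new = next(new_ids)
    parents[new] = [refs[branch]]  # parent: the tip of the checked-out branch
    refs[branch] = new             # exactly one head moves
    return new


create_branch("feature", refs["refs/heads/main"])  # two heads on the same commit
checkout("feature")
commit()
print(refs["refs/heads/main"], refs["refs/heads/feature"])  # c1 c2
```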

zie · on Nov 24, 2023

aka Git is the simplest, crappiest implementation that can work. Then they tacked a terrible UX onto it and shipped it. Chaos ensued, and we as devs have spent almost 20 years fighting over trying to understand the chaos. We would have been way better off staying with SVN or Mercurial or Fossil (or pretty much any other VCS), but that ship has sailed and now we are stuck in the chaos.

Now nobody understands their VCS and nobody will ever understand it, as the explanations are either too detailed and miss the forest for the trees or too high level and skip over all the trees that kill you when you accidentally run into them.

I hope someone somewhere manages to convince us all to move to something sane.

tharkun__ · on Nov 24, 2023

With all due respect, what a load of crap!

SVN, branching that takes forever instead of a simple file with a commit hash in it? Are you serious?

Mercurial, when I had to use it for almost 2 years? The single thing I missed the most is the fact that "everything is just a label" (or pointer if you will).

Fossil I can't comment on with certainty, as I never really used it.

This is probably gonna get voted into oblivion, but most devs just don't get VCS, period (yes I'm a dev, so this is an inside perspective). Git or not. They didn't get it in CVS days, they didn't get it in SVN days and they didn't get various commercial ones either. Somehow most devs just don't grok version control trees.

But so far git is the single best VCS I have got to use. Everything that I actually need day to day follows from grokking the one simple rule: All those branches and tags are just labels/pointers/sticky notes and you can move them around at will and it's fast.

I do get that there are tricky situations with octopus merges and all that jazz and the Linux kernel and a few other open source projects are probably some of the trickier use cases to understand. For your run of the mill corporate situation? Keep `master`/`main`/`whateveryoucallit` history straight with one commit per feature/change by doing rebases and squashing and you will never have a single minute of misunderstanding what is going on. IFF you grok that "everything is just a label" and you've actually "created a branch" by just creating a file in `.git/refs/heads/` yourself!

And the second rule is: before you try anything, just: commit! And never close that terminal window. You might need that commit hash to reattach a label to it after you "destroyed" your branch (but git has not garbage collected yet and it's all actually still there). Or someone else still has your commits and you just get them from there and reattach a label. It's really so simple.

zie · on Nov 24, 2023

Look, we have spent the last ~ 20 years across HN and conferences and what not trying to teach git to people with basically nothing to show for it. Most developers still can't do much more than occasionally commit stuff. They still rm -rf their tree whenever something goes wrong.

Your entire hate for SVN is that branching takes a while (because it requires a round-trip to the server). SVN was easy to reason about: you didn't have entire conference talks trying to explain how SVN works, and people didn't have to rm -rf their entire tree and re-check out every week.

And even you, who claim to understand git seem to not understand git reflog. I think that clearly sums up my argument perfectly fine :)

Git is NOT easy to reason about, and I've never seen a website or blog post or conference talk about git that wasn't factually inaccurate in some way, and yet they still continue to proliferate. If we don't come up with a better git, the next decade will still be spent trying to teach git to people that will never understand it.

tharkun__ · on Nov 24, 2023

That is my point. The "teaching git" for most developers is not actually about git at all. What people don't understand is version control. The whole concept of a tree of branches in your repository and how they get merged into each other or branched off from each other does not change from SVN to git or vice versa.

Maybe we have different experiences, but mine is that back in my SVN days, most devs were not able to reason about branching at all. And every time they needed to check something in another branch they would `cp -a` their SVN tree to another directory and people would have like four or five copies of each of their projects with various names hanging around.

My entire "hate" for SVN is not just branching. I can add more things if you like. For example, when trying to merge or commit, you have to be 100% certain that you won't have conflicts or that you can solve the conflicts in one go. I have seen many, many people work on something, try to commit it, get hopelessly stuck during conflict resolution, and destroy their work. They then redid their entire piece of work (well, the files that they effed up) and sometimes started creating said `cp -a` copies of the entire tree before committing. These are the same kinds of people that `rm -rf` their tree when they run into something in git that they don't understand.

In git, an easy rule to never lose work (save for actual bugs) is: always commit first. I.e. before you try anything, just commit. Whatever goes wrong, you can always go back to your commit. I don't have conflicts often, but sometimes in the middle of a conflict resolution I will just reset back to the original commit I made and try again with the knowledge I gained during the first round of resolution. Sometimes the solution is to actually just squash your own commits first, and the conflict goes away, for example. But I don't know that in advance coz I wasn't even expecting a conflict to begin with.

zie · on Nov 25, 2023

I think we have definitely had different experiences.

TorKlingberg · on Nov 24, 2023

I used svn only a few years ago and don't remember branching taking very long. Maybe it was CVS that had to copy all the files.

I do remember merges being horrible in svn. Just branching off trunk, doing some work and merging back is fine, but if you try to merge from trunk to your branch to "catch up" you're in for some pain when you later want to merge to trunk.

Also, svn treats adding and deleting files as different from just editing, more so than git does.

zie · on Nov 24, 2023

SVN branching as I remember it requires a round-trip to the server, so that could be why it took so long for some people, if their server was far away.

gpderetta · on Nov 24, 2023

> never close that terminal window. You might need that commit hash to reattach a label to it after you "destroyed" your branch

That's where the reflog comes handy.
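A toy Python model of what the reflog records: every time a ref moves, its old and new values are appended to a log, so a "lost" tip can be looked up and the label reattached. `update_ref` and `at` are hypothetical helpers; `at(name, n)` loosely mimics Git's `name@{n}` syntax. Real reflogs are local only, their entries expire over time, and deleting a branch also deletes that branch's reflog (though `HEAD`'s reflog may still mention the commits).

```python
refs: dict = {}
reflog: dict = {}  # ref name -> list of (old value, new value, reason), oldest first


def update_ref(name: str, new: str, reason: str) -> None:
    reflog.setdefault(name, []).append((refs.get(name), new, reason))
    refs[name] = new


def at(name: str, n: int) -> str:
    """Value of `name` n updates ago (0 = current)."""
    entries = reflog[name]
    return entries[len(entries) - 1 - n][1]


for c in ["c1", "c2", "c3"]:
    update_ref("refs/heads/topic", c, "commit")
update_ref("refs/heads/topic", "c1", "reset: moving to c1")  # oops: c2 and c3 look "gone"

lost = at("refs/heads/topic", 1)  # the tip just before the reset
update_ref("refs/heads/topic", lost, "reset: moving to topic@{1}")
print(lost, refs["refs/heads/topic"])  # c3 c3
```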

skrebbel · on Nov 23, 2023

For years I was deeply annoyed by the terrible name “branch” for something that acts more like a bookmark (or “wandering tag” indeed!).

And then I learned that git branches are branches in exactly the same way that the first element of a linked list in C “is” the linked list. Git was made by C people and they’re used to referring to entire data structures by way of some root element.

I mean that doesn’t make me dislike the name any less but at least now I see where they were coming from.

ajross · on Nov 23, 2023

> Git was made by C people and they’re used to referring to entire data structures by way of some root element.

FWIW this is actually backwards. The word "branch" was already in common use (to refer to the same basic idea) in SCM systems going back decades, and in almost all of those a "branch" was indeed a first class object with its own data that acted as a "container" for commits, both semantically and physically.

The fact that a "branch" is just a pointer is in fact a git innovation on top of the former idea.

mr_mitm · on Nov 23, 2023

When the entire structure of commits is called a tree, I find the name "branch" fitting. The branch is identified by its head commit, so the path from head to root is uniquely defined and that's the branch. (Disregarding merges for now.)

TeMPOraL · on Nov 23, 2023

> Disregarding merges for now.

Without disregarding them, it's not a tree, but a DAG.

dayjaby · on Nov 23, 2023

It is a tree. What makes you think it's just a DAG? Are there commits with multiple parent commits or what?

zaphar · on Nov 23, 2023

There absolutely can be. Merge commits have multiple parent commits for example. It's definitely a graph not just a tree.

dayjaby · on Nov 23, 2023

Parent comment was about disregarding merge commits.

zaphar · on Nov 24, 2023

You can't disregard merge commits. It's part of the structure.

imron · on Nov 23, 2023

Yes. Merge commits have two parents.

Izkata · on Nov 23, 2023

Two or more. I'm not sure there's a limit.

Try not to do this (imagine a 5-way merge conflict).

skrebbel · on Nov 23, 2023

FWIW “tree” has a specific, different meaning in Git. It’s an object that records the contents of a directory.
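For the curious, the ids of blob and tree objects can be computed with nothing but SHA-1 (for repositories using the default SHA-1 object format): hash a header of the form `<type> <size>\0` followed by the content. The Python sketch below handles blobs and a flat tree of regular, non-executable files only; `blob_id` and `tree_id` are illustrative names, and subdirectories (which use mode `40000` and sort slightly differently) are left out.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' + body, the way Git names its objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(data: bytes) -> str:
    return git_object_id("blob", data)


def tree_id(files: dict) -> str:
    """Tree id for a flat directory of regular files: name -> bytes content."""
    body = b""
    for name in sorted(files):  # entries are sorted by name
        sha = bytes.fromhex(blob_id(files[name]))  # raw 20-byte id, not hex
        body += b"100644 " + name.encode() + b"\0" + sha
    return git_object_id("tree", body)


print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(tree_id({}))                 # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```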

cpeterso · on Nov 23, 2023

Leaning into the tree metaphor (and following the precedent of other version control systems), git should have used the term trunk instead of master or main.

hnarn · on Nov 23, 2023

Why? That would heavily imply that master/main is somehow technically different from all other branches (since a trunk is certainly not a branch), which to my knowledge is not true.

cpeterso · on Nov 24, 2023

I see what you mean, but don't "master" and "main" also imply some special authority? Trunk-based development with branches forking off and merging back into the trunk (a DAG, unlike a real tree! :) is an extremely common workflow.

https://trunkbaseddevelopment.com/

Kuraj · on Nov 24, 2023

By following this metaphor, a trunk would be the original commit, whereas branches are the tree branch tips.

galaxyLogic · on Nov 24, 2023

And consider that because every folder can have its own .git subfolder, we can have several parallel trunks/repositories at the same time, meaning we have a forest.

I haven't really seen specific guidelines as to when and why anyone should start a new repo. Are there some? Or is "mono-repo" the best solution, and if so, when and for whom? Surely we need more than one git repo in the whole world?!

lmm · on Nov 24, 2023

Well yes if you disregard the thing that makes it not a tree then it is a tree. But you can't disregard that!

imron · on Nov 23, 2023

> Git was made by C people

This is why I think of branches as pointers. The file contents are literally just a pointer to a commit on the DAG.

neuromanser · on Nov 23, 2023

That's (most probably) where the "head" terminology comes from, too.

alfredpawney · on Nov 23, 2023

Yes, you are correct. It traces back to Allen Newell.

cpeterso · on Nov 23, 2023

> acts more like a bookmark

In fact, Mercurial uses the term “bookmark” for its lightweight, git-like branching. Mercurial’s branches have slightly different semantics and can’t be deleted like bookmarks or git branches.

OJFord · on Nov 24, 2023

I don't see the problem?

Real natural tree branches grow.

If you move the branch to point somewhere else, then it's better/more accurately said that you changed the name to refer to a different branch. We can think of branches as the chains of commits; it's the names we give them that 'wander' both as we commit and if we move them to a different branch. But merging (as it were!) the concepts of branches and names for their tips is convenient, and often equivalent/inconsequential.

mtnygard · on Nov 23, 2023

I have found that git makes a lot more sense if you reverse the mental model of lineage. People think about a lineage going forward. But a more useful way to think is in terms of backward pointers.

A commit points to its parent(s). Since a branch is just a commit ID, you can follow the parent links backwards to find the whole history of that branch.

So a "branch point" is just where two chains of parent links converge.

The special part are merge commits. Those have multiple parents, indicating that two histories fused into one.
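Following the parent links backwards also gives a direct way to find that branch point: collect the ancestors of both tips and keep the common ones that are not themselves ancestors of another common one. This Python sketch uses made-up commits; `git merge-base --all` answers the same question on a real repository, and with criss-cross merges there can be more than one answer.

```python
parents = {
    "a": [], "b": ["a"],
    "c": ["b"], "d": ["c"],  # main
    "x": ["b"], "y": ["x"],  # topic, forked at b
}


def ancestors(tip: str) -> set:
    """The tip plus everything reachable from it via parent links."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def merge_bases(one: str, two: str) -> set:
    """Best common ancestors: common ancestors not reachable from another common ancestor."""
    common = ancestors(one) & ancestors(two)
    return {c for c in common
            if not any(c != other and c in ancestors(other) for other in common)}


print(merge_bases("d", "y"))  # {'b'}: where the two backward chains converge
```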

layer8 · on Nov 23, 2023

The issue is that if you consider a branch to be what is really the history of the branch tip, then a branch is not just the part starting from the last join with another branch. Instead it is some directed path through the commit DAG, a path that in general can’t be reconstructed from the information Git keeps.

If, for example, you have a structure like

        |
        o
       / \
      o   o
   A  |   |  B
      o   o
       \ /        
        o
       / \
      o   o
   C  |   |  D
      o   o
       \ /
        o
        |

then conceptually the path CA might be one branch and DB the other branch (or alternatively, CB and DA). But this is not something that is represented in Git’s model.

ajross · on Nov 23, 2023

> a path that in general can’t be reconstructed from the information Git keeps.

Uh... yes it can. Commits have a list of 0 or more parents. That creates a DAG. There are literal hordes of tools out there that reliably interpret this, from visualizer tools to practical mutators like git bisect.

Maybe you're trying to say that no single commit order exists that traverses the whole tree. That's true, because branches can merge together. But it remains a completely interpretable graph nonetheless.

layer8 · on Nov 23, 2023

That’s not what I was saying. I was referring to the history of branch tips.

ajross · on Nov 23, 2023

But that's not related to the DAG at all. The branch can be changed at any moment for any reason to point to any commit with any content.

But it's true that conventionally, a new branch tip should always have the previous branch tip as an ancestor. But not always as a direct parent, and even if so it might be a merge commit that joins two different branches. There is indeed no single spanning path through a DAG.

But trying to explain it as "git doesn't store enough information" to construct that spanning path seems confused to me. It's not about what git stores, it's just math: there is no such path in the general case, period.

layer8 · on Nov 23, 2023

The fact that the branch tip can be moved to unrelated commits is another issue with Git’s model, and a mismatch to the intuitive “a named lineage in the DAG” conception of branches. In other VCSs, that would be a new/different branch, and you could still rename branches so that the same name will later refer to a different branch, but the branch history as such (including renames) would be preserved.

ajross · on Nov 23, 2023

> mismatch to the intuitive “a named lineage in the DAG” conception of branches

Once more, that conception may be intuitive but it is wrong. A branch is emphatically NOT a line through the DAG, it's the whole DAG. There simply is no single list of patches to apply to get from one commit to another, even if both were at some point heads of the same branch, and even if one is an ancestor of the other.

And the reason it's wrong is that branches can merge together. You can have commit A descended from both the "main" branch and the "topic_a" branch, despite the fact that those two had diverged. This isn't a bug, it's a feature. You don't have to use it if you don't want to (lots of projects require linear commit histories in their main branch), but it's part of the tool nonetheless because some projects (Linux especially) use it heavily and to great effect.

layer8 · on Nov 23, 2023

I don’t see what is wrong. Branches in that conception are paths through the DAG. One would like to annotate such paths with names, and have those names automatically apply by default when adding a commit to the end of such a path.

This has nothing to do with lists of patches. Nevertheless, for any given path, it is always possible to compute a list of patches that would match that path. Just compute the diffs between all adjacent pairs of commits on that path. What you maybe mean is that you can’t replay just a single path and have it result in the same commit hash. That, of course, is correct, you need to replay the complete prefix DAG. However, I don’t see why you think that causes issues for the branches-as-paths conception.

Yes, branches can merge together. That just means that different paths can share nodes and edges. Just like two hiking trails can partially overlap. Again, I see no problem here.

ajross · on Nov 24, 2023

No! No, they are not. This is a mistake you are making, and I'm trying (vainly, maybe) to correct it.

If you have a branched structure like that, and dump each commit as a patch, and try to trace a "path through the DAG" by applying those patches, you will find that they don't apply after the first merge commit. There is no single list of patches, because that's not the structure of the history. The merge commit "patch" must be a different delta depending on where you came from.
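The "different delta depending on where you came from" point is easy to check on toy snapshots. In this Python sketch (made-up files, whole-file comparison only), the merge commit differs from its first parent in one file and from its second parent in another. On a real repository, `git diff <merge>^1 <merge>` and `git diff <merge>^2 <merge>` show the two views, while `git show <merge>` prints a combined diff by default.

```python
# Hypothetical snapshots (path -> contents) around one merge commit.
main = {"f.txt": "2\n", "g.txt": "1\n"}    # main edited f.txt
topic = {"f.txt": "1\n", "g.txt": "2\n"}   # topic edited g.txt
merge = {"f.txt": "2\n", "g.txt": "2\n"}   # merge commit with parents [main, topic]


def changed(old: dict, new: dict) -> list:
    """Paths whose contents differ between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


print(changed(main, merge))   # ['g.txt']: the "patch" when arriving from main
print(changed(topic, merge))  # ['f.txt']: the "patch" when arriving from topic
```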

lmm · on Nov 24, 2023

The problem is that the thing that Git calls a branch does not identify a specific path. Look at OP's example again: if you conceptualize branches as paths, then CA/DB and CB/DA are distinct ways to divide that graph into branches. But in Git there is no way to represent that distinction (at least, not as branches).

vifon · on Nov 23, 2023

This missing piece of information would be essentially `git reflog`, except it's not something Git sends between the clones.

Izkata · on Nov 23, 2023

You can reconstruct it manually with a combination of the parent commit order and the automatic merge commit message, if you didn't change the commit message. But yeah, that second part isn't recorded in the structure itself.

lifeisstillgood · on Nov 23, 2023

Just to go off on a tangent: that's a pretty neat diagram for a throwaway comment. Was that just careful spacing in the HN textbox, or did you use a tool? If so, which one? :-)

grodriguez100 · on Nov 23, 2023

“Text after a blank line that is indented by two or more spaces is reproduced verbatim. (This is intended for code.)”

Looks like this also switches to a monospaced font, which makes it easier to draw ASCII art.

  This should be rendered using a monospaced font. 
  _____
  \   /
   \ /
    O

mountainboy · on Nov 23, 2023

Interesting, so then which path(s) does git display when running git-log on this?

seba_dos1 · on Nov 23, 2023

Define "this". If you git-log from the commit on the top of that ASCII graph, you get all the drawn commits listed (unless adjusted with arguments such as `--no-merges` or `--first-parent`).
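Here is a rough Python sketch of the difference, using a made-up history in which `f1` was merged into the mainline by `m1`. A plain log walks every commit reachable through any parent. `--first-parent` follows only the first parent of each merge. That first parent is the same parent order Izkata mentions for reconstructing the mainline. Git's real output ordering (by date, by default) is not modeled here.

```python
from collections import deque

# Hypothetical history; parents are listed first-parent-first, as in a merge commit.
parents = {
    "c1": [],
    "c2": ["c1"],
    "f1": ["c2"],        # commit made on a feature branch
    "c3": ["c2"],        # commit made on the mainline meanwhile
    "m1": ["c3", "f1"],  # merge commit: first parent c3, second parent f1
    "c4": ["m1"],
}


def walk(start, first_parent=False):
    """Commits reachable from start, breadth-first (not Git's own ordering)."""
    order, seen, queue = [], set(), deque([start])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        # --first-parent: follow only the first parent of each merge
        queue.extend(parents[commit][:1] if first_parent else parents[commit])
    return order


print(walk("c4"))                     # ['c4', 'm1', 'c3', 'f1', 'c2', 'c1']
print(walk("c4", first_parent=True))  # ['c4', 'm1', 'c3', 'c2', 'c1']
```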

Izkata · on Nov 23, 2023

You can get ASCII art of that structure with:

  git log --graph --oneline

On older versions you'll also want --decorate to show branches and tags, but I think that's on by default now.

trealira · on Nov 23, 2023

That's how I learned it, not having known anything about git or version control beforehand. I used this site:

learngitbranching.js.org/

Which represents commits as circles with arrows pointing to their parents.

MauranKilom · on Nov 23, 2023

I don't use git at work, but in my private hobby projects my friends usually get mad when they watch me juggle changes and branch pointers with git reset --hard and git stash...

How do you undo a merge that you didn't mean to do/did wrongly?

    git reset --hard <last commit before merge>

Have some cosmetic fixups on your local branch that really should go into main (or a separate branch) first before merging a bigger feature?

    git stash
    git checkout main
    git stash apply

By thinking about branches as pointers, the commit graph as existing independently, and stashes as just temporary commits, I feel I'm working much more directly with the underlying abstraction. Yes, git has commands for specific combinations of actions, but for an occasional user it's harder to remember every such command and which arguments and flags to pass in which order. It's either "look through documentation until you find graph diagrams illustrating what will happen for this order of arguments and flags" or "use the primitives 'move branch pointer', 'commit to branch' and 'hold these changes for a second' to obtain the commit tree you actually want". Knowing that the reflog exists also makes this insane-sounding way of working pretty non-scary. And yes, some operations (e.g. cherry-pick) you just need to do the "real" way.

(My git stash obsession is most likely just damage from years of using Perforce, which doesn't have a modified/staged distinction. The only way to commit only part of a changed file is via the equivalent of stash -> [restore half the file] -> commit -> stash pop.)

Prepares to be crucified...

globular-toast · on Nov 23, 2023

"Undo" is usually more like `git reset --hard HEAD@{1}`, ie. using the reflog.

Nothing wrong with this at all. Only people who don't understand and/or are scared of git don't like it.

You could also use cherry-pick to "donate" commits to other branches, instead of stash, of course. Magit has some great extra abstractions for this.
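The reflog idea can be sketched in Python as a ref that remembers every value it has held. In that model, `HEAD@{n}` is simply "the value n updates ago". This is a hypothetical toy class. The real reflog lives under `.git/logs/` and also records timestamps and the command that moved the ref.

```python
class Ref:
    """Toy ref that keeps a log of every commit it has pointed to."""

    def __init__(self, commit):
        self.log = [commit]  # log[-1] is the current value

    @property
    def commit(self):
        return self.log[-1]

    def update(self, commit):
        self.log.append(commit)

    def at(self, n):
        """Roughly `HEAD@{n}`: the value this ref had n updates ago."""
        return self.log[-1 - n]


head = Ref("c3")
head.update("m1")        # an accidental merge moves HEAD to m1
head.update(head.at(1))  # like `git reset --hard HEAD@{1}`: back to c3
print(head.commit)       # c3
print(head.at(1))        # m1, so the undo can itself be undone
```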

tremon · on Nov 23, 2023

git reset --hard is actually dangerous, because it throws away local modifications that were not yet committed. To undo just the commit and not the work, you should use git reset --soft (to undo just the git commit) or git reset --mixed (to undo both the git commit and the "git add"s leading up to the commit).
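The three modes can be sketched as a toy Python model. The class and names are hypothetical. The model ignores untracked files, which even `--hard` leaves alone, and everything else Git tracks. Every mode moves the branch. `--mixed` also resets the staging area, and only `--hard` touches the files on disk.

```python
from dataclasses import dataclass


@dataclass
class WorkingCopy:
    """Toy model: where the branch points, the staging area, and files on disk."""
    snapshots: dict  # commit id -> {path: contents}
    branch: str      # commit id the current branch points to
    index: dict      # staged contents
    worktree: dict   # contents on disk

    def reset(self, commit, mode="mixed"):
        """Rough model of `git reset --soft | --mixed | --hard <commit>`."""
        self.branch = commit  # every mode moves the branch pointer
        if mode in ("mixed", "hard"):
            self.index = dict(self.snapshots[commit])  # undo the "git add"s too
        if mode == "hard":
            self.worktree = dict(self.snapshots[commit])  # overwrite your files as well


snaps = {"c1": {"f.txt": "old"}, "c2": {"f.txt": "new"}}
for mode in ("soft", "mixed", "hard"):
    wc = WorkingCopy(snaps, "c2", {"f.txt": "new"}, {"f.txt": "new + uncommitted edits"})
    wc.reset("c1", mode)
    print(mode, wc.index["f.txt"], "|", wc.worktree["f.txt"])
# soft new | new + uncommitted edits
# mixed old | new + uncommitted edits
# hard old | old
```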

alex_smart · on Nov 23, 2023

git checkout will also happily throw away local unstaged modifications, and I would argue that it is even more dangerous because I did not have to type "--hard" to shoot myself in the foot.

chronial · on Nov 24, 2023

That's the reason why it was replaced by two separate, more sensible commands: git switch for switching branches etc., which is safe, and the inherently dangerous git restore for reverting changes in your working directory.

alex_smart · on Nov 24, 2023

It wasn't replaced, the other two commands were added to the cli.

It will take decades before git checkout will actually get replaced by git switch/restore in all the git books, tutorials and search results. Most normal users will keep learning and using git checkout in the meanwhile.

I understand why they can't actually deprecate existing commands. The git command is used in far too many existing shell scripts across thousands of companies.

I would argue that this is actually a fundamental deficiency of the "unix way" of doing things, where the same command is meant to be used both by human beings and in automated workflows. Automated workflows require backwards compatibility. Humans need easy-to-use interfaces that don't let them easily shoot themselves in the foot. The same tool cannot serve both needs.

hnarn · on Nov 23, 2023

That does not sound right at all, I’m pretty sure there’s a warning when you try to checkout a branch that would override local unstaged changes. I might be wrong but I’d like some proof.

chlorion · on Nov 24, 2023

In some repo you have, make some changes to files that are tracked. Git status will show you "changes not staged for commit: <more stuff>". Run "git checkout .", and check the status command again, it will be in a clean state.

>git status

    On branch feature/redacted
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
        modified:   redacted.py

    no changes added to commit (use "git add" and/or "git commit -a")

>git checkout .

    Updated 1 path from the index

>git status

    On branch feature/redacted
    nothing to commit, working tree clean

Izkata · on Nov 23, 2023

Checkout is also used for reverting changes to unstaged files.

alex_smart · on Nov 24, 2023

git checkout is overloaded and has two use cases. git checkout $branch is safe, git checkout $file_or_folder_name will help you shoot yourself in the foot.
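The two cases can be sketched in Python under heavy simplifying assumptions. The sketch assumes nothing is staged, so the index equals HEAD. The function names are hypothetical, and Git's real rules for untracked files and options like `-m` are ignored. Switching branches checks for conflicting local edits first. The path form just copies tracked files over the working tree.

```python
class LocalChangesError(Exception):
    pass


def checkout_branch(snapshots, head, target, worktree):
    """Rough model of `git checkout <branch>`: refuses rather than clobber edits."""
    old, new = snapshots[head], snapshots[target]
    dirty = {p for p in worktree if worktree[p] != old.get(p)}
    clashes = sorted(p for p in dirty if old.get(p) != new.get(p))
    if clashes:
        raise LocalChangesError(f"local changes would be overwritten: {clashes}")
    result = dict(new)
    result.update({p: worktree[p] for p in dirty})  # unrelated edits come along
    return result


def checkout_paths(snapshots, head, worktree):
    """Rough model of `git checkout .` when nothing is staged: no check at all."""
    result = dict(worktree)         # untracked files are left alone
    result.update(snapshots[head])  # tracked files are silently overwritten
    return result


snaps = {"main": {"app.py": "v1", "README": "r1"},
         "feature": {"app.py": "v2", "README": "r1"}}
edits = {"app.py": "v1", "README": "r1 + my notes"}
print(checkout_branch(snaps, "main", "feature", edits))
# {'app.py': 'v2', 'README': 'r1 + my notes'}
print(checkout_paths(snaps, "main", edits))
# {'app.py': 'v1', 'README': 'r1'}
```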

LtWorf · on Nov 23, 2023

That's not what checkout does.

chlorion · on Nov 23, 2023

In a repo with unstaged changes, running "git checkout ." will check out files into the worktree and clobber your unstaged changes. With no commit given, it copies them from the index rather than from HEAD, but you can also name a specific commit to check out from.

You can check out files and directories from completely unrelated commits and even unrelated repos!

I have used this to checkout specific useful files from other projects. It's a little nicer than just copying them in, because you can keep the repo tracking branch updated and keep checking out updates, and you can easily compare your file to theirs, and see what changes they have made to the file.

astrobe_ · on Nov 23, 2023

There must be something you do terribly wrong if you believe it happened to you in normal use.

mb7733 · on Nov 23, 2023

The poster is talking about when you do something like this: `git checkout .` That wipes out all unstaged changes on tracked files in the current directory

astrobe_ · on Nov 24, 2023

That's as unfortunate as 'rm *' (and good to be aware of BTW), but doing this deliberately doesn't qualify as "normal use" for me.

alex_smart · on Nov 24, 2023

In my career, it has already happened more than once. And definitely not deliberately.

There are many ways this can happen accidentally. You could be trying to check out a branch and tab-complete your way into a folder name instead. Or you could be typing alt/esc + . to get the last argument of the previous bash command, the keyboard glitches, and you end up with a literal . instead.

Just because it hasn’t happened to you yet, you shouldn’t discount the experience of others. That’s like saying you have never messed up dd yet, so there are no problems with dd’s design.

astrobe_ · on Nov 24, 2023

No question this design is questionable, and I'm glad I learned about it before falling victim to it. I read that they introduced git-switch and git-restore as a way to avoid such pitfalls.

But I once messed up badly with something else: makefiles. I was kind of overconfident and ended up screwing myself by telling something to write its output to my source files, by mixing up $< and $@ somehow. Another time I accidentally overwrote a file with a simple output redirection. On both of those days I learned to be cautious with those commands, the same way that I always stop for 2 seconds to reread my command line when I use rm.

Fool me once, shame on the tool; fool me twice, shame on me.

Well, that's what OP was referring to, but in such a "click-bait-y" way.

mb7733 · on Nov 25, 2023

While I agree changing branches is the most common use-case for git checkout, it's far from the only "normal" one. The form that updates files to match revisions is actually mentioned first in the man page:

  DESCRIPTION
    Updates files in the working tree to match the version in the index or the specified tree. If no pathspec
    was given, git checkout will also update HEAD to set the specified branch as the current branch.

Now that `git switch` exists, one could argue that this is the main use case for `checkout` and the other one could be deprecated.

afiori · on Nov 23, 2023

I agree that git is almost asking you to juggle commits.

My preference is to use temporary branches and cherry-picking instead of stashing. I mostly use a GUI* to work with git, so it is easy to select the two or three commits to cherry-pick, or to see visually whether an interactive rebase would work.

* https://gitextensions.github.io/

specialist · on Nov 23, 2023

> I'm working much more directly with the underlying abstraction

Your strategy of seeing things as they are is a useful general purpose life skill.

hotnfresh · on Nov 23, 2023

You can check out multiple branches in different directories from a single git repo. This saves me a lot of what used to be stashing.

bshacklett · on Nov 25, 2023

I’d be interested to hear more about your workflow. I’ve thought about doing similar things, but it just seems like too much overhead.

hotnfresh · on Nov 25, 2023

Great to have two “work trees” when you’re often called in to troubleshoot or help with other branches. Sometimes handy during merges, too.

codesnik · on Nov 23, 2023

Why crucified? You're doing exactly what I do. All the people who have any trouble with git whatsoever try to use it as a black box for some high-level idea of what their workflow should be. And git is not that; git is a thin wrapper around a simple and elegant data structure. If you understand it, then everything clicks and git doesn't EVER give any trouble.

Your friends are unreasonable, unless you collaborate with them on the same branches and rewrite them after you shared them.

thiht · on Nov 24, 2023

I do that too (except I use a soft reset instead of --hard; I prefer to review and delete the reset changes myself).

I tried using git worktree for a while when working on multiple branches, but it’s a pain to use… Stashing is easier.

pitaj · on Nov 23, 2023

> How do you undo a merge that you didn't mean to do/did wrongly?

I usually use `switch` for this:

    # Check out the previous commit
    git switch -d HEAD~
    # Overwrite the branch
    git switch -C <current branch>

Kuraj · on Nov 24, 2023

Nah I do that too. I also amend and push --force to my private branch a lot to make the git history easier to follow for whoever is going to do a code review.

erik_seaberg · on Nov 23, 2023

Not a fan of the staging area, because it won't be tested. I would rather stash some changes to postpone them, then test and commit the workspace.

thiht · on Nov 24, 2023

What does the staging area have to do with tests?

erik_seaberg · on Nov 24, 2023

When the workspace and the staging area aren’t in sync, the differences cannot be tested because the compiler doesn’t know how to deserialize the .git/index blob. The workspace is stored as normal source files, and committing those would be better.

bshacklett · on Nov 25, 2023

That would remove a huge amount of flexibility from the commit process, though. git add -p (and its siblings) are a huge part of my workflow. I often work on multiple changes at once, but break them out into their own atomic commits with the ‘-p’ flag.

What would be great is a simple way to have build/lint/etc. tools look at the staging area instead of the workspace.

erik_seaberg · on Nov 25, 2023

For what it’s worth, I didn’t have much trouble switching from “git add -p” for changes I want now to “git stash -p” for changes I want later.

bshacklett · on Nov 26, 2023

I’ll definitely give that a try

zaptheimpaler · on Nov 23, 2023

Both of those sound totally reasonable to me! I don't know of any better ways to do that stuff and there's nothing risky about it.

int0x80 · on Nov 23, 2023

One thing that is risky about git reset --hard is that any non-committed changes are lost. That has bitten me a few times.

afiori · on Nov 23, 2023

My controversial opinion is that git needs some kind of GUI that helps you keep track of the state of the repo.

chronial · on Nov 24, 2023

A very effective solution for that is a well-configured shell. If you summarize the state of the repo in the prompt, it is always visible while you type a command.

LtWorf · on Nov 23, 2023

Completely reasonable if you do on your local branch, or if you have a convention that remote branches starting with your name or something are yours only.

If you rewrite history on master… well… completely unreasonable.

codesnik · on Nov 23, 2023

Also, using stash is only the first level. git cherry-pick, git rebase --interactive, git reset --hard HEAD^ and friends allow such moves and cosmetic extractions even after the commit itself. I also prefer to split cosmetic changes from feature changes, so I extract cosmetic stuff into main all the time.

rkangel · on Nov 23, 2023

Git doesn't have the concept of "main is special", but at least tools like Gitlab have protected branches to stop you screwing up too much.

Some concept of "parent" and "child" branches would actually be pretty interesting. You do have to support multiple "parent" branches though for long term support branches.

samus · on Nov 23, 2023

Protecting branches is indeed very important. I make errors all the time when screwing around. It helps enormously to be restricted to messing up only one's own feature branches. Many other changes can be done via the GUI with PRs and the various kinds of controlled merge and rebase strategies they support, like Merge, Rebase + Merge, FF-only Merge, Squash merge, etc.

cedws · on Nov 23, 2023

It's also a security feature. If you have a repo with a lot of developers working on it, you need to be sure they absolutely cannot slip in code with nobody noticing, or trigger CI/CD and compromise build secrets or even production.

andrybak · on Nov 23, 2023

> Git doesn't have the concept of "main is special"

Technically, there is special handling for both "master" and "main" in Git, in a fairly obvious but, I'd argue, not very important way. When you merge two regular branches, the commit message is `Merge branch 'source' into destination`. But not if the destination is `master` or `main`: the `into ...` part is omitted for those merge commits.

But this is just for backward compatibility. Git is very conservative about changing user-facing behavior such as generated merge commit messages. To get Git to treat `master` and `main` truly without special handling, set the config option `merge.suppressDest` [1] to an empty value:

    $ git config merge.suppressDest ""
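Here is a Python approximation of the behavior described above. It assumes the default suppress list is `main` and `master`, as the comment says. It uses `fnmatch` as a stand-in for Git's own glob matching. `merge_title` is a made-up helper, not Git code.

```python
from fnmatch import fnmatch


def merge_title(source, dest, suppress_dest=("main", "master")):
    """Approximate title line of Git's generated merge commit message."""
    title = f"Merge branch '{source}'"
    if not any(fnmatch(dest, pattern) for pattern in suppress_dest):
        title += f" into {dest}"
    return title


print(merge_title("feature", "main"))     # Merge branch 'feature'
print(merge_title("feature", "develop"))  # Merge branch 'feature' into develop
# An empty merge.suppressDest clears the list, so no branch is special any more:
print(merge_title("feature", "main", suppress_dest=()))  # Merge branch 'feature' into main
```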

`master` is also used as the default name for the default branch in newly created repositories. See option `--initial-branch` of `git init` and config variable `init.defaultBranch` [2] to override. Git for Windows, for example, allows setting the config option in its installer.

Source code:

For merge commit formatting: https://github.com/git/git/blob/2108fe4a1976f95821e13503fd33...

For default branch naming: https://github.com/git/git/blob/91e2ab1587d8ee18e3d2978f2b7b...

Git for Windows installer suggesting setting `init.defaultBranch`:

- https://github.com/git-for-windows/build-extra/blob/586c46ec...

Footnotes:

[1] https://git-scm.com/docs/git-merge#Documentation/git-merge.t...

[2] https://git-scm.com/docs/git-init#Documentation/git-init.txt...

chrnola · on Nov 23, 2023

There’s some special handling for FETCH_HEAD too (i.e. which branch on a remote is considered the default).

andrybak · on Nov 24, 2023

> which branch on a remote is considered the default

You probably mean this place in code: [1]. It uses function git_default_branch_name from refs.c [2], which uses config variable `init.defaultBranch` I've mentioned above. But if it and other look-ups fail, it does fall back to a hard-coded "refs/heads/master".

[1] https://github.com/git/git/blob/v2.43.0/remote.c#L2380-L2394

[2] https://github.com/git/git/blob/v2.43.0/refs.c#L671-L705

Edit: removed mention of a deprecated Git feature to avoid confusion.

jacoblambda · on Nov 23, 2023

It actually does, but it's very much in alpha/active development (under the umbrella of OpenSSF, with the intent of eventually being integrated into mainline git).

https://github.com/gittuf/gittuf

ndriscoll · on Nov 23, 2023

Git itself doesn't run a persistent process and I don't see how it'd make sense to prevent a user from making arbitrary changes to their local repo, so this sounds like just another server like GitHub, Gerrit, Gitlab, etc. that already have those features.

jacoblambda · on Nov 24, 2023

This provides a mechanism for any client to identify and reject changes from any given remote that are non-compliant with the policy committed to the repo.

So this is something that servers would certainly be able to use, but it also operates at the client level. You could use it in environments where nobody uses the big hosted git platforms (GitHub, GitLab, Gitea, etc.). Examples include fetching changes to a project over ssh from a friend's or co-contributor's dev machine, or fetching via basic, barebones read-only https or ssh hosting.

i.e. this is access control that works independent of centralized servers.

xanderlewis · on Nov 23, 2023

> in general, even if people’s intuition about a topic is technically incorrect in some ways, people usually have the intuition they do for very legitimate reasons!

This is worth an essay of its own.

mwexler · on Nov 23, 2023

I guess.

To me, the opposite is a more worthy essay: why, with all the power to customize our tech, do we create things that consistently work differently than people's intuition?

The fact that it "mostly jibes" feels like a footgun, not a feature.

I get that for some, "git just works! It made sense from day one" but in my limited experience, 0% of people I've worked with have said that.

Sure, we can all learn the tech. And expert techniques in any field often don't jibe with naive expectations. But for me and the folks I work with, the tech industry feels like it's gliding more towards inscrutable tools than towards ease of use.

We've hit a stage where many rely on code completion bots and answer-supplying bots instead of being able to directly embrace our tech. I wish the tech was more approachable on its own, but perhaps this is the natural evolution of things.

ndriscoll · on Nov 23, 2023

As one of those people that thinks it's extremely intuitive, I have to wonder where the confused people are learning about git. The documentation on the site[0] is quite clear:

> A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master. As you start making commits, you’re given a master branch that points to the last commit you made. Every time you commit, the master branch pointer moves forward automatically.

It has multiple diagrams explaining how commits point to their content and their parents, and branches point to commits. The Pro Git content has been there for at least 10 years (it's what I learned from 10 years ago).
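The quoted description maps onto a few lines of Python. This is a toy model with made-up ids (`c1`, `c2`, ...). Assigning `repo.current` stands in for `git switch` and ignores the working tree entirely.

```python
import itertools


class Repo:
    """Toy model: commits point to their parent; branches are names pointing to commits."""

    def __init__(self):
        self.commits = {}        # id -> (message, parent id or None)
        self.branches = {}       # name -> commit id
        self.current = "master"  # the branch that new commits advance
        self._ids = itertools.count(1)

    def commit(self, message):
        parent = self.branches.get(self.current)  # None for the very first commit
        cid = f"c{next(self._ids)}"
        self.commits[cid] = (message, parent)
        self.branches[self.current] = cid  # the pointer moves forward automatically
        return cid

    def branch(self, name):
        """Like `git branch name`: a new pointer to the current commit; nothing is copied."""
        self.branches[name] = self.branches[self.current]


repo = Repo()
repo.commit("initial")
repo.commit("second")
repo.branch("testing")
print(repo.branches)  # {'master': 'c2', 'testing': 'c2'}
repo.current = "testing"
repo.commit("experiment")
print(repo.branches)  # {'master': 'c2', 'testing': 'c3'}
```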

Maybe the problem is just that the Internet is full of blogs that have incorrect diagrams (like those in the OP) and bad explanations, despite the main website having great documentation!

[0] https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-N...

chihuahua · on Nov 23, 2023

If Git was "extremely intuitive", and the documentation was "great", why would so many otherwise smart people keep writing blogs about it with incorrect diagrams?

What is your theory about why so many people are having difficulty creating a correct mental model about Git, and why so many people are writing incorrect blogs about it?

ndriscoll · on Nov 23, 2023

Like I sort of implied, my theory is people haven't read the docs on the official site (or the book that's on the site), and keep regurgitating bad information that they read on some blog or howto site. I don't know why they do this. I don't make these sites, so I don't know what motivates people who do, especially people who don't understand what they're writing about.

If you understand the basic design premise (commits are content-addressed immutable snapshots), the pointer stuff is kind of obvious. If commits are immutable and you still want to be able to make branches/tags after a commit is created, the names have to live outside the commits and point at them.
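Here is a small Python sketch of that premise. It uses a made-up payload format; real Git hashes a specific commit object layout that also includes author and committer lines. A commit's id is a hash over its contents and its parent ids. So a branch name can't live inside the commit, or attaching one later would change the id.

```python
import hashlib


def commit_id(tree, parents, message):
    """Toy content address: a hash over everything that makes up the commit."""
    payload = repr((sorted(tree.items()), list(parents), message)).encode()
    return hashlib.sha1(payload).hexdigest()[:7]


c1 = commit_id({"a.txt": "hello"}, [], "initial")
c2 = commit_id({"a.txt": "hello world"}, [c1], "edit")

# Same content and parents always give the same id.
print(c2 == commit_id({"a.txt": "hello world"}, [c1], "edit"))  # True

# Labels live outside the commits, so they can be created or moved at any time.
refs = {"main": c2, "v1.0": c1}
refs["main"] = c1  # repointing a branch rewrites no commit
```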

afiori · on Nov 23, 2023

In part it is because git is hard to use, in part it is because mostly people learn git by oral tradition and often treat it like sorcery.

chthonicdaemon · on Nov 23, 2023

People writing software fall into several categories based on the problem they're solving, the reason they're solving it and the audience of the solution.

I solve my own problems for my own reasons all the time and therefore other people's intuitions are immaterial in the process. It would just slow me down to think "how would other people use this" when I'm focused on some technical personal problem.

Commercial software developers solve problems with the clear purpose of selling the solution to others and where they know ahead of time roughly what their audience's intuitions are. This is why intuitive GUI applications exist - there are whole industries devoted to finding out what people expect, what lowers cognitive load etc. iOS and Android apps give you a good idea of what is possible with modern tech when the purposes are properly aligned.

The problem here is that git was expressly developed by Linus to solve his own problem in a way that made sense to him, with no thought as to how other people would use it. There were no focus groups, early betas, feedback from users and so on. At best there have been slow fixes to the porcelain for the stuff that bothers the people who could make a PR to git. On the other hand, there are also many front-end projects that attempt to align some other person's idea of how version control is supposed to work with the Git model.

Anyway - I am in the camp where I very seldom get confused about a git thing because the actual expressed model is really simple (in the way that x86 assembly is a "simpler" language than Java). I find most front-ends much more confusing because they don't seem to work the way I expect. But I am never surprised when someone's pet project is understandable only by themselves. Or indeed when a consumer product is consumer-friendly. The real surprise is when a lone programmer makes something for themselves that then goes on to have wide appeal.

codesnik · on Nov 23, 2023

Because a) everyone's intuition is different, and b) sometimes uneducated intuition is just wrong. On the surface things look good, but in some specific situation the intuitive way of doing things can be inconsistent or have no solution at all. In those cases you're just stuck with a magic box of software that did something, you have no idea what, and you reach for the backup.

Git is not like that. It is very, very simple. If you learn its basics, your intuition will align with git's "intuition" too, and you can do crazy things with total peace of mind, without googling or looking into git's source code to see how they had to make something "intuitive" by some definition of the word.

marcosdumay · on Nov 23, 2023

Often, people's intuition is wrong in very important ways, and something that works the way they expect is sure to create footguns or just blow up by itself.

But I'm not sure git is a case of this. The DVCS that were created following people's intuitions were known to be slow and internally complex, but I have never heard about them failing. (And the slowness is obviously of a kind that can be optimized away.)

We're just stuck with the worst UI ever devised in public for a VCS because of network effects.

chihuahua · on Nov 23, 2023

I totally agree with it being "the worst UI ever devised". It's fine to use commits with parent pointers and branches as pointers to commits and all the other stuff internally. But there should be a UI wrapped around that that maps to operations that make sense for the purpose of working on a software project.

Not this:

    git merge [-n] [--stat] [--no-commit] [--squash] [--[no-]edit] [--no-verify] [-s <strategy>] [-X <strategy-option>] [-S[<keyid>]] [--[no-]allow-unrelated-histories] [--[no-]rerere-autoupdate] [-m <msg>] [-F <file>] [--into-name <branch>] [<commit>… ]

hinkley · on Nov 24, 2023

> do we create things that consistently work differently than people's intuition?

I don't know if this is true of git, but in general we have a disease called, "solving problems we don't have". I don't mean the YAGNI issue, or people trying to talk others out of doing engineering because the current masochistic process is good enough.

I mean the impedance mismatch caused by taking a concrete, physical problem, translating it into a domain that looks nothing like the problem you're meant to solve, and then tweaking how the fiction works so that it fulfills the requirements.

A few wise people have warned me off of this strategy over the years, but there's not nearly enough ink spent on this topic. This impedance mismatch gets worse over time. They asked for X, and you gave them Y, so when they ask to refine X into X + 1, it gets a little harder each time. The complexity goes up and you find confused product owners asking for what should be a simple change and getting tremendous pushback that smells a lot like deflection.

Meanwhile because users think the system works one way, they expect it to behave in certain ways and constantly get rudely surprised by how it actually works.

Again and again in my career, I've found that a lot of performance problems are obscured by accidental complexity and/or X-Y problems, and once you know the task you actually need to perform, and write the code to solve the actual problem instead of an analog, everything becomes clearer. A mix of determinism and pre-determinism (static calculation) makes for a much faster system without sacrificing legibility in the process.

The issue, I think, is that people find the problems they are given to be boring, and so they try to solve an analogous problem they find more entertaining - at first. There are other ways to find joy and fulfillment in simple tasks besides making them into stunts.

atq2119 · on Nov 23, 2023

You do have a point, but it's not a slam dunk. Intuition isn't some fixed thing but arises from personal experience. A lot of that is common to a culture, but there are different cultures and in any case, some truly personal aspects remain.

There needs to be a balance between creating new, more powerful intuitions, and meeting people at the intuitions they already have.

Case in point, Git's branching model is pretty intuitive when you understand how Linux kernel development works. Perhaps 0% of the people you've worked with have looked into that. That's fine. Different cultures...

Another example that may be worth studying is mathematics and the hard sciences. Learning those is a lot about learning powerful intuitions.

BlueTemplar · on Nov 23, 2023

Yeah, few things are actually "intuitive". "Shared familiarity" is probably a better term.

timacles · on Nov 23, 2023

This is the same reasoning that SQL gets criticized with. But the answer is simple.

Git (and SQL) range from simple tasks to very complicated ones. Everyone likes to fantasize about making them easier, but they’re only thinking about the fraction of functionality they use, rather than everything the tool currently does.

If someone could come up with a simpler solution they would, but they can’t, because git can do extremely complicated things and is internally consistent. Most people underestimate that part.

xanderlewis · on Nov 23, 2023

That doesn’t seem like the opposite to me. It seems like the same thing. Rather than rejecting people’s intuition as ‘understandable but wrong’ why don’t we use it as the basis for a better solution?

eviks · on Nov 23, 2023

Partially because that's a much harder design challenge, especially for people with an unrelated skill set

webstrand · on Nov 23, 2023

I'm still missing what part of the intuition is incorrect. It seems like the only "incorrectness" is that there's no explicit hierarchy of branches. Except that's wrong: the HEAD ref points to the default branch. Any other branches are of equal significance, though.

Sharlin · on Nov 23, 2023

No, the HEAD ref points to whatever branch is "active", that's how the active branch is defined. Indeed `git checkout branchname` does nothing except make HEAD point to the commit that `refs/heads/branchname` points to.

The intuition jvns meant is the idea that a branch only constitutes the commits since the point of divergence, but every branch actually contains the full history up to the root of its tree, and `git log` of course shows that. (If you want to only show the commits specific to a branch, you can do `git log parent..branch`. Note also that two branches need not have any common history, it's perfectly possible for a git graph to be disconnected.)
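Here is Sharlin's point, sketched in Python with a hypothetical history. Every commit reachable from a branch is "on" it, and `main..branch` is a set difference of reachable commits. The helper names are made up, and real Git computes this far more efficiently.

```python
def ancestors(parents, start):
    """All commits reachable from start (start included), through every parent."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def log_range(parents, exclude, include):
    """Commits shown by `git log exclude..include` (ordering ignored)."""
    return ancestors(parents, include) - ancestors(parents, exclude)


parents = {
    "root": [], "m1": ["root"], "m2": ["m1"],  # main
    "b1": ["m1"], "b2": ["b1"],                # branch forked from m1
    "lonely": [],                              # a second root: a disconnected graph
}
print(sorted(ancestors(parents, "b2")))            # ['b1', 'b2', 'm1', 'root']
print(sorted(log_range(parents, "m2", "b2")))      # ['b1', 'b2']
print(sorted(log_range(parents, "m2", "lonely")))  # ['lonely']
```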

xorcist · on Nov 23, 2023

> `git checkout branchname` does nothing except make HEAD point to the commit

You probably know this, but since we are being pedantic we might as well get it right: that describes "git reset". "git checkout" does that and also records that we are tracking branchname, so any new commits will move both HEAD and the branchname reference.

Izkata · on Nov 24, 2023

Well along those same lines, making a commit doesn't move HEAD, it's still pointing at the same branchname as before.

  cat .git/HEAD
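What that file holds can be modeled in Python. HEAD normally holds the text `ref: refs/heads/<name>`, and committing moves the branch it names. The dict standing in for the `.git` directory and the helper functions are hypothetical simplifications.

```python
# Toy stand-in for files under .git/: HEAD holds a branch *name*, the branch holds an id.
refs = {"HEAD": "ref: refs/heads/main", "refs/heads/main": "c1"}
PREFIX = "ref: "


def resolve(name):
    """Follow symbolic refs until reaching a commit id."""
    value = refs[name]
    while value.startswith(PREFIX):
        value = refs[value[len(PREFIX):]]
    return value


def record_commit(new_id):
    """A new commit moves the branch HEAD names; HEAD's own text is untouched."""
    target = refs["HEAD"]
    if target.startswith(PREFIX):
        refs[target[len(PREFIX):]] = new_id
    else:
        refs["HEAD"] = new_id  # detached HEAD: there is no branch to move


record_commit("c2")
print(refs["HEAD"])     # ref: refs/heads/main
print(resolve("HEAD"))  # c2

refs["HEAD"] = "c1"     # detach HEAD at c1
record_commit("c3")
print(refs["HEAD"], refs["refs/heads/main"])  # c3 c2
```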

mgerdts · on Nov 23, 2023

Intuition would be that the branch starts at the point that it diverges from main, labeled “base” in the first diagram. In reality, the first commit in “main” and “branch” are the same commit.

Intuition likely comes from how a tree (fir or oak, not binary) is structured. Generally a branch starts at the trunk or some other branch, not at the ground where the trunk gives way to roots.

jancsika · on Nov 23, 2023

I don't agree with the author here.

Intuition is travelling down main path that has branches which diverge and re-merge into the main path.

That's why people seem to intuitively get "merging" back into main, whereas that doesn't generally make sense for physical trees.

zestyping · on Nov 24, 2023

The intuition is that a branch consists of a line of commits off of a trunk. Like, you know, what the word "branch" means in real life.

But in git, a branch doesn't "contain" anything. It is just a pointer to a single commit. This is completely counterintuitive and makes no sense to most people at first, for a very good reason: the analogy is misleading.

These things shouldn't be called branches. They are pointers: they can be repointed anywhere, dereferenced, and copied by reference.

HEAD is also a stupid term, because it doesn't refer to the head of anything. HEAD is the current pointer, and it can point to any commit, not just the head of a branch. It should be called something like "here". The "detached HEAD" warning is scary and useless.

Just look at these two commands, which do very similar things but look totally different:

    git checkout -b staging
    git branch -f staging 3d1b582

They are just pointer assignments. Wouldn't it make a lot more sense if they were written like this?

    git point staging here
    git repoint staging 3d1b582
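The "it's just pointer assignment" view can be sketched with a hypothetical toy model: branches are entries in a dict, and HEAD holds the *name* of the current branch. `point` and `repoint` mirror the imagined commands above and are not real git commands; the sketch also ignores details such as git refusing to force-move the checked-out branch:

```python
from typing import Dict, List, Optional


class ToyRepo:
    """Hypothetical toy model: branches are names in a dict, HEAD names a branch."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}   # commit ID -> parent IDs
        self.branches: Dict[str, str] = {}        # branch name -> commit ID
        self.head = "main"                        # symbolic: the *name* of a branch
        self._next_id = 1

    def here(self) -> Optional[str]:
        """The commit HEAD currently resolves to (None before the first commit)."""
        return self.branches.get(self.head)

    def commit(self) -> str:
        new_id = f"c{self._next_id}"
        self._next_id += 1
        parent = self.here()
        self.parents[new_id] = [parent] if parent else []
        self.branches[self.head] = new_id         # the branch moves; HEAD still names it
        return new_id

    def point(self, name: str) -> None:
        """Like `git checkout -b name`: new pointer at `here`, then make HEAD name it."""
        current = self.here()
        if current is None:
            raise ValueError("no commit to point at yet")
        self.branches[name] = current
        self.head = name

    def repoint(self, name: str, commit_id: str) -> None:
        """Like `git branch -f name <commit>`: reassign the pointer, nothing else."""
        self.branches[name] = commit_id


repo = ToyRepo()
first = repo.commit()          # c1 on main
repo.commit()                  # c2 on main
repo.point("staging")          # staging -> c2, HEAD -> staging
repo.repoint("main", first)    # main -> c1; c2 is still reachable via staging
print(repo.branches)           # {'main': 'c1', 'staging': 'c2'}
```

No commit is created or destroyed by `point` or `repoint`; only dict entries change.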

informalo · on Nov 23, 2023

Yup. If it works, it ain't stupid.

informalo · on Nov 23, 2023

> You do need to explicitly specify the other branch when merging or rebasing or making a pull request (like git rebase main), because git doesn’t know what branch you think your offshoot is based on.

I think a big issue with the presented intuition is that it's limited to wanting to merge the base/trunk/main branch into your feature branch. However, sometimes you want to merge a feature branch into another feature branch. With this in mind, you can form a better intuition, imo, where it's absolutely clear that you have to specify what branch you want to merge into another one.

keybored · on Nov 23, 2023

Lately I’ve wanted branches (heads) to have a corresponding tail which points to the base commit that the branch sits on top of (like the commit on `main` when you created the branch).[1] Because branches get rebased all the time and eventually you have six commits out in the Æther somewhere and you have to think twice about where it even starts. And yeah you can probably think for a few seconds and recall that you have worked with John and not Jimmy on this branch so the seventh commit backwards that belongs to Jimmy must be the commit base. Or Git can tell you that the seventh commit belongs to `main` already. But why should you have to expend any effort?

You can optionally include the base commit when you send out “patches” to a mailing list.[2] Because it might not have been obvious that you based your changes on:

- The latest release

- The main development branch

- Some integration branch (probably an error)

You also need to keep the “base” in mind when you use `git range-diff`, because that tool takes two ranges like `main..previous` and `main..current`. Sometimes you can rely on just using `main..` and letting Git figure it out, but in my experience passing an explicit value sometimes works better.

`git range-diff` is a super-cool but perhaps niche tool. But you basically have to use it on review round number 2 and higher when you are sending changes to the Git project.

[1] This has been discussed before and there was a patch series that implemented it. But that was basically a POC and done in the spirit of “this is useless IMO but here’s how you could do it”... and the implementation didn’t factor in all the shenanigans that you can do with `reset` and `rebase` so it couldn’t have been merged as-is. (Although to be fair: the bar was not set to work perfectly with any kind of branch reset etc., which I suspect is impossible in any case.)

[2] Patches after all are just commit messages plus the patches themselves and don’t tell you what they are based on.

Snarwin · on Nov 23, 2023

It looks like this is what git merge-base --fork-point is supposed to do, although according to the docs it is not 100% reliable.
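For comparison, the plain merge-base question (which commits are the best common ancestors?) can be answered from the graph alone, while `--fork-point` additionally consults the reflog of the other ref. This sketch covers only the graph part, on a hypothetical toy history; the helper names are made up:

```python
from typing import Dict, List, Set

# Hypothetical history: main is A-B-C-F, a feature branch forked at B (D, E),
# and X is an unrelated root commit (histories can be disconnected).
graph: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "F": ["C"],
    "D": ["B"], "E": ["D"],
    "X": [],
}


def ancestors(graph: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit`, including `commit` itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_bases(graph: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(other != c and c in ancestors(graph, other) for other in common)
    }


print(merge_bases(graph, "F", "E"))  # {'B'}
print(merge_bases(graph, "X", "E"))  # set(): no shared history at all
```

Note that nothing here records which commits "belong" to the feature branch; after a rebase the graph alone cannot tell you, which is the gap a stored "tail" pointer would fill.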

keybored · on Nov 24, 2023

Based on all the discussions I’ve seen I think it’s impossible to programmatically find the “base” in general. Maybe it’s possible for most cases though.

gitanovic · on Nov 23, 2023

I think that one way to "easily" understand the syntax of git is to remember that when you perform a command you "always" modify the current branch.

For example: git merge my-branch will merge my-branch into the current one,

while git rebase my-branch will rebase the current one on top of my-branch.

PeterWhittaker · on Nov 23, 2023

I just reread my take on branches and relearned some stuff I’d forgotten: https://peter-whittaker.com/obligatory-grokking-git-post

Warning, all text, no diagrams....

Vinnl · on Nov 23, 2023

Years ago I wrote this dynamic tutorial that visualises branches as you read: https://agripongit.vincenttunru.com

It's aimed at folks who know how to use `git add` and `git commit`, and would like to spend 15 minutes to form a mental model to help them understand what's going on.

In case it's useful to someone.

why-el · on Nov 23, 2023

I've learned only one constant with git in my years as a programmer: master your own employer's git use cases, and pray to god for three things:

1. you don't change workplaces (and thus git patterns) often.

2. you don't accidentally commit and push a multi-GB file to your remote.

3. you don't change the git process on yourself and your colleagues without an extremely solid reason.

Document your chosen git patterns, even in 2023.

spenrose · on Nov 23, 2023

If you learned from this (excellent) piece, I recommend that you buy and work through https://leanpub.com/learngitthehardway . It will take less than a day, and you'll have a much stronger foundation for a core tool.

Aeolun · on Nov 24, 2023

Reading this, it seems to me that it should be incredibly easy to create an alternative version of the git client that stores the lineage per branch and can inform people if they’re doing things they probably shouldn’t.

ChrisMarshallNY · on Nov 23, 2023

That's an excellent explanation.

> “Wrong” models can be super useful.

This is used in usability and UX design a lot. Affording mental models that don't reflect the actual code happens all the time.

samus · on Nov 23, 2023

This is perfectly fine, and indeed the added value of a great application, if it can hide the underlying reality completely. With Git, the abstractions are paper-thin at best though. Good UIs can indeed cover up many aspects, but they only work as long as there are no merge or rebase conflicts. To correctly resolve these, the user has to have a precise picture of what is actually going on.

pvg · on Nov 23, 2023

> This is used in usability and UX design a lot.

It's the fundamental thing that makes UI work. I've always liked the title of Brenda Laurel's book, Computers as Theatre.

k__ · on Nov 23, 2023

That article would have been a lot better if it showed illustrations for the "right" mental model too.

taberiand · on Nov 23, 2023

The right mental model is to realise the 'main' branch is only special by convention - git doesn't actually treat it differently from any other branch.

All of the confusion expressed in the article stems from a misunderstanding that main should work in some special way.

Of course every branch's history goes all the way back to root and not to some arbitrary common commit of another branch like 'main'. Of course rebase and merge can work "backwards" from main onto some branch (it isn't really "backwards", because main is not special; it just isn't done much in practice, because keeping main straight helps with collaboration).

Furthermore, by realising that main isn't inherently special, it becomes obvious that the actions can be done between any two branches as needed.

The right mental model is - it's just commits, all the way down.

k__ · on Nov 23, 2023

"All of the confusion expressed in the article stems from a misunderstanding that main should work in some special way."

I didn't have that impression when reading that article.

To me it seems that the confusion comes from thinking in actual branches, and not from thinking anything special about main.

taberiand · on Nov 24, 2023

They're only thought of as actual branches (off of main) if you're also thinking of main as a trunk that is in some way special or different.

Sharlin · on Nov 23, 2023

But... If you have a rebase workflow, then `git checkout trunk; git rebase branch` is exactly how you "merge" an offshoot branch into a trunk branch! That's what Github does when you rebase-merge a PR, for example.

karatinversion · on Nov 23, 2023

No, that’s not right. If you did that, you would need to force push to get the result pushed to the remote.

Sharlin · on Nov 23, 2023

Oh, right. So what actually happens is that the offshoot must first be rebased on top of the trunk, and then trunk can be fast-forward merged/rebased (same thing, really) to the offshoot's head.
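The two steps described above (rebase the offshoot onto the trunk, then fast-forward the trunk) can be sketched with a hypothetical toy model of single-parent commits. Real `git rebase` also skips commits whose patch is already upstream and handles conflicts; neither is modelled here:

```python
from itertools import count
from typing import Dict, List, Optional


class LinearHistory:
    """Hypothetical toy model: single-parent commits, each carrying a change label."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.change: Dict[str, str] = {}   # stand-in for the content a commit introduces
        self._ids = count(1)

    def commit(self, parent: Optional[str], change: str) -> str:
        cid = f"c{next(self._ids)}"
        self.parent[cid] = parent
        self.change[cid] = change
        return cid

    def chain(self, tip: Optional[str]) -> List[str]:
        """Commits from the root up to `tip`, oldest first."""
        out: List[str] = []
        while tip is not None:
            out.append(tip)
            tip = self.parent[tip]
        return out[::-1]

    def rebase(self, tip: str, onto: str) -> str:
        """Replay commits not in `onto`'s history on top of it; return the new tip."""
        already = set(self.chain(onto))
        new_tip = onto
        for cid in self.chain(tip):
            if cid not in already:
                new_tip = self.commit(new_tip, self.change[cid])  # new ID, same change
        return new_tip

    def fast_forward(self, current: str, target: str) -> str:
        """Move a pointer to `target` only if `current` is already in its history."""
        if current not in self.chain(target):
            raise ValueError("not a fast-forward")
        return target


h = LinearHistory()
root = h.commit(None, "init")
refs = {"trunk": h.commit(root, "trunk fix"), "offshoot": h.commit(root, "feature")}

refs["offshoot"] = h.rebase(refs["offshoot"], onto=refs["trunk"])  # step 1: rebase offshoot
refs["trunk"] = h.fast_forward(refs["trunk"], refs["offshoot"])    # step 2: move trunk forward
print(refs)                                           # {'trunk': 'c4', 'offshoot': 'c4'}
print([h.change[c] for c in h.chain(refs["trunk"])])  # ['init', 'trunk fix', 'feature']
```

The trunk only ever moves forward along existing history, which is why no force push is needed; the old offshoot commit `c3` still exists but is no longer referenced by either branch.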

BlueTemplar · on Nov 23, 2023

Another way to think about merging and patches: https://jneem.github.io/merging/

gtirloni · on Nov 23, 2023

Some of Julia's tweets started to get suffixed with "I don't want advice about this". It must have reached unacceptable levels.

cube2222 · on Nov 23, 2023

There is a very good article by GitHub: https://github.blog/2020-12-17-commits-are-snapshots-not-dif...

TLDR: Think of commits as snapshots, not diffs, and you'll be fine.
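A minimal sketch of the snapshot model, assuming each commit is represented as a hypothetical dict of path to full file content. Nothing about the change itself is stored; the diff is computed on demand by whoever views it, here with Python's `difflib` (which is not git's diff implementation):

```python
import difflib
from typing import Dict, List

# Two hypothetical commits, each storing a complete snapshot: path -> file content.
snapshot_1 = {"README": "hello\n", "main.py": "print('hi')\n"}
snapshot_2 = {"README": "hello\nworld\n", "main.py": "print('hi')\n"}


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Compute a unified diff by comparing two full snapshots path by path."""
    lines: List[str] = []
    for path in sorted(old.keys() | new.keys()):
        before = old.get(path, "").splitlines(keepends=True)   # missing file -> empty
        after = new.get(path, "").splitlines(keepends=True)
        if before != after:
            lines.extend(difflib.unified_diff(before, after, f"a/{path}", f"b/{path}"))
    return lines


print("".join(diff_snapshots(snapshot_1, snapshot_2)))
```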

bloopernova · on Nov 23, 2023

tl;dr Please ignore, just me working through a Python+pygit2 problem. I solved it in a grandchild comment.

I had so much trouble trying to map my intuited/mental model of git onto pygit2 that I gave up and just used the git module.

I wanted to automate a fairly simple thing in Python as opposed to bash+commands. My reasoning being that I wanted to do it "right" and be a Big Programmer Real Boy(tm). I just wanted to create a branch remotely in Github, pull the repo, and checkout the new branch. I got stuck going in circles trying to figure out why I was always left in detached HEAD state because I didn't understand exactly what git was doing during a checkout.

    import os

    import git  # GitPython

    def checkout_existing_clone(repo_path, branch_name, log):
        # repo has already been pulled
        if os.path.exists(repo_path):
            local_repo = git.Repo(path=repo_path)
            log.debug(f"current branch: {local_repo.active_branch.name}")
            local_repo.git.checkout(branch_name)

That's super easy and is much the same as running the commands in the shell or in a bash script.

Of course, I've lost my poor implementation using pygit2, so I'll add that later if I find it. Thankfully there's a good discussion surrounding the issue I encountered in this excellent "roll your own git in Python", which doesn't use pygit2, but the concepts are the same: https://www.leshenko.net/p/ugit/#checkout-switch-branches

bloopernova · on Nov 23, 2023

This isn't asking someone else to make this work, it's more of a caution to convince folks like me to just use "import git" rather than pygit2:

So something like this was what I expected to work, but it leaves the repo in a detached HEAD state:

    import pygit2
    def checkout_branch(path, branch_name):
        repo = pygit2.Repository(path)

        branch_ref = repo.lookup_reference(f"refs/remotes/origin/{branch_name}")
        print(f"{branch_ref.name}")

        repo.checkout(branch_ref)

The branch_ref.name prints "refs/remotes/origin/test" but git status says "HEAD detached at origin/test"

So I'm probably feeding the wrong thing into repo.checkout, but I'm honestly not sure what else it should be.

Funnily enough, git itself tries to do the right thing if you pull in a detached HEAD state:

    From https://github.com/testorg/example
    * [new branch]          test       -> origin/test
    You are not currently on a branch.
    Please specify which branch you want to merge with.
    See git-pull(1) for details.

        git pull <remote> <branch>

bloopernova · on Nov 23, 2023

Ha, and of course just messing around gets me something that actually works.

There always seems to be just one more stackoverflow thread to read that has the real answer: https://stackoverflow.com/questions/68435607/how-to-clone-ma... (found via Kagi which I wasn't using before, and the search "pygit2 detached head")

    import pygit2

    def checkout_branch(path, branch_name):
        repo = pygit2.Repository(path)

        main_branch = repo.lookup_branch("main")
        print(f"Main branch upstream: {main_branch.upstream_name}")

        if branch_name not in repo.branches.local:
            print(f"Branch {branch_name} not found in local branches")
            remote_branch = "origin/" + branch_name
            if remote_branch not in repo.branches.remote:
                raise SystemExit(f"Branch {remote_branch} not found in remote branches")
            (commit, remote_ref) = repo.resolve_refish(remote_branch)
            repo.create_reference("refs/heads/" + branch_name, commit.id)

        branch = repo.lookup_branch(branch_name)
        print(f"Branch name: {branch.name}")

        repo.checkout(branch)
        print(f"Is branch head? {branch.is_head()}")

        (commit, branch_remote) = repo.resolve_refish("origin/" + branch_name)
        print(f"Remote branch: {branch_remote.name}")
        branch.upstream = branch_remote

With git reflog telling me the right thing:

    d44aedc (HEAD -> test, origin/test) HEAD@{0}: checkout: moving from main to test

And git push has the remote branch already set.

I wish there was a pair programmer AI that you had to explain stuff to. That would enable the "by explaining it, I solved it" phenomenon.

Izkata · on Nov 24, 2023

> I wish there was a pair programmer AI that you had to explain stuff to. That would enable the "by explaining it, I solved it" phenomenon.

It's called rubber duck debugging, named for having an actual rubber duck at your desk you'd talk to.

nunez · on Nov 23, 2023

Great explanation. Thanks, Julia!

zdw · on Nov 23, 2023

A lot of things in git are just pointers to commits, and then the git implementation handles them under the covers in some way that usually makes sense but not always.

One example that also bites people: moving files isn't stored in git. If you move files (even with `git mv`) and create a new commit, the moves aren't recorded; instead, the client reconstructs them later based on content similarity, which comes from the diff algorithm.

And git has multiple diff algorithms to pick from: https://git-scm.com/docs/git-config#Documentation/git-config...

And optionally to not detect renames in diff output with `diff.renames`: https://git-scm.com/docs/git-config#Documentation/git-config...
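To illustrate renames being inferred after the fact, here is a rough sketch: compare the deleted and added paths between two snapshots and call a pair a rename when their content is similar enough. It uses `difflib.SequenceMatcher.ratio()` as a stand-in similarity score and a 0.5 cutoff loosely modelled on git's default 50% rename threshold; git's own similarity index is computed differently, and the file contents are made up:

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def detect_renames(
    old: Dict[str, str], new: Dict[str, str], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Guess renames by pairing deleted paths with similar added paths."""
    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    renames: List[Tuple[str, str, float]] = []
    for src in deleted:
        best, best_score = None, threshold
        for dst in added:
            score = SequenceMatcher(None, old[src], new[dst]).ratio()
            if score >= best_score:
                best, best_score = dst, score
        if best is not None:
            renames.append((src, best, round(best_score, 2)))
            added.remove(best)   # each added file can be the target of only one rename
    return renames


# Two hypothetical snapshots: nothing records that util.py became helpers.py.
old = {"util.py": "def add(a, b):\n    return a + b\n", "notes.txt": "todo\n"}
new = {"helpers.py": "def add(a, b):\n    return a + b\n\n# moved\n", "notes.txt": "todo\n"}
print(detect_renames(old, new))  # [('util.py', 'helpers.py', 0.88)]
```

If the moved file had also been heavily edited, the score would drop below the cutoff and the change would show up as an unrelated delete plus add.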

keybored · on Nov 23, 2023

Yup. “Storing moves” is the kind of thing that might sound intuitively obvious but then gets gnarly and non-obvious when you think about it for five minutes. How do you catch all file moves (intent) outside of simple identical-content cases, and how do you represent them internally? Something that might be “obvious” to do turns out to be so non-obvious that you realize just using snapshots is really the best thing to do.

chrismorgan · on Nov 23, 2023

It’s completely trivial. The obvious and correct place is in the commit object just like author and date and such, since renaming is semantically part of the commit, not the tree:

  commit 0123456789abcdef0123456789abcdef01234567
  parent fedcba9876543210fedcba9876543210fedcba98
  author Nemo <nemo@example.invalid> 1234567890 +0000
  committer Nemo <nemo@example.invalid> 1234567890 +0000
  rename-from path1.old
  rename-to path1.new
  rename-from path2.old
  rename-to path2.new

  Commit message

And you don’t detect moves (because that’s madness), but require that people record them deliberately, just like every other VCS has done. There’s even git-mv already, it just skips a step that every other VCS’s equivalent command would do. (And technically this all works out because the index is a commit, so you can record the rename normally.)
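To make the proposal concrete, a sketch of how a reader could pull such headers back out of the commit text. The `rename-from`/`rename-to` headers are the commenter's hypothetical format (git does not write or read them), and the parser is a simplification that ignores multi-line headers:

```python
from typing import Dict, List, Tuple

# Header lines from the example above. A real commit object would also
# start with a "tree <id>" line.
raw = (
    "parent fedcba9876543210fedcba9876543210fedcba98\n"
    "author Nemo <nemo@example.invalid> 1234567890 +0000\n"
    "committer Nemo <nemo@example.invalid> 1234567890 +0000\n"
    "rename-from path1.old\n"
    "rename-to path1.new\n"
    "rename-from path2.old\n"
    "rename-to path2.new\n"
    "\n"
    "Commit message\n"
)


def parse_commit(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Split commit text into {header: [values...]} and the message.

    Simplification: continuation lines (starting with a space, as used by
    gpgsig) are not handled.
    """
    header_text, _, message = text.partition("\n\n")
    headers: Dict[str, List[str]] = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


def recorded_renames(headers: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pair rename-from/rename-to headers in the order they were written."""
    return list(zip(headers.get("rename-from", []), headers.get("rename-to", [])))


headers, message = parse_commit(raw)
print(recorded_renames(headers))  # [('path1.old', 'path1.new'), ('path2.old', 'path2.new')]
```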

Of course, all of this assumes that moving a file is a meaningful operation. Perhaps ideally (for most languages and systems) you’d track this in far smaller chunks, so that you can track changes to a function even when it alone was moved to a different file. But things like Git aren’t interested in those kinds of semantics, and work technically at the file level, more or less, so I think it should track renames because in practice straightforward renames are super common, but often also involve other changes that thwart rename detection. Years ago Linus explained why he didn’t like storing moves (someone else has linked it), but I’m largely not sold on his reasoning: the theory of the perfect has hindered the useful, and file renames are commonly meaningful in more ways than he allowed.

keybored · on Nov 24, 2023

> It’s completely trivial.

Like I implicitly said: how to do it beyond the “simple identical content cases”?

But if the solution is for the user to explicitly declare renames (i.e., this renamed Java class is a file move) then the solution is indeed simple.

I see the point that Linus was making that you may want to be able to see “function moves” and so on. But in practice I am very often interested in file moves since you can inspect the file history easily in Git—except when you hit some wall because someone renamed the file. Then you need to re-run the command with `--follow`. Contrast all of that with a function move... I almost never can summon the will to fish out the incantation (like a regex or a robust line range) which will give me the history of a function across intra- or inter-file moves and so on.

silon42 · on Nov 24, 2023

The problem with that scenario is that usually it doesn't support a real-world scenario where you do a rename in the tool (like some IDE) and it doesn't do the corresponding git operation.

(yes, some IDE might have git integration, but personally I don't like my IDE messing with git, except read-only (annotate, diff))

chrismorgan · on Nov 24, 2023

That’s… nothing special. If you don’t have Git integration in your IDE, you already have to do something like `git mv` or a `git add` and `git rm`. Nothing has changed in this new hypothetical world.

hyperthesis · on Nov 23, 2023

BitKeeper already did it.

lubutu · on Nov 23, 2023

I think this is the one thing I feel BitKeeper does better than Git. Git can get confused about where a file came from, for moves but especially for copies, and so the version history ends, even if you ask it to try and follow along. BitKeeper, on the other hand, keeps the moves and copies as part of the history, so you can always trace it through to the origin of the file, no matter how circuitous.

account42 · on Nov 23, 2023

git log has --follow, but unfortunately it only works when specifying a single file and not, e.g., a whole directory.

2-718-281-828 · on Nov 23, 2023

> moving files isn't stored in git

Is there an intuitive and enlightening explanation as to why it is this way?

keybored · on Nov 23, 2023

Git stores snapshots and that’s it. The whole tree, not per-file.

As to why Linus doesn’t like storing file moves: https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.721...

ziyao_w · on Nov 23, 2023

It's kind of funny to see Linus browbeat other people into submission regardless of whether he is right or not, while claiming "I am always right".

A few counter points:

- `hg` has `cp`, and I believe both Meta's and Google's internal systems have that;

- git has `mv`, which was added later, but it is really janky, and git forgets that files were moved, which I think is because git doesn't try to track that, likely because of the philosophy here;

- as for storing file moves, nobody said you *have* to use this information, but you can certainly use it to help with things.

The whole thread is an interesting read though and I will try going through it someday - maybe doing that would change my mind.

xorcist · on Nov 23, 2023

I'd be happy to argue why Linus is wrong here. Many things would be much easier if git recorded some more metadata in every commit: file moves, and branch moves, to start with.

Having some sort of notion of "parent branch" would be very useful for a number of common operations, and a "renamed file" without having to rely on client dependent heuristics too. Empty files trip people up all the time so a "create file" would fit in perfectly.

These concepts would also be a good basis for more user-friendly clients. Other version control systems do this, so the surprise factor should be low.

erik_seaberg · on Nov 23, 2023

People would get lazy and rename a file without telling Subversion they had done it, so it would write an “old file deleted, new file created from nothing” revision. Most of the merge conflict resolution machinery just couldn’t run without the missing guidance. Git infers someone probably renamed a file you edited or vice versa, which seems risky but works better in practice.

bqmjjx0kac · on Nov 23, 2023

Man, he communicates like a dick all the time I guess.

keybored · on Nov 24, 2023

He does argue in a borderline hysterical way on many occasions.

layer8 · on Nov 23, 2023

For the historical rationale see here: https://gist.github.com/borekb/3a548596ffd27ad6d948854751756...

In short, Linus’s stance is that file renaming doesn’t matter, only the contents of files matter, and the moving of contents between files. Moved/renamed files then fall out as a special case of moving content.

Personally, I think this is a case of the better being the enemy of the good, and his “clearly superior algorithm” doesn’t work as well as claimed in practice. Or maybe tooling merely still isn’t up to snuff after 18 years.

seba_dos1 · on Nov 23, 2023

I don't think it's about having a stance, it's about git's architecture. From the commit graph point of view, there's no such thing as moving anything at all, neither files nor content. Commits represent a whole new state of the repository, not a diff from the previous state. The only way a commit is linked to the previous state is via its parent pointer; it can otherwise be completely unrelated (and you can simply change the parent pointer without changing anything else in the commit). Any diffs are calculated at runtime. The issue with renames is just a consequence of assuming such a data model: you could try to plaster over it with some metadata, but ultimately you would still be fighting against the model rather than working with it.
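The "whole snapshot plus a parent pointer" structure shows up directly in how object IDs are computed. A sketch for SHA-1 repositories (the default object format; SHA-256 repositories differ): an object's ID is the SHA-1 of a `<type> <size>\0` header followed by its body, and a commit body names exactly one tree plus its parents. The identity, timestamp, and messages below are made up:

```python
import hashlib
from typing import List


def hash_object(obj_type: str, body: bytes) -> str:
    """Git object ID in a SHA-1 repository: sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: List[str], message: str) -> bytes:
    """Build a minimal commit body with a fixed, hypothetical identity and timestamp."""
    ident = "Nemo <nemo@example.invalid> 1234567890 +0000"
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


empty_tree = hash_object("tree", b"")   # the well-known empty tree ID
root = hash_object("commit", commit_body(empty_tree, [], "root"))
child_a = hash_object("commit", commit_body(empty_tree, [root], "same snapshot"))
child_b = hash_object("commit", commit_body(empty_tree, [], "same snapshot"))

print(empty_tree)          # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(child_a != child_b)  # True: same snapshot, different parent pointer, different commit
```

Nothing in a commit body describes a change; the tree line is the full snapshot, and the parent lines are the only link to earlier history.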

Many people develop a bad mental model with commits as diffs, because that's what the UI makes them think commits are. It can work for a while, but inevitably leads to confusion later on.

layer8 · on Nov 23, 2023

As you say, commits link to their parent(s), and those links effectively represent the edges of the commit graph. It makes perfect sense to record moves on those edges. That’s how other VCSs do it. There is no conflict with the commit model.

Viewing the commit graph in terms of nodes (commits) or edges (diffs) is equivalent, these are dual views you can easily convert between. The internal representation is independent from that. Some VCSs use a mix of diffs and full revisions internally. Even Git uses delta compression when packing objects.

seba_dos1 · on Nov 23, 2023

What I meant is that git doesn't have any structure to represent an edge other than a simple pointer. Conceptually it wouldn't be a big change to add some, but the consequence of that is that everything in git revolves around nodes rather than edges, and whenever the concept of an edge is needed (such as in "cherry-pick") it's being calculated on the fly.

layer8 · on Nov 23, 2023

I don’t see where this would be causing any issues. There is a canonical place to put edge metadata, namely in the child commit. And whenever you’re interested in move information, you have to process the respective child commit anyway.

ptx · on Nov 23, 2023

If you think of it not as a "rename" (which would belong in the edge object if it existed) but rather as a "note: the file A in this tree was known as B in the parent tree" it would make perfect sense to store it in the child commit.

paulddraper · on Nov 23, 2023

Git doesn't store any individual changes: files moved, lines added, lines deleted, etc.

It stores a commit graph, and a tree at each of those commits. (A lossless compression algorithm deduplicates information.)

There's no need for the author to be concerned with what diffing information gets incorporated into the commit. Diffs are up to the viewer of the commit history.

  git show --diff-algorithm=...