I agree with the author's sentiment. The way this problem is typically framed is as a dichotomy between preserving a "true" history of what really happened on the micro/commit scale VS presenting a "clean" history that makes the story of the change easy to follow on the macro/PR scale.
This is a false dichotomy. I'm greedy, I want BOTH. Give me story mode when I'm just browsing the repo, but offer me the option to switch into commit-by-commit mode when I want more detail.
For a view of commits that shows when they entered the current branch, rather than when they were authored (i.e. group all commits from the same PR together in the history):
git log --topo-order
For only viewing merge commits (as if you were using a squash-and-rebase strategy, but without having to rewrite history at merge time):
git log --merges
Assuming you are using branches as groups of commits — kind of the point of a branch — and you're using merges with --no-ff (which is the default on Github for the Merge button — https://docs.github.com/en/github/administering-a-repository... — and is necessary in this scheme to prevent fast-forward "merges" that would mess up viewing the merge history) rather than squashing and rebasing, `git log` shows you the "true" history, `git log --topo-order` shows you the chronological history of when commits entered the main branch, and `git log --merges` shows you the zoomed-out "clean" history of PR merges.
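For anyone who wants to see the three views side by side, here is a rough sketch on a throwaway repo. The branch names, commit messages and timestamps are all made up for illustration, and the commits are empty to keep it short; only the `git log` flags come from the comment above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

# Run a git command with a fixed timestamp (base + offset seconds) so the
# ordering below is reproducible. Hypothetical helper, not a git feature.
at() {
  off=$1; shift
  GIT_AUTHOR_DATE="$((1600000000 + off)) +0000" \
  GIT_COMMITTER_DATE="$((1600000000 + off)) +0000" "$@"
}

at 0 git commit -q --allow-empty -m "base"

# Two feature branches whose commits interleave in time.
git checkout -q -b feature-a main
at 1 git commit -q --allow-empty -m "a1"
git checkout -q -b feature-b main
at 2 git commit -q --allow-empty -m "b1"
at 3 git commit -q --allow-empty -m "b2"
git checkout -q feature-a
at 4 git commit -q --allow-empty -m "a2"

# Merge both with --no-ff so each branch gets its own merge commit.
git checkout -q main
at 5 git merge -q --no-ff -m "Merge feature-a" feature-a
at 6 git merge -q --no-ff -m "Merge feature-b" feature-b

default_view=$(git log --format=%s)              # by commit timestamp
topo_view=$(git log --topo-order --format=%s)    # each branch kept together
merges_view=$(git log --merges --format=%s)      # merge commits only

printf 'default:\n%s\n\ntopo-order:\n%s\n\nmerges:\n%s\n' \
  "$default_view" "$topo_view" "$merges_view"
```

In the default view a2 and b2 sit next to each other because their timestamps are adjacent; with --topo-order each branch's commits should stay grouped under a merge.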
Many people don't know about these features because Github doesn't have an option to view the log that way: only git does. TBH I wish Github had offered a --topo-order and --merges selector to their commit log view before offering a myriad of PR merging/rebasing options: git provides a wealth of solutions to viewing commit history without forcing you into destroying parts of it with squash-and-rebase.
Edit: IMO, part of this issue is because of git's bad default log view. The --topo-order view is a much more useful view for reading the history of the repo (both for knowledge building and for debugging) than the default chronological-by-commit-timestamp view; typically you don't care what date a commit was authored on, as much as you care when it entered the main branch. I can't think of a single time I've actually wanted the default log view, and yet it's the default, and thus it's what Github shows.
Yes! Thank you for that. I knew about some of the flags discussed on this topic but not --topo-order.
For my new project I've already decided some months ago to use merges with --no-ff when merging in work by other people, along with a kernel style "everyone gets their own repo" model.
A lot of the discussion around Git workflow is basically noise created by the weak and hardly ever improved GitHub UI. True. They prefer building AI to building better log views on their core product. But, even much more advanced Git clients like the one in IntelliJ don't properly expose all the different ways of looking at git logs, and git has so many flags and options that it's nearly impossible to know them all. Even if one day you spend the time to read the user guide from cover to cover, newer versions will add new features and you won't see them. If the GUIs were better, I suspect all these discussions would quickly dry up.
I've often thought this too: you shouldn't have to rewrite (i.e. lie about) what happened. Git history should ideally be completely immutable, but there should be a view that tidies up what happened for those who like to see individual features/bugs/hotfixes all listed in history in a nice way.
I don't like the name though; "commit groups" seems odd. I'd prefer "feature view" or something else. The reason is that "commit groups" sounds to me like groups of people who are allowed to commit.
The ugly truth though, is that the process of programming can be shamefully bad. Often times the in-progress branch commits are the opposite of the platonic ideal commit message, for common human reasons. By 'common human reasons' I mean programming is messy, and it's often unclear why something does or does not work until after something functional is arrived at.
The important thing is that your tools, specifically git, work for you, and not the other way around, so 'bad' commit messages like 'before', followed by 'after', are totally fine while working in the branch as long as they get cleaned up during merge. However, they are the antithesis of useful months or years down the line while playing code historian. (Hilariously, the answer to the question "what idiot wrote this code" is sometimes the reader!)
So is it a lie that sometimes after lunch on a Friday, the act of programming is often a series of off-by-one, off-by-two, off-by-one-the-other-direction compile-test-edit-commit loops? And that we'd prefer to be thought of as a genius that wrote some fundamental-to-the-company code 5-10 years later, with beautiful commit messages that live up to some platonic ideal, rather than "that one dumbass"?
It's a lie the same way that people who wear makeup are 'lying'. It's true under a very specific, weird framing, but it doesn't really agree with reality.
There are notable high profile exceptions like publicly viewable patch series against the linux kernel, but you're deluded if you think those aren't edited before being released for human consumption.
It encourages people to Commit Whenever They Want To, secure in the knowledge that people will never see my -- I mean their -- feeble intermediate attempts at working code. Committing frequently is good. It means that reflog will often have interesting stuff in it, bisecting your feature branch might have a decent chance of finding obscure bugs discovered during development, etc. etc.
It's a godsend once you truly Get It that all that ugly intermediate nonsense can be removed before merging. I suspect that people who advocate against the "rebase+rewrite" philosophy do Not Get It and work in a way where even their branch-local commits are pretty meticulous and tested, etc. Nothing against those people, but they can still work fine in an environment with rebase+rewrite being the default.
I also have literally NEVER heard a good argument for keeping the full commit history (with WIP commits, etc.).
There is nothing to be learned from it unless you're specifically investigating people's commit habits.
ETA: Some IDEs have a Local History thing where you can basically see all versions of a local file (snapshot every 60s or whatever). Do you want that in your repository history? I don't think so.
I do feel for these people though. Depending on their company they could try to fight it and change it (which can be possible depending on things like company size, whether this is being introduced or is well established, your influence level with the deciders, etc.) or simply RUN and never look back.
I think the person you replied to might be jesting :)...
... but, yes, if you find yourself in that type of situation and powerless to change it[0], move on.
[0] "Maybe give it a couple of tries. If that doesn't change anything, give up. No reason to be a damn fool about it." (Paraphrasing Churchill, I think? Anyway, not claiming originality.)
> There are notable high profile exceptions like publicly viewable patch series against the linux kernel, but you're deluded if you think those aren't edited before being released for human consumption.
As they should be, and tons of them have certainly gone through draft versions with "garbage" commits on temporary git branches on their authors' computers. The thing is, that kind of draft version, from before you even want to communicate with other humans (or at least a large number of them), is rarely useful for long-term understanding of the history, so it is extremely useful to have a somewhat cleaned-up history in the project. That you do some cleanup just before submitting a series by email is just a detail. Even then, the rewriting is arguably more pronounced than in most other projects: your first version of a complex change is likely to be rejected because the maintainers want improvements in some aspects, so you submit a second (then maybe a third, etc.) series, and most of the time you rewrite the history in each one rather than just adding a new patch on top.
So I don't think this is even an exception, on the contrary! Git history should not change on branches that are widely used, like Linus's branch for the kernel, or the maintainers' branches, etc. For work in progress used by a single dev or by two or three highly synchronized people, maybe even skipping some processes (e.g. code reviews) at first, I'd rather not see that in the "clean" history of the project, because 99% of the time it has very little value and extremely high noise.
> So is it a lie that sometimes after lunch on a Friday, the act of programming is often a series of off-by-one, off-by-two, off-by-one-the-other-direction compile-test-edit-commit loops?
No, exactly the opposite: it's a lie to pretend that you can write perfect code the first time, and it does no-one any favours in the long run: not yourself, and certainly not those who come to learn from you in the future. And it's not normal or expected the way makeup can sometimes be; junior programmers will be genuinely deceived and this will cause real harm.
Keep the history, warts and all. Best case someone might learn something from it. Worst case you're no worse off.
The “keep everything” approach conflicts with the “have clean commits” approach. If we keep everything, instead of “Build A; Build B; Build C” we wind up with “Build some of A, B, and C;” * 8.
But again, like the parent comment said, if you’re never rewriting history, that means you’re stuck with how the code was actually developed, which frequently is not that clean. Many people don’t write three separate features most of the time, they write three features together and at the same time.
If the real history is that you developed all three features in an interleaved way, isn't that more useful (e.g. for bisection) than a fictional history that you haven't actually tested? Most likely your cherry-picking/rebasing won't be perfect, so you'll still have parts of A mixed in with B and C and you'll have things like one commit depending on changes from a future commit. The history of how the code was actually developed might be "messy" but it's more likely to at least compile (because presumably you were compiling it from time to time while you were developing).
Why would my messy history be useful for bisection? The places I committed, it may not have even fully compiled except for the very last commit. In that case, to separate the code in a useful way (such as 3 commits, one for each feature, each of which compiles on its own), you'd have to do a bit more work and create new commits, which again means either rendering the original commits pointless or disregarding them.
Surely your test-edit cycle involves at least some compiling. Maybe not every commit will compile, but most changes that compile will have a commit. At the very least a "real" commit has a much higher chance of compiling than an "artificial" one that you constructed retrospectively.
If you really do make most of your commits not compile then I can sort of sympathise with squash-merging, but if you merge then worst case it's a one-liner to bisect while only looking at "mainline" history (i.e. only the merges to master, the equivalent of what you'd get if you'd squash-merged), whereas if you squash-merge then there's no way to bisect back through the original history.
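A sketch of what that one-liner could look like, assuming it means `git bisect start --first-parent` (which I believe needs Git 2.29 or newer). The repo, the branch names and the "app.txt must say ok" check are invented for illustration. The second bisect then digs into the original branch commits, which is the part a squash-merge would lose.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

save() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo ok > app.txt;          save "base"
git tag known-good

git checkout -q -b feature-a
echo a > a.txt;             save "a: add a.txt"
git checkout -q main
git merge -q --no-ff -m "Merge feature-a" feature-a

git checkout -q -b feature-b
echo regression > app.txt;  save "wip"            # the bug sneaks in here
echo b > b.txt;             save "working now"
git checkout -q main
git merge -q --no-ff -m "Merge feature-b" feature-b

git checkout -q -b feature-c
echo c > c.txt;             save "c: add c.txt"
git checkout -q main
git merge -q --no-ff -m "Merge feature-c" feature-c

# Step 1: bisect over mainline (first-parent) commits only.
git bisect start --first-parent HEAD known-good >/dev/null
git bisect run sh -c 'grep -qx ok app.txt' >/dev/null 2>&1
bad_merge=$(git rev-parse refs/bisect/bad)
first_bad_subject=$(git log -1 --format=%s "$bad_merge")
git bisect reset >/dev/null 2>&1

# Step 2: bisect inside the merged branch, between the merge's two parents.
git bisect start "$bad_merge^2" "$bad_merge^1" >/dev/null
git bisect run sh -c 'grep -qx ok app.txt' >/dev/null 2>&1
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad merge: $first_bad_subject"
echo "first bad commit inside it: $culprit"
```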
Compiling vs. not-compiling is only the tip of the iceberg; there’s a lot of other aspects of my development that don’t make sense until the very end. Arguably, the whole feature is basically useless until it’s finished; why would I keep working on it in any meaningful way after it is complete and running? If that happens, chances are that the ticket wasn’t atomic enough. There are exceptions of course, such as substantial rewrites for bug fixes, but those are by nature not the norm. As a result, my commit messages are only for me, which saves time in development. Commit messages “wip” and “working now” mean something to me, but definitely hold no value to whoever is doing a git blame in the future, which is another benefit of squashing.
If it compiles then I can use it in an automated bisect, which is the main thing VCS history is useful for IME. I'm a big believer in "refactor mercilessly" and "make the change easy, then make the easy change", so while obviously the final feature will not be working in the intermediate commits, the work will touch on other code areas and there's always the possibility that this will introduce a subtle bug that slips past the current test suite, and if that should happen then I want to be able to bisect down to the smallest possible diff before I start trying to understand it manually. I also find that a small commit with a useless message is actually a much more useful blame result than a big commit, even if the big commit contains a detailed explanation of the overall change.
> I also find that a small commit with a useless message is actually a much more useful blame result than a big commit, even if the big commit contains a detailed explanation of the overall change.
That's pretty interesting. I know with me, that is definitely not true, because 90% of all commits would just be the message `wip` which makes Git Blame incredibly hard to use.
What are you trying to get out of the blame? I do sometimes git tag --contains to find the overall feature that the blame-output commit was part of, but most of the time the most useful thing is just to see the diff for that commit or frankly even just the list of files it touches.
Much of the time it’s asking what the motivation behind a line of code is, such as why we take some crazy convoluted approach to what seems like it should be a simple task. Editor plugins such as GitLens display the blame output, so it is much more convenient if that information is in the commit rather than in an associated tag.
Git release branch history should be kept immutable, because this is a way to see how things were in past, for troubleshooting, for ensuring that the code under source control is actually the code you deployed, and / or the code your downstream depends on, etc.
On your feature branch, you can do anything, as long as you can later cleanly merge with the main branch from which releases are cut.
I'd say it's a completely normal practice to rebase, split, and fixup your commits on the feature branch, in order to present a clean picture to the reviewers, to easily show that a new test catches the error in the old broken code, and the new fix actually makes that test pass, etc. Nobody cares about what happens on your feature branch but you. Its commit history is not holy, it's a tool like other tools. Somebody depends on oncoming progress of my feature branch while developing their own? Well, `pull --rebase` regularly, same as you do with the main branch.
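One way to do the "fixup" part without hand-editing a todo list is `git commit --fixup` followed by an autosquash rebase. This is only a sketch: the file names and messages are invented, and setting `GIT_SEQUENCE_EDITOR=true` is just a trick to accept the generated todo list unattended (normally you'd let the editor open).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

save() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo base > app.txt;     save "base"

git checkout -q -b feature
echo "test" > test.txt;  save "Add failing test"
echo "fix" > fix.txt;    save "Fix the bug"

# Later tweak that logically belongs to the first commit:
echo "test v2" > test.txt
git add -A
git commit -q --fixup=HEAD~1          # creates "fixup! Add failing test"

before=$(git rev-list --count main..feature)

# Fold the fixup into its target; the todo list is accepted as generated.
GIT_EDITOR=true GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

after=$(git rev-list --count main..feature)
subjects=$(git log --format=%s main..feature)

echo "commits before: $before, after: $after"
echo "$subjects"
```

The reviewer then sees two commits (test first, fix second) with the tweak already folded into the test commit.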
Squash that history during the merge to the main branch. Do not delete the feature branch, uncheck that checkbox in Github repo settings. Clean history representing completed features: check. Detailed explanation of development in code: check.
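Roughly what that looks like with plain git rather than the GitHub button (a sketch with invented names). One thing it makes visible: after a squash, git itself has no record that the kept branch went into main, so the link between the two lives only in names or commit messages.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

save() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo base > app.txt;  save "base"

git checkout -q -b feature
echo 1 > f.txt;       save "wip"
echo 2 > f.txt;       save "wip"
echo 3 > f.txt;       save "working now"

# Squash the whole branch into a single commit on main; keep the branch.
git checkout -q main
git merge -q --squash feature >/dev/null
git commit -q -m "Add feature F (squashed from branch 'feature')"

main_commits=$(git rev-list --count main)               # base + squash
feature_commits=$(git rev-list --count main..feature)   # detailed history
if git merge-base --is-ancestor feature main; then linked=yes; else linked=no; fi

echo "main: $main_commits commits, feature-only: $feature_commits, ancestry link: $linked"
```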
On the topic of the article: to my mind, "commit groups" can be sufficiently well implemented as branches, or as tags and ranges of commits between tags. For some very complicated cases, they can be implemented as actual text tags inserted into commit messages.
The whole point of using a DVCS is to be able to publish and pull from each other's branches. If your feature branches are private, or you have to check with someone before pulling from their branch (which you do if they're in the habit of rebasing), then you're missing out on most of git's value.
I've been a programmer for long enough to have used SVN seriously. It really wasn't so different - it honestly did feel much the same as when I've worked in places that used a rebase-heavy git workflow.
This both applies and does not apply to the case at hand.
Let's call it differently: dabble branch and share branch. It's the share branch where you interact with others. You are not bound to have exactly one release branch, and often you don't, when you backport stuff to older releases. But this is a branch you keep in order because you share it with others.
Your dabble branch is your playground. You can do weird things, make stupid mistakes, fix them, etc. You do not share that branch with others much, except to let them see its current state. They do not depend on it, and do not expect it to be nice.
When your portion of work is done, and you (maybe several of you) want to share it with other collaborators, not involved in the process of your dabbling, but interested in the result of it, you may choose to clean it up. You can reorder commits into logical spans, and meld them. You can split a commit that does two unrelated things, and describe each separately. You get rid of all the noise (if you produced any), and form a nicer picture for your collaborators to review and understand. You do it because you care about their time and sanity.
Then you merge the result of your dabbling into the share branch, squashing commits into one. This keeps the history of the shared branch(es) observable. If anybody wants to step back, they have your original dabble branch, which you now abandon in favour of a new one.
Dabble branches should be short-lived, a couple of days. You can have many long-lived share branches for features that take long to develop, etc. Share branch history usually does not need cleaning up, so there's usually no point in rewriting it. That lets you merge it periodically with other share branches, if any.
You should hopefully be talking to each other at least every couple of days - the real advantage of having visibility of each other's branches comes when you use them to share work on a much smaller timescale.
As for caring about your reviewers' time and sanity, rewriting commits that they may already have seen is the opposite of that IMO. Any decent review tool will let you review a single combined diff for the whole branch, and that's what a reviewer who hasn't been following your progress will use. Meanwhile, if a reviewer did happen to look at your branch yesterday, taking away their ability to view just the changes since then is doing them no favours. (This is especially true when it comes to applying changes from review feedback: if I requested a couple of small fixes then I want to review a commit where you made those small fixes; I don't want to have to re-review the whole PR because you rebased.)
I want to be able to preserve two parallel commit histories: one where the commits are ordered by time, and another where the commits are ordered by 'story'. Git could cryptographically verify that the end-states of the two histories are identical, and allow me to alter the storied history at will (shifting hunks between commits, splitting/combining commits, reordering commits, etc.), where during merge both histories are preserved.
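Git doesn't offer this as a feature as far as I know, but the "identical end-states" check can be approximated by hand today, since two commits with the same content point at the same tree hash. A sketch with two invented branches, `messy` (as developed) and `story` (regrouped by feature):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

save() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo readme > README;  save "base"

# History as it actually happened: A and B developed interleaved.
git checkout -q -b messy main
echo a1 > a.txt; echo b1 > b.txt;  save "wip"
echo a2 > a.txt;                   save "wip"
echo b2 > b.txt;                   save "working now"

# Same end result, retold as one commit per feature.
git checkout -q -b story main
git checkout messy -- a.txt;  git commit -q -m "Build A"
git checkout messy -- b.txt;  git commit -q -m "Build B"

messy_tree=$(git rev-parse 'messy^{tree}')
story_tree=$(git rev-parse 'story^{tree}')

if [ "$messy_tree" = "$story_tree" ]; then
  echo "end states identical: $story_tree"
else
  echo "histories diverge"
fi
```

This only verifies the final snapshot; keeping both histories attached to one merge is still the missing piece.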
I don't really follow your naming critique though. "commit groups" seems like a fine name, they are groups of commits. What you describe I would call "committer groups".
What is your use case? How do you develop so that your story does not belong to one (or more) feature branches where it can be held, unmixed with other stories? Can your story continue through multiple merges to the main / release / whatever branch(es)?
I'm asking totally unironically; every company's flow may be different, and for good reasons which I'm oblivious about. So I'd gladly read if you had time to explain.
I used to think this way. Now I think that we as programmers already suffer from too much information.
I spend considerable time these days making my commits as clear and readable as possible, and between rebasing, squashing and amending messages, it's quite likely that there might be dozens of intermediate commit IDs for every single commit ID that enters the repo.
It's pretty easy though. Leave all the commits. Put meaningful messages in your merge commits, and your clean macro history is just merge commits and the messy one is the rest.
Merge commits prevent having clean history, because they lie to you. If you "git show" a merge commit, it hides almost all changes because they've been automatically merged, but it doesn't follow that they were merged correctly. This can cause things like gotofail.
I think the scenario you've described assumes that the branches have diverged -- that each branch has a commit not on the other. But if you always rebase your branch onto `develop` before merging, your branch definitively has no conflicts. If you then use `--no-ff` to merge, it will create a merge commit anyway.
This article shows what I mean: https://euroquis.nl/blabla/2019/08/09/git-alligator.html . (The only thing missing is the rebase before merge, since the example image shows a case where a merge could have hidden some changes.)
If you do that, then the diff in git show for the merge commit will always be empty. But I think it's a solvable problem: there just needs to be a better/easier way to say "show the diff between one side of the merge (usually the first/left) and the result of the merge" when showing a merge commit. Or maybe there is an option I'm not familiar with.
That said the rebase followed by --no-ff merge is my preferred approach as well.
If [commit] is the merge commit for the feature, then this should work:
git show [commit]^1..[commit]
The ^1 incantation means "first parent", so in the feature branch style I'm describing, this would show the changes on all of the commits for the feature merged at [commit]. You can use `git diff` instead of `show` in the same way, if you just want the cumulative diff.
It seems to work on the repository I've been contributing to, as it shows each of the commits that were part of the feature branch. The merge commit itself is empty, as desired, avoiding the up-thread concern of unreviewed auto-merged code.
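A small reproduction on a scratch repo, in case it helps (names invented). I've used `git log --format=%s` here just to get a compact listing of the range; as far as I know the `git show` form above walks the same range and adds the patches.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

save() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo base > app.txt;  save "base"

git checkout -q -b feature
echo 1 > one.txt;     save "Add one"
echo 2 > two.txt;     save "Add two"

git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)

# Everything reachable from the merge but not from its first parent:
feature_log=$(git log --format=%s "$merge^1..$merge")

# Cumulative change the feature brought in, relative to mainline:
changed=$(git diff --name-only "$merge^1" "$merge")

printf 'commits:\n%s\n\nfiles:\n%s\n' "$feature_log" "$changed"
```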
I want a clean merge history I can run git bisect against. The utility of git bisect is much diminished when the commit history is full of broken commits. It works best when all the commits successfully run the test suite at the time, which is perhaps not the highest standard of "clean commit" we could ask for but my impression is that it puts it fairly high vs. the "commit everything" criterion.
It doesn't matter what git supports if I'm bisecting into history where the tests don't run cleanly on every commit. Now I don't know whether the test case I'm bisecting with is broken because it's revealing the bug or if it's broken because the commit is broken. git bisect is literally mathematically useless if you get even a single good or bad verdict wrong.
All of these arguments are of the flavor “git supports --flag-1%-of-users-know so that works.” But there’s a nontrivial cost to teaching all of your users a non-default workflow, which you have to pay for every new person that comes along.
I use merge commits as the “clean” history and non-merge commits as the “true” history. Are there problems that this approach doesn’t solve? My only gripe is that apps like GitHub don’t give me the option to display commits how I want, but that’s a problem with inflexible tooling in general.
Couldn't you get this with squash and merge, along with not deleting merged branches? The main branch would have the story, and you could go into the actual squashed branch to get the actual commit history?
That creates a mess. It's not clear which remote branches are still active, and can be difficult to link the branches back to the merges historically for hunting bugs.
The best way I've seen by far: Prepare a fast-forward merge, then merge it with --no-ff. You end up with a linear history of commits grouped by the merge commits, can see either view in git log using --first-parent or not, and bisect can find the actual commit when needed.
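Here is how I read that recipe, as a sketch on a scratch repo (names invented): rebase the branch so it could fast-forward, then merge it with --no-ff anyway.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

save() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo base > app.txt;  save "base"

git checkout -q -b feature
echo 1 > f.txt;       save "feature step 1"
echo 2 > f.txt;       save "feature step 2"

# Meanwhile main moves on, so the branches have diverged.
git checkout -q main
echo m > m.txt;       save "mainline change"

# 1. Prepare a fast-forward: replay the feature on top of current main.
git checkout -q feature
git rebase -q main

# 2. Merge with --no-ff so the group still gets a merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature

mainline=$(git log --first-parent --format=%s)   # zoomed-out view
total=$(git rev-list --count HEAD)               # every commit is still there

printf 'first-parent view:\n%s\n\ntotal commits: %s\n' "$mainline" "$total"
```

Because the merge's first parent is an ancestor of its second parent, the merge commit itself introduces no changes of its own.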
Does this have some kind of distinct result from rebasing the branch before merging does? I'm not thinking of anything that would be different, so I'm not sure if using the more obscure command (`git merge --no-commit` vs `git rebase`) is for a specific reason.
Did you mean to respond to someone else? I never mentioned --no-commit, I'm talking about rebasing if needed - set up a fast-forward merge, which may or may not need a rebase.
No, I've just never heard rebasing a branch described as "setting up a fast-forward merge." It implies doing something very different to me from a straightforward rebase, though with this elaboration I can see how you could describe it that way.