
I've often thought this too, you shouldn't have to rewrite i.e. lie about what happened. Git history ideally should be completely immutable but there should be a view that tidies up what happened for those that like to see individual features/bugs/hotfixes all listed in history in a nice way.

I don't like the name though, commit groups seems odd, I prefer feature view or something else. The reason is commit groups sounds to me like groups of people who are allowed to commit.



The ugly truth though, is that the process of programming can be shamefully bad. Often times the in-progress branch commits are the opposite of the platonic ideal commit message, for common human reasons. By 'common human reasons' I mean programming is messy, and it's often unclear why something does or does not work until after something functional is arrived at.

The important thing is that your tools, specifically git, work for you, and not the other way around, so 'bad' commit messages like 'before', followed by 'after' are totally fine while working in the branch as long as it gets cleaned up during merge. However, they are the antithesis of useful months or years down the line while playing code historian. (Hilariously, the answer to the question "what idiot wrote this code", is sometimes the reader!)

So is it a lie that sometimes after lunch on a Friday, the act of programming is often a series of off-by-one, off-by-two, off-by-one-the-other-direction compile-test-edit-commit loops? And that we'd prefer to be thought of as a genius that wrote some fundamental-to-the-company code 5-10 years later, with beautiful commit messages that live up to some platonic ideal, rather than "that one dumbass"?

It's a lie the same way that people who wear makeup are 'lying'. It's true under a very specific, weird framing, but it doesn't really agree with reality.

There are notable high profile exceptions like publicly viewable patch series against the linux kernel, but you're deluded if you think those aren't edited before being released for human consumption.


This is exactly right.

It encourages people to Commit Whenever They Want To, secure in the knowledge that people will never see my -- I mean their -- feeble intermediate attempts at working code. Committing frequently is good. It means that reflog will often have interesting stuff in it, bisecting your feature branch might have a decent chance of finding obscure bugs discovered during development, etc. etc.

It's a godsend once you truly Get It that all that ugly intermediate nonsense can be removed before merging. I suspect that people who advocate against the "rebase+rewrite" philosophy do Not Get It and work in a way where even their branch-local commits are pretty meticulous and tested, etc. Nothing against those people, but they can still work fine in an environment with rebase+rewrite being the default.

I also have literally NEVER heard a good argument for keeping the full commit history (with WIP commits, etc.). There is nothing to be learned from it unless you're specifically investigating people's commit habits.

ETA: Some IDEs have a Local History thing where you can basically see all versions of a local file (snapshot every 60s or whatever). Do you want that in your repository history? I don't think so.


> I also have literally NEVER heard a good argument for keeping the full commit history (with WIP commits, etc.).

Maybe their company ranks employees by number of commits?


That is still not a good argument.

I do feel for these people though. Depending on their company they could try to fight it and change it (which can be possible depending on things like company size, whether this is being introduced or is well established, your level of influence with the deciders, etc.) or simply RUN and never look back.


I think the person you replied to might be jesting :)...

... but, yes, if you find yourself in that type of situation and powerless to change it[0], move on.

[0] "Maybe give it a couple of tries. If that doesn't change anything, give up. No reason to be a damn fool about it." (Paraphrasing Churchill, I think? Anyway, not claiming originality.)


> There are notable high profile exceptions like publicly viewable patch series against the linux kernel, but you're deluded if you think those aren't edited before being released for human consumption.

As they should be, and plenty of them have certainly gone through draft versions with "garbage" commits on temporary git branches on their authors' computers. The thing is, that kind of draft, made before you even want to communicate with other humans (or at least with many of them), is rarely useful for long-term understanding of the history, so it is extremely useful to have a somewhat cleaned-up history in the project. That you do some cleanup just before submitting a series by email is just a detail. Even then, the rewriting is arguably more pronounced than in most other projects: the first version of a complex change is likely to be rejected because the maintainers want improvements in some aspects, so you submit a second series (then maybe a third, and so on), and most of the time you rewrite the history in each one rather than just adding a new patch on top.

So I don't think this is even an exception, on the contrary! Git history should not change on branches that are widely used, like Linus's branch for the kernel, or the maintainers' branches, etc. For work in progress used by a single dev or by two or three highly synchronized people, maybe even skipping some processes (e.g. code reviews) at first, I'd rather not see that in the "clean" history of the project, because 99% of the time it has very little value and extremely high noise.


> So is it a lie that sometimes after lunch on a Friday, the act of programming is often a series of off-by-one, off-by-two, off-by-one-the-other-direction compile-test-edit-commit loops?

No, exactly the opposite: it's a lie to pretend that you can write perfect code the first time, and it does no-one any favours in the long run: not yourself, and certainly not those who come to learn from you in the future. And it's not normal or expected the way makeup can sometimes be; junior programmers will be genuinely deceived and this will cause real harm.

Keep the history, warts and all. Best case someone might learn something from it. Worst case you're no worse off.


The “keep everything” approach conflicts with the “have clean commits” approach. If we keep everything, instead of “Build A; Build B; Build C” we wind up with “Build some of A, B, and C;” * 8.


If you're never rewriting history then branching and merging become cheap, so it's easy to do each of A, B and C on its own branch.


But again, like the parent comment said, if you’re never rewriting history, that means you’re stuck with how the code was actually developed, which frequently is not that clean. Many people don’t write three separate features most of the time, they write three features together and at the same time.


If the real history is that you developed all three features in an interleaved way, isn't that more useful (e.g. for bisection) than a fictional history that you haven't actually tested? Most likely your cherry-picking/rebasing won't be perfect, so you'll still have parts of A mixed in with B and C and you'll have things like one commit depending on changes from a future commit. The history of how the code was actually developed might be "messy" but it's more likely to at least compile (because presumably you were compiling it from time to time while you were developing).


> isn't that more useful (e.g. for bisection)

Why would my messy history be useful for bisection? The places I committed, it may not have even fully compiled except for the very last commit. In that case, to separate the code in a useful way (such as 3 commits, one for each feature, each of which compiles on its own), you'd have to do a bit more work and create new commits, which again means either rendering the original commits pointless or disregarding them.


Surely your test-edit cycle involves at least some compiling. Maybe not every commit will compile, but most changes that compile will have a commit. At the very least a "real" commit has a much higher chance of compiling than an "artificial" one that you constructed retrospectively.

If you really do make most of your commits not compile then I can sort of sympathise with squash-merging, but if you merge then worst case it's a one-liner to bisect while only looking at "mainline" history (i.e. only the merges to master, the equivalent of what you'd get if you'd squash-merged), whereas if you squash-merge then there's no way to bisect back through the original history.
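
A side note on the "mainline only" bisect mentioned above: `git log --first-parent` follows only the first parent of each merge, and recent Git versions accept the same `--first-parent` option on `git bisect start`. The sketch below is a toy Python model of that idea, not Git itself. The commit graph and the names `PARENTS`, `first_parent_chain` and `bisect_first_bad` are made up for illustration.

```python
from typing import Callable, Dict, List

# Toy commit graph (hypothetical data): commit id -> list of parent ids.
# The FIRST parent of a merge is the mainline commit it was merged into.
PARENTS: Dict[str, List[str]] = {
    "m0": [],
    "f1": ["m0"],          # feature-branch commits
    "f2": ["f1"],
    "m1": ["m0", "f2"],    # merge of the first feature branch into mainline
    "g1": ["m1"],          # second feature branch
    "m2": ["m1", "g1"],    # merge of the second feature branch
    "m3": ["m2"],          # ordinary commit made directly on mainline
}


def first_parent_chain(head: str) -> List[str]:
    """Return mainline history, oldest first, following only first parents."""
    chain = []
    current = head
    while True:
        chain.append(current)
        parents = PARENTS[current]
        if not parents:
            break
        current = parents[0]
    chain.reverse()
    return chain


def bisect_first_bad(commits: List[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later commit in the list is bad too.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

If the bug was introduced in `g1`, bisecting `first_parent_chain("m3")` lands on the merge `m2`. Because the branch was merged rather than squashed, `g1` is still there to look at afterwards.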


Compiling vs. not-compiling is only the tip of the iceberg; there’s a lot of other aspects of my development that don’t make sense until the very end. Arguably, the whole feature is basically useless until it’s finished; why would I keep working on it in any meaningful way after it is complete and running? If that happens, chances are that the ticket wasn’t atomic enough. There are exceptions of course, such as substantial rewrites for bug fixes, but those are by nature not the norm. As a result, my commit messages are only for me, which saves time in development. Commit messages “wip” and “working now” mean something to me, but definitely hold no value to whoever is doing a git blame in the future, which is another benefit of squashing.


If it compiles then I can use it in an automated bisect, which is the main thing VCS history is useful for IME. I'm a big believer in "refactor mercilessly" and "make the change easy, then make the easy change", so while obviously the final feature will not be working in the intermediate commits, the work will touch on other code areas and there's always the possibility that this will introduce a subtle bug that slips past the current test suite, and if that should happen then I want to be able to bisect down to the smallest possible diff before I start trying to understand it manually. I also find that a small commit with a useless message is actually a much more useful blame result than a big commit, even if the big commit contains a detailed explanation of the overall change.


> I also find that a small commit with a useless message is actually a much more useful blame result than a big commit, even if the big commit contains a detailed explanation of the overall change.

That's pretty interesting. I know with me, that is definitely not true, because 90% of all commits would just be the message `wip` which makes Git Blame incredibly hard to use.


What are you trying to get out of the blame? I do sometimes git tag --contains to find the overall feature that the blame-output commit was part of, but most of the time the most useful thing is just to see the diff for that commit or frankly even just the list of files it touches.


Much of the time it’s asking what the motivation behind a line of code is, such as why we take some crazy convoluted approach to what seems like it should be a simple task. Editor plugins such as GitLens display the blame output, so it is much more convenient if that information is in the commit rather than in an associated tag.


Git history should not be kept immutable.

Git release branch history should be kept immutable, because this is a way to see how things were in past, for troubleshooting, for ensuring that the code under source control is actually the code you deployed, and / or the code your downstream depends on, etc.

On your feature branch, you can do anything, as long as you can later cleanly merge with the main branch from which releases are cut.

I'd say it's a completely normal practice to rebase, split, and fixup your commits on the feature branch, in order to present a clean picture to the reviewers, to easily show that a new test catches the error in the old broken code, and the new fix actually makes that test pass, etc. Nobody cares about what happens on your feature branch but you. Its commit history is not holy, it's a tool like other tools. Somebody depends on oncoming progress of my feature branch while developing their own? Well, `pull --rebase` regularly, same as you do with the main branch.

Squash that history during the merge to the main branch. Do not delete the feature branch, uncheck that checkbox in GitHub repo settings. Clean history representing completed features: check. Detailed explanation of development in code: check.

On the topic of the article: to my mind, "commit groups" can be sufficiently well implemented as branches, or as tags and ranges of commits between tags. For some very complicated cases, they can be implemented as actual text tags inserted into commit messages.
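
For readers who have not used the fixup workflow this comment refers to: Git has `git commit --fixup=<commit>`, which records a commit whose subject starts with `fixup! `, and `git rebase -i --autosquash`, which moves such commits next to their targets so they can be folded in. Below is a toy Python sketch of that grouping step only. It matches on exact subject text, which is a simplification, and the function name `autosquash` and the sample subjects are invented for illustration.

```python
from typing import Dict, List, Tuple

FIXUP_PREFIX = "fixup! "


def autosquash(subjects: List[str]) -> List[Tuple[str, List[str]]]:
    """Group each 'fixup! X' commit under the earlier commit with subject X.

    Returns (subject, folded_fixups) pairs in original commit order.
    A fixup with no matching earlier commit is kept as an ordinary commit.
    """
    result: List[Tuple[str, List[str]]] = []
    index_by_subject: Dict[str, int] = {}
    for subject in subjects:
        target = None
        if subject.startswith(FIXUP_PREFIX):
            target = subject[len(FIXUP_PREFIX):]
        if target is not None and target in index_by_subject:
            # Fold this fixup into the commit it names.
            result[index_by_subject[target]][1].append(subject)
        else:
            index_by_subject.setdefault(subject, len(result))
            result.append((subject, []))
    return result


# Hypothetical feature-branch history, oldest first.
history = [
    "Add failing test for off-by-one",
    "Fix off-by-one in pager",
    "fixup! Add failing test for off-by-one",
    "fixup! Fix off-by-one in pager",
]
```

Here `autosquash(history)` yields two entries, one for the test and one for the fix, each carrying its own fixup. That is the "test first, then fix" picture a reviewer would see after the cleanup.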


The whole point of using a DVCS is to be able to publish and pull from each other's branches. If your feature branches are private, or you have to check with someone before pulling from their branch (which you do if they're in the habit of rebasing), then you're missing out on most of git's value.


This is an absolutely ridiculous strawman. Just because not every branch is a shared branch doesn’t mean you aren’t getting “DVCS value”.


I've been a programmer for long enough to have used SVN seriously. It really wasn't so different - it honestly did feel much the same as when I've worked in places that used a rebase-heavy git workflow.


This both applies and does not apply to the case at hand.

Let's call it differently: dabble branch and share branch. It's the share branch where you interact with others. You are not bound to have exactly one release branch, and often you don't, when you backport stuff to older releases. But this is a branch you keep in order because you share it with others.

Your dabble branch is your playground. You can do weird things, make stupid mistakes, fix them, etc. You do not share that branch with others much, except to let them see its current state. They do not depend on it, and do not expect it to be nice.

When your portion of work is done, and you (maybe several of you) want to share it with other collaborators, not involved in the process of your dabbling, but interested in the result of it, you may choose to clean it up. You can reorder commits into logical spans, and meld them. You can split a commit that does two unrelated things, and describe each separately. You get rid of all the noise (if you produced any), and form a nicer picture for your collaborators to review and understand. You do it because you care about their time and sanity.

Then you merge the result of your dabbling into the share branch, squashing commits into one. This keeps the history of the shared branch(es) observable. If anybody wants to step back, they have your original dabble branch, which you now abandon and create a new one.

Dabble branches should be short-lived, a couple of days. You can have many long-lived share branches for features that take long to develop, etc. Share branch history usually does not need cleaning up, so there's usually no point in rewriting it. That allows it to be merged periodically with other share branches, if any.


You should hopefully be talking to each other at least every couple of days - the real advantage of having visibility of each other's branches comes when you use them to share work on a much smaller timescale.

As for caring about your reviewers' time and sanity, rewriting commits that they may already have seen is the opposite of that IMO. Any decent review tool will let you review a single combined diff for the whole branch, and that's what a reviewer who hasn't been following your progress will use. Meanwhile if a reviewer did happen to look at your branch yesterday, taking away their ability to view just the changes since then is doing them no changes. (This is especially true when it comes to applying changes from review feedback - if I requested a couple of small fixes then I want to review a commit where you made those small fixes, I don't want to have to re-review the whole PR because you rebased)


*doing them no favours. Sorry for the writing mistake.


I want to be able to preserve two parallel commit histories: one where the commits are ordered by time, and another where the commits are ordered by 'story'. Git could cryptographically verify that the end-states of the two histories are identical, and allow me to alter the storied history at will (shifting hunks between commits, splitting/combining commits, reordering commits etc), where during merge both histories are preserved.

I don't really follow your naming critique though. "commit groups" seems like a fine name, they are groups of commits. What you describe I would call "committer groups".
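
On the "verify that the end-states are identical" part of this comment: in Git every commit points at a tree object, so comparing `git rev-parse <commit>^{tree}` for the tips of two histories already tells you whether they end in the same content. The Python below is only a toy model of that check. The patch format (a dict of path to new content), the sample histories, and the names `apply_all` and `tree_hash` are assumptions for illustration, not anything Git defines.

```python
import hashlib
from typing import Dict, List

Snapshot = Dict[str, str]   # path -> file content
Patch = Dict[str, str]      # path -> new content (hypothetical patch format)


def apply_all(patches: List[Patch]) -> Snapshot:
    """Apply patches in order, starting from an empty tree."""
    tree: Snapshot = {}
    for patch in patches:
        tree.update(patch)
    return tree


def tree_hash(tree: Snapshot) -> str:
    """Content hash of a snapshot, independent of how it was reached."""
    digest = hashlib.sha256()
    for path in sorted(tree):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(tree[path].encode())
        digest.update(b"\0")
    return digest.hexdigest()


# Chronological history: interleaved work on two features.
by_time = [
    {"a.py": "A draft"},
    {"b.py": "B draft"},
    {"a.py": "A final"},
    {"b.py": "B final"},
]

# Storied history: the same work regrouped into one commit per feature.
by_story = [
    {"a.py": "A final"},
    {"b.py": "B final"},
]

same_end_state = tree_hash(apply_all(by_time)) == tree_hash(apply_all(by_story))
```

The storied history can be reshuffled freely as long as `same_end_state` stays true, which is the invariant the comment asks for.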


What is your use case? How do you develop so that your story does not belong to one (or more) feature branches where it can be held, unmixed with other stories? Can your story continue through multiple merges to the main / release / whatever branch(es)?

I'm asking totally unironically; every company's flow may be different, and for good reasons which I'm oblivious about. So I'd gladly read if you had time to explain.


Something like --date-order to view commits chronologically vs --topo-order to view them topologically sorted?
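
For context, `--date-order` and `--topo-order` are real `git log` options: both show children before their parents, but the first otherwise follows commit timestamps while the second avoids intermixing separate lines of history. A rough Python approximation of the difference, using an invented six-commit history (the data and the function names are hypothetical, and real Git may differ in tie-breaking details):

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical history: commit id -> (timestamp, parent ids).
# Two branches, "a" and "b", were worked on alternately and then merged.
# Every commit here is assumed to be reachable from "merge".
COMMITS: Dict[str, Tuple[int, List[str]]] = {
    "base":  (1, []),
    "a1":    (2, ["base"]),
    "b1":    (3, ["base"]),
    "a2":    (4, ["a1"]),
    "b2":    (5, ["b1"]),
    "merge": (6, ["a2", "b2"]),
}


def child_counts() -> Dict[str, int]:
    """Number of children per commit; a commit is shown after all of them."""
    counts = {cid: 0 for cid in COMMITS}
    for _, parents in COMMITS.values():
        for parent in parents:
            counts[parent] += 1
    return counts


def date_order(head: str) -> List[str]:
    """Children before parents, otherwise newest timestamp first."""
    pending = child_counts()
    ready = [(-COMMITS[head][0], head)]   # max-heap via negated timestamp
    shown = []
    while ready:
        _, cid = heapq.heappop(ready)
        shown.append(cid)
        for parent in COMMITS[cid][1]:
            pending[parent] -= 1
            if pending[parent] == 0:
                heapq.heappush(ready, (-COMMITS[parent][0], parent))
    return shown


def topo_order(head: str) -> List[str]:
    """Children before parents, keeping each line of history together."""
    pending = child_counts()
    ready = [head]                        # used as a stack (last in, first out)
    shown = []
    while ready:
        cid = ready.pop()
        shown.append(cid)
        # Push parents in reverse so the first parent is visited first.
        for parent in reversed(COMMITS[cid][1]):
            pending[parent] -= 1
            if pending[parent] == 0:
                ready.append(parent)
    return shown
```

On this data `date_order("merge")` interleaves the two branches (`merge, b2, a2, b1, a1, base`), while `topo_order("merge")` walks one branch and then the other (`merge, a2, a1, b2, b1, base`). Both are views of one history though, not two independently editable ones.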


I used to think this way. Now I think that we as programmers already suffer from too much information.

I spend considerable time these days making my commits as clear and readable as possible, and between rebasing, squashing and amending messages, it's quite likely that there might be dozens of intermediate commit IDs for every single commit ID that enters the repo.



