This seems to be one of the major divisions among SCM users: some see it as a clean up for seeing an understandable 'history', others see it as a falsification.
> some see it as a clean up for seeing an understandable 'history', others see it as a falsification.
I don't see it as a falsification. I see it as making the history less understandable. Having a record of what happened can help in understanding what was intended. Code archeology regularly helps me understand Chesterton's fence.
Programmers are human. They make mistakes. The thing is not perfect at the time of squashing. When the inevitable issue crops up in production, I really want all the information I can get.
If you start cleaning stuff up, not only do you deny me information about how the code evolved, you also introduce another point at which a flawed understanding can mislead those that come after you.
The issue is that most of my commits have little meaning and often include in-progress commits where the code is not even functional or the tests may crash. That is especially true because I have tooling that builds artifacts and runs CI based on commits. Often I’ll commit something just to see the result from CI, or to build an artifact for a small test deployment, fully expecting it not to work yet. After I’m done, I’ll collapse all commits for a PR into one for the actual thing the PR aimed to do, and aim to keep the collapsed commit under roughly 300 lines.
That is ok. A proficient code reader will see that you like to commit -m’typo’ and -m’testing CI has to go thru git; try this now’ and understand your life and save the detailed nitpicking for the overall diff, not each commit.
It is super nice to see if it was 20 code commits and then one test commit, or back and forth between tests and code, or a new test and then code commits.
The main problem is when the programmer who created the squashed commit is no longer around.
That subtle bug they introduced in that squashed commit now has to be picked apart without any context. If I have 10,000 tiny commits (and it's never quite that bad because programmers are lazy gits), I can reconstruct some of the context of how that bug got into the codebase and what was going through their head when it occurred.
The other problem with squashed commits is that nobody will ever agree on what the correct granularity of a squash should be. Even if you give me the ability to squash, I won't. Someone else will squash 1,000 lines of changes, and I'll want to scream.
And, to be fair, even Linux doesn't really like squashed commits at the individual level. Try feeding one of those big squashed patches into most maintainers. They'll tell you to GTFO until you bust that apart.
If I see some behaviour that changed in a commit that says "lint fixes", I can be pretty sure you didn't intend that. I'm still going to check, but if I don't find anything to give me pause, I'm confident in reversing it.
If I see it in a commit that says "fix edge case", I'm going to check, double check and triple check if that edge case is still resolved after my fix.
The "oops" and "work in progress" things hold no information, but since that's the default, well... Too bad.
A cute little paragraph is almost certainly going to leave out details I need. I'll probably ignore it, because it's either unnecessary verbiage, or inane. Probably both.
Again, humans are flawed. We have code that is probably broken. There's definitely a chance that the test is broken or incomplete too.
Also, I demand nothing, not even a certain commit message. What I ask is that you don't expend effort to destroy information.
I understand the desire to hide one's flaws. To hide the thought process to make yourself look better. But it's not necessary. Everybody has flaws and it's far better to have the thinking in the open, so we can see and work with it.
Quite the opposite: this is the best feature of Git. Commit often, and don't be afraid to experiment, since you can always roll back easily even if it's literally just a single comma. Then, when it's time to share that code with others, reorganize history into a sequence of well-named commits, each with a clean description of what it does.
We don't expect people to give important speeches without writing a draft first, nor do we insist on seeing all their drafts. Why is this any different?
It's different for the same reason math teachers want you to "show your work" in addition to simply writing down the answer to a problem. The way you as a human have approached the problem and worked it out is valuable information, both in reviewing the work itself and in understanding what has gone wrong when something has gone wrong.
Think of your branch as being one long multi-day math problem. If I'm grading your work, I don't want you to show me all the parts you think are neat and tidy and important after you arrive at what you think is the answer. I want to see everything you tried, even the stuff that didn't work.
I'm not opposed to only having merge commits on master, but somewhere, on some branch which is recorded for all of time, I want to be able to see every decision that was made to bring HEAD to what it is right now, on the most granular level possible.
Yes, a maths teacher wants to see the working to a problem, but the working can still be the second draft, written neatly and well explained. For a complicated problem, it should not be expected that someone will read through all the “scratch work”.
> For a complicated problem, it should not be expected that someone will read through all the “scratch work”.
Do people really read through the changelog commit by commit? What's gained by that?
I don't read through it at all. I zoom into a point that I need to know more about. I have information (bug report, runtime behaviour on other data) that allows me to zoom into a specific part. The information I have is from the committer's future. It's highly unlikely that the details needed are in the summary that they wrote.
I maintain both branches - a WIP branch and a final rebased result. I agree with the age old concept that each commit on the master should be a full self-sufficient feature. I also like to keep the messy WIP for reference. How does the 'falsification' argument apply here?
I disagree. Every commit should be an atomic change that doesn't break the build, meaning it both compiles and runs properly. Doing what you are suggesting makes bisects near impossible, wastes extra time in code review, and wastes the time of people looking at the history, because they have to comb through half-baked ideas without knowing whether those ideas actually worked or were ever tested.
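To make the bisect point concrete, here is a minimal Python sketch of the binary search that bisecting relies on. It is a toy model, not how `git bisect` is implemented: the function `first_bad`, the commit labels `c1` to `c8`, and the `broken_build` set are all made up for illustration.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes the last commit is bad and that "bad" is monotonic:
    once a commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


# Hypothetical history: the real bug is introduced in c6.
history = [f"c{i}" for i in range(1, 9)]
real_bug = {"c6", "c7", "c8"}

# Every commit builds, so a failing test always means "has the bug".
clean = first_bad(history, lambda c: c in real_bug)
print(clean)  # c6

# Same history, but c4 is a WIP commit that does not build at all,
# so the test fails there for a reason unrelated to the bug.
broken_build = {"c4"}
noisy = first_bad(history, lambda c: c in real_bug or c in broken_build)
print(noisy)  # c4
```

In the second run the search blames the WIP commit, because a commit that fails for an unrelated reason breaks the monotonic assumption. Real Git does offer `git bisect skip` for untestable commits, but each skipped commit widens the range of suspects.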
Falsifying history isn't wrong in itself. Falsifying a shared history is, because it breaks other copies of the repo. So long as a branch exists only in a single repo, then the history of that branch can be rewritten without issue. But a shared branch should never be rewritten, because you'd need to copy the alterations into all copies of the repo.
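Why a rewrite breaks other copies can be shown with a toy model in Python. This is a sketch under simplifying assumptions: a real Git commit ID hashes more fields (tree, author, timestamps, message), but the parent ID is among them, which is all the example needs. The helpers `commit_id` and `build` and the commit labels are hypothetical.

```python
import hashlib


def commit_id(parent_id: str, content: str) -> str:
    """Toy commit ID: a hash over the parent ID plus this commit's content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode()).hexdigest()[:8]


def build(contents, root=""):
    """Chain commits on top of `root` and return their IDs in order."""
    ids, parent = [], root
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


# Shared history that every copy of the repo agrees on.
base = build(["init", "add parser"])

# My local feature branch, built on top of the shared tip.
feature = build(["fix typo", "add tests"], root=base[-1])

# What main gets if the branch is squashed into one new commit.
squashed = build(["fix typo + add tests"], root=base[-1])

# None of my feature commits are part of main's history afterwards.
main_ids = set(base + squashed)
print(any(c in main_ids for c in feature))  # False

# Rewording an early commit changes the ID of every commit after it,
# even when the later content is identical.
original = build(["init", "add parser", "add tests"])
reworded = build(["init", "add parser (reworded)", "add tests"])
print(original[0] == reworded[0])    # True
print(original[2] == reworded[2])    # False
```

Because each ID depends on its parent, the rewritten commits are new objects as far as every other clone is concerned, which is why the old and new versions of a branch no longer share their later history.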
I take issue with tools that prioritize rebasing, because they usually fail to make this distinction between rebasing a local branch and rebasing a shared branch. For example, GitHub's "Squash and Merge" or "Rebase and Merge" options break the shared history between the remote copy of the repo and my local copy. After such an operation, I cannot merge from main into my development branch, and must instead rebase my development branch onto main.
To be fair, once you're developing on a project with enough commits/day, you no longer have a real choice if you want an overall linear history. You will have to cherry-pick to main somehow.
Some people would rather merge instead. In my experience, though, few people have the discipline to develop at a PR granularity large enough that the resulting merge commits are meaningful and don't pollute history, while also keeping the non-merge commits meaningful by rebasing while the change is still in review.