
pmeunier · on Dec 18, 2020

> If I have a 100-line file and on 'main' it changes near the top, but in my 'topic' branch it changes near the bottom, then I can cherry-pick 'topic' onto 'main' and Git will resolve the diff correctly.

That is not true: sometimes Git will take the new lines from "topic" and merge them into the new lines from "main", see https://pijul.org/manual/why_pijul.html

> You might hit a conflict in your "git cherry-pick" command which gives you an opportunity to resolve the unexpected diff issue in an appropriate way, which ends up with a different diff than before.

Sometimes when you cherry-pick, you might not even hit a "true" conflict, but if you forgot to run "rerere", you might simply hit a previously solved conflict again.

recursive · on Dec 17, 2020

I kind of intuitively get it, but that doesn't really seem well defined. I'm always a little bit spooked that `cherry-pick` will cleanly apply when it really shouldn't have. It's not clear to me under which circumstances it automatically resolves.

tsimionescu · on Dec 17, 2020

You're right to be spooked about that, but you're wrong if you think only cherry-pick has this problem. In fact, any git command that applies changes can, and sometimes will, apply cleanly and still subtly mess up your files (git merge, git pull, git rebase, git apply, git stash apply, etc.).

The definition of how changes are applied actually has nothing to do with git itself, and everything to do with the diff algorithm you choose (of course, you normally use a built-in one, but I believe you can customize it if you really want).

In general, the default Git diff algorithm, like all text-based diff algorithms, can have problems with structured data, for example by dropping a closing paren or significant whitespace. Naturally, it can also be a problem if you have declarations that must be unique in a file but can occur in different places. The Java or Go `package` statements are safe, since they must occur at the beginning of a file, so if they differ between the two files the difference is likely to be caught. But if two people have each added a top-level function called `foo`, in different places in the file and with different params, it's pretty likely that the diff algorithm will not see any conflict and will keep both definitions.
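To make the `foo` case concrete, here is a minimal sketch of a line-based three-way merge in Python. It is not Git's actual merge code: `edits` and `merge3` are hypothetical helpers built on the standard `difflib` module, and the conflict rule (two hunks overlapping or starting at the same base line) is a simplification I am assuming for illustration.

```python
import difflib


def edits(base, other):
    """Return hunks (start, end, replacement) that turn `base` into `other`."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, tuple(other[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge3(base, ours, theirs):
    """Apply both sides' hunks to `base`; raise ValueError if they collide."""
    # Identical hunks made on both sides collapse into one via the set union.
    hunks = sorted(set(edits(base, ours)) | set(edits(base, theirs)))
    merged, pos, last_start = [], 0, None
    for start, end, replacement in hunks:
        if start < pos or start == last_start:
            raise ValueError(f"conflict near base line {start + 1}")
        merged.extend(base[pos:start])   # untouched base lines before the hunk
        merged.extend(replacement)       # the changed lines from one side
        pos, last_start = end, start
    merged.extend(base[pos:])
    return merged


base = ["def a():", "    return 1", "", "def b():", "    return 2"]
# One side adds foo(x) at the top of the file...
ours = ["def foo(x):", "    return x", ""] + base
# ...the other adds foo(x, y) at the bottom.
theirs = base + ["", "def foo(x, y):", "    return x + y"]

merged = merge3(base, ours, theirs)  # no conflict is raised
print("\n".join(merged))
```

The merge goes through without complaint, and the result contains both `def foo(x):` and `def foo(x, y):`. In Python the second definition silently shadows the first; in a language like Java or Go you would more likely get a compile error, which is at least louder.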

Cherry-pick is in fact one of the places I would normally worry least about this, since it is usually done for small commits. However, when merging a feature branch into master, the potential for errors goes up, and so does the work required to catch such errors during review.

ninkendo · on Dec 18, 2020

You don’t even need to think about diff algorithms to see why a cherry-pick, merge, etc. may not do what you want.

If in my branch, I rename oldFunc to newFunc in file A, and change file B to replace the call to oldFunc with newFunc; and in your branch, you add a new call to oldFunc in file C... the code will break when we merge our branches. Our changes would both pass tests independently, but would break when we merge them. No file-level diff algorithm would detect a “conflict” here.

Diff algorithms only help with saying “are two branches trying to edit the same lines of code”, but the answer to that question is never enough to tell you whether two changes logically will apply cleanly to one another.
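A rough sketch of the rename scenario, in Python. Everything here is made up for illustration: the file names `a.py`, `b.py`, `c.py`, the helper `merge_files` (a whole-file three-way merge, far cruder than what Git does), and the helper `undefined_calls`, which pretends all files share one namespace so the check stays short.

```python
import ast
import builtins

# Base: file A defines oldFunc, file B calls it.
base = {
    "a.py": "def oldFunc():\n    return 1\n",
    "b.py": "print(oldFunc())\n",
}
# My branch: rename oldFunc to newFunc in A and update the call in B.
mine = {
    "a.py": "def newFunc():\n    return 1\n",
    "b.py": "print(newFunc())\n",
}
# Your branch: add file C with a new call to oldFunc.
yours = dict(base)
yours["c.py"] = "print(oldFunc() + 1)\n"


def merge_files(base, mine, yours):
    """Whole-file three-way merge: conflict only if both sides changed a file."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(mine) | set(yours)):
        b, m, y = base.get(name), mine.get(name), yours.get(name)
        if m == y or y == b:
            result = m          # only my side changed it (or both agree)
        elif m == b:
            result = y          # only your side changed it
        else:
            conflicts.append(name)
            continue
        if result is not None:  # None means the file does not exist
            merged[name] = result
    return merged, conflicts


def undefined_calls(files):
    """Names that are called somewhere but defined nowhere (builtins ignored)."""
    defined, called = set(), set()
    for source in files.values():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                defined.add(node.name)
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                called.add(node.func.id)
    return called - defined - set(dir(builtins))


merged, conflicts = merge_files(base, mine, yours)
print("conflicts:", conflicts)
print("broken in mine:", undefined_calls(mine))
print("broken in yours:", undefined_calls(yours))
print("broken after merge:", undefined_calls(merged))
```

Each branch is fine on its own and the merge reports no conflicts, yet the merged tree calls `oldFunc`, which no longer exists. Only something that understands the code (a compiler, a test run, a reviewer) can catch that.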

recursive · on Dec 17, 2020

Thank you.

In all my (extensive) commentary in this topic, I feel this is the first response that addresses the root of my confusion in a way I can understand. Sincerely, this is helpful.

tsimionescu · on Dec 17, 2020

Glad to hear that! Rarely have I felt that a comment I wrote actually made a difference.