This is actually an optimization done by GitHub. It would take up a lot of space if GitHub copied the entire repo every time someone forked it, so they keep all the commits in the original repo. As a side effect, commits in forks are accessible from the original repo since commits from both repos are stored in the same place.
GP isn't saying GitHub should copy the entire repo, only that there should be some indication that the code you're looking at isn't the repo owner's (despite being committed in their name and on a repo they "control").
I don't see what optimization requires that. They already keep track of e.g. me pushing up someone else's commit after a rebase -- it indicates that I pushed but the commit originally came from someone else.
Even if you're deduplicating commits / data internally, you can tell that that commit is not present in that repo, because it is not an ancestor of any ref in that repository.
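A rough sketch of that reachability test, assuming a toy in-memory commit graph (the `parents` map and the short SHAs like "f9" are made up for illustration). A commit "belongs" to a repo if you can walk from one of that repo's ref tips back through parents and hit it:

```python
from collections import deque

def is_reachable(sha, ref_tips, parents):
    """Return True if `sha` equals, or is an ancestor of, any ref tip.

    ref_tips: commit SHAs that the repo's branches and tags point at.
    parents:  mapping of commit SHA -> list of parent SHAs (a toy commit graph).
    """
    seen = set()
    queue = deque(ref_tips)
    while queue:
        current = queue.popleft()
        if current == sha:
            return True
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return False


# Toy network: both repos share one commit graph; "f9" exists only in the fork.
parents = {"a1": [], "b2": ["a1"], "c3": ["b2"], "f9": ["b2"]}
print(is_reachable("f9", ["c3"], parents))  # False: not in the original repo's history
print(is_reachable("f9", ["f9"], parents))  # True: reachable from the fork's branch
```

On a repo the size of Linux you would want caching or precomputed indexes rather than a fresh walk per request, which is the cost question raised in the next comment.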
(You might argue that determining what refs contain a commit is potentially expensive, perhaps, but GitHub already does this, so I'd argue that it's not that expensive.)
Then it should be possible to store the originating repo along with the commit, so that commits aren't visible in a given "repo" until they are pushed or pulled into that repo.
It's sort of the reverse: they can't know, from a bare commit ID, what repo it "belongs" to without searching backward from every tag or branch in the repo. (Even that question is malformed: repos have histories and may have contained commits in the past that are no longer ancestors of existing branches or tags).
So they just fake it: they look in their database to find any commit with that SHA and put it up. And that database happens (for obvious performance reasons) to be shared between a repo and its forks.
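A toy model of the "shared database" behaviour described here. This is a hypothetical sketch, not GitHub's actual design: `ForkNetwork`, `push` and `show_commit` are invented names. The point is that a lookup keyed only by SHA cannot tell which repo in the network a commit came from:

```python
class ForkNetwork:
    """Hypothetical model of a repo "network": one object store shared by all forks."""

    def __init__(self):
        self.objects = {}  # SHA -> commit data, deduplicated across the network
        self.refs = {}     # repo name -> {ref name: SHA}

    def push(self, repo, ref, sha, commit):
        self.objects[sha] = commit
        self.refs.setdefault(repo, {})[ref] = sha

    def show_commit(self, repo, sha):
        # Deliberately ignores `repo`: the lookup only asks whether the SHA
        # exists anywhere in the shared store.
        return self.objects.get(sha)


net = ForkNetwork()
net.push("torvalds/linux", "master", "c3", {"author": "Linus Torvalds", "parent": None})
net.push("you/linux", "master", "f9", {"author": "Linus Torvalds", "parent": "c3"})

# The fork's commit is served under the original repo's name...
print(net.show_commit("torvalds/linux", "f9"))
# ...even though no ref in the original repo points at it.
print("f9" in net.refs["torvalds/linux"].values())  # False
```

Adding a per-repo reachability check inside `show_commit` would be one way to get the behaviour people upthread are asking for.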
A branch is just a series of commits; if any one commit gets a different hash (as this hack causes), that commit and every commit after it will also get a different hash.
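A small sketch of why the change cascades. It computes object IDs the way a SHA-1 git repo does (`sha1(b"commit <size>\0" + body)`), but the history itself is a made-up linear chain of commits on the empty tree, with example.com identities:

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's well-known empty tree ID

def git_object_sha(obj_type, body):
    """SHA-1 object ID as git computes it: sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_sha(tree, parent, author, message, timestamp=1700000000):
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")  # each commit embeds its parent's ID
    ident = f"{author} {timestamp} +0000"
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_sha("commit", ("\n".join(lines) + "\n").encode())

def build_chain(authors):
    shas, parent = [], None
    for i, author in enumerate(authors):
        parent = commit_sha(EMPTY_TREE, parent, author, f"commit {i}")
        shas.append(parent)
    return shas

alice, bob = "Alice <a@example.com>", "Bob <b@example.com>"
original = build_chain([alice, alice, alice, alice])
tampered = build_chain([alice, bob, alice, alice])  # "blame" commit 1 on Bob
for before, after in zip(original, tampered):
    print(before[:10], after[:10], "same" if before == after else "changed")
```

Only the author line of commit 1 differs, yet commits 2 and 3 also change because their `parent` lines now point at a different ID.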
I've never really understood Torvalds' reason for not cryptographically signing commits.
> Btw, there's a final reason, and probably the really real one. Signing each commit is totally stupid. It just means that you automate it, and you make the signature worth less. It also doesn't add any real value, since the way the git DAG-chain of SHA1's work, you only ever need _one_ signature to make all the commits reachable from that one be effectively covered by that one. So signing each commit is simply missing the point.
Because each commit is in a cryptographically secure chain, when you sign a Git tag it vouches for the referenced commit and all the commits preceding it. This can be done at important moments such as each release.
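A sketch of what "one signature covers everything reachable" means in practice. Here `trusted_tip` stands in for the commit ID a signed tag points at. The GPG step itself is not shown, and the sketch assumes SHA-1 collisions are not a concern. Everything else is re-derived by hashing each commit's bytes and following its `parent` lines:

```python
import hashlib

def git_object_sha(obj_type, body):
    """SHA-1 object ID as git computes it: sha1(b"<type> <size>\\0" + body)."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

def make_commit(parent, message):
    lines = ["tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904"]  # empty tree
    if parent:
        lines.append(f"parent {parent}")
    ident = "Alice <alice@example.com> 1700000000 +0000"
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()

def parents_of(body):
    header = body.split(b"\n\n", 1)[0]  # headers end at the first blank line
    return [line.split()[1].decode() for line in header.split(b"\n")
            if line.startswith(b"parent ")]

def verify_history(trusted_tip, store):
    """Walk every commit reachable from `trusted_tip`, checking each ID against its bytes.

    `store` maps SHA -> raw commit body. Raises ValueError on the first mismatch.
    """
    seen, order, stack = set(), [], [trusted_tip]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        if git_object_sha("commit", store[sha]) != sha:
            raise ValueError(f"commit {sha[:10]} does not match its content")
        seen.add(sha)
        order.append(sha)
        stack.extend(parents_of(store[sha]))
    return order


store = {}
root_body = make_commit(None, "initial import")
root = git_object_sha("commit", root_body)
store[root] = root_body
tip_body = make_commit(root, "release v1.0")
tip = git_object_sha("commit", tip_body)
store[tip] = tip_body

print(verify_history(tip, store) == [tip, root])  # True

store[root] = make_commit(None, "initial import, quietly edited")
try:
    verify_history(tip, store)
except ValueError as err:
    print(err)  # the edited ancestor no longer hashes to the ID the tip references
```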
Cryptographically signing the commits makes rebasing impossible (or at least more difficult).
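A minimal illustration of why rebasing breaks a per-commit signature. HMAC is used here purely as a stand-in for a GPG signature (an assumption for the sake of a runnable example). The relevant fact is that the signed bytes include the `parent` line, so moving the commit onto a new base changes what was signed. In a real rebase the tree and committer date usually change too; this sketch keeps them fixed to isolate the parent:

```python
import hashlib
import hmac

SECRET = b"stand-in for a private key"  # HMAC as a stand-in for GPG, illustration only

def commit_body(parent, message):
    return (f"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
            f"parent {parent}\n"
            f"author Alice <alice@example.com> 1700000000 +0000\n"
            f"committer Alice <alice@example.com> 1700000000 +0000\n"
            f"\n{message}\n").encode()

def sign(body):
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body, signature):
    return hmac.compare_digest(sign(body), signature)

old_base = "1" * 40
new_base = "2" * 40  # upstream moved on; the rebase picks this as the new parent

original = commit_body(old_base, "Fix overflow in parser")
signature = sign(original)

rebased = commit_body(new_base, "Fix overflow in parser")  # same change, new parent
print(verify(original, signature))  # True
print(verify(rebased, signature))   # False: the parent line is part of the signed bytes
```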
In some cases the rebase is very clean, and none of the modified files had been changed by other commits. I guess in this case, git could have a rule to keep a "link" to the old commit and accept the old signature as a signature of the new commit.
In some cases there are trivial changes, like indentation because someone else added an `if` around the code you are modifying. Sometimes part of the problem has been fixed. Sometimes one of the functions you use has an additional parameter. Sometimes the code has been moved to another file. In these cases it is difficult to automatically detect whether the new rebased commit is equal enough to the old commit to accept the old signature.
We can go into the big rebase/merge debate. Linus is in the rebase camp.
That's because you are not supposed to rebase other people's code on top of a changed base. That can effectively modify the behaviour of their code change. So it's good that the resulting commit won't be signed anymore.
And if you are rebasing your own code, then you can sign it again.
Note that the changes are identical, just add ` &&ret` twice, but the line numbers have changed. Also, the cherry-picked version has an additional `Signed-off-by: `.
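Git already ships `git patch-id`, which fingerprints a patch while ignoring line numbers and whitespace, so "same change, different position" can be detected. Below is a much-simplified sketch of that idea, not git's exact algorithm (it ignores file names, for one). The diffs are hypothetical: the cherry-picked copy sits at a different line and is indented one level deeper, as if an `if` had been added around it:

```python
import hashlib
import re

def patch_id(diff_text):
    """Simplified patch-id-style fingerprint of a unified diff.

    Only added/removed lines count; file headers and hunk headers (which carry
    line numbers) are skipped, and all whitespace is stripped.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++", "@@")):
            continue
        if line.startswith(("+", "-")):
            digest.update(re.sub(r"\s+", "", line).encode())
            digest.update(b"\n")  # separator so line boundaries still matter
    return digest.hexdigest()

original = """\
--- a/parser.c
+++ b/parser.c
@@ -10 +10 @@
-    x = compute();
+    x = compute() + 1;
"""

cherry_picked = """\
--- a/parser.c
+++ b/parser.c
@@ -42 +42 @@
-        x = compute();
+        x = compute() + 1;
"""

different = original.replace("+ 1;", "+ 2;")

print(patch_id(original) == patch_id(cherry_picked))  # True
print(patch_id(original) == patch_id(different))      # False
```

The commit message, including any extra `Signed-off-by:` trailer, never enters the fingerprint, since only the diff is hashed.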
Oh, I've had that at one of the places I used to work. The git commit tree was signed, and once a team member left, no one could create branches anymore because all of his commits were now insecure.
It's funny, but it doesn't actually work, because the hash of the modified commit and all subsequent ones will necessarily change, right? So it would be very visible to everyone that a change has been made, as force-pushing an incompatible history of commits might break a lot of things.
This might have a serious use. I have a private repo that I worked on with my daughter. If I open source it, ideally I'd keep the chronological history but scrub her email address out of it. It's OK that all the commit hashes would change. Would I want to adapt this joke tool to that purpose, or is there an existing tool for rewriting history that way?
Seconded. The git blame-someone-else tool is just using git rebase and git commit --amend internally to alter one specified commit.
git filter-branch is perfect for this kind of wholesale revision. filter-branch is essential for tasks like open-sourcing repos that need some kind of cleanup, massaging repos generated by a VCS migration tool, etc. For example, years ago I participated in the move of a large CVS repo to git; significant filter-branch post-processing was required to create an acceptable baseline.
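In practice you would run `git filter-branch` with `--env-filter`, changing `GIT_AUTHOR_EMAIL` / `GIT_COMMITTER_EMAIL` per commit. The sketch below only models what such a rewrite does to an in-memory linear history, under the assumption that every commit is on the empty tree and that the names, e-mails and timestamps are placeholders. Dates and order are kept; the e-mail and therefore the SHA chain change:

```python
import hashlib

def git_object_sha(obj_type, body):
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

def replay(commits, email_map):
    """Rebuild a linear history, swapping e-mail addresses via `email_map`.

    Each commit keeps its tree, name, timestamp and message, so chronology is
    preserved; only the e-mail (and therefore the SHA chain) changes.
    """
    shas, parent = [], None
    for c in commits:
        email = email_map.get(c["email"], c["email"])
        ident = f'{c["name"]} <{email}> {c["time"]} +0000'
        lines = [f'tree {c["tree"]}']
        if parent:
            lines.append(f"parent {parent}")
        lines += [f"author {ident}", f"committer {ident}", "", c["message"]]
        parent = git_object_sha("commit", ("\n".join(lines) + "\n").encode())
        shas.append(parent)
    return shas


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
history = [
    {"name": "Parent", "email": "parent@example.com", "time": 1600000000, "message": "Start project"},
    {"name": "Kid", "email": "kid@home.example", "time": 1600003600, "message": "Add level 2"},
    {"name": "Parent", "email": "parent@example.com", "time": 1600007200, "message": "Fix scoring"},
]
for c in history:
    c["tree"] = EMPTY_TREE  # toy content; a real rewrite keeps each commit's own tree

before = replay(history, {})
after = replay(history, {"kid@home.example": "kid@users.noreply.example"})
for old, new in zip(before, after):
    print(old[:10], "->", new[:10])
```

Every commit from the first rewritten one onward gets a new ID, which is the "all the commit hashes would change" trade-off mentioned above.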
I think Torvalds’ stance is reasonable when considering a customer’s safety as guaranteed by an organization, e.g. "this build is signed as safe".
Commit signatures are useful in large organizations that have to worry about insider threats. If code that is reckless or malicious is found in a build, you want non-repudiation: the author shouldn't be able to deny writing it. Lack of commit signatures allows a malicious actor to cover their tracks.
And also, we should accept that we don’t treat all authors with the same scrutiny. Veterans’ code gets scrutinized less, so let’s actually trust that they’re the real author before signing a tag with their code.
I've had a coworker, "Tom", who was terrible with three way merges (why is it the people awful at merges want to do the most merges by insisting on feature branches for their code?)
I'm still not sure what he was doing but some of his merges ended up with the wrong name next to code. We started figuring this out about him when "George" was getting dressed down for a bug he introduced.
Two things drew me into this. First, I was getting tired of things being blamed on George. Everybody in this group had issues, nobody should have been pointing fingers at anybody else, especially this guy or his partner in crime, Tom. But equally important to me at that moment was that I was the primary on that code review, so now it's on me too.
A lot of code I look at becomes a bit of a blur, but I remembered this block of code particularly well, because it was the sort of tricky code that George sometimes cocks up but bless him if he didn't get it right on the first try. Only the code we were upset about wasn't the code I reviewed. His name was on it. The commit sequence lined up. What the hell.
An excruciatingly long git bisect later (git bisect is not built for some things, this included) and I track it down to a bad three way merge by Tom. He ended up with some bastardized version of left and right that had its own set of bugs, and George's name on the commit. I hadn't known you could do that with Git. It was quite upsetting.
Do you have any more information of any kind on this (like info you have run into since then)? This sounds very interesting and it also sounds like something I should be aware is possible to do (especially on accident).
This makes sense; all organizations are different, and it is true that all changes to the kernel tree are publicly ACK-ed before getting committed.
Maybe we could record the public key that pushed each commit to the repo, so we get the best of both worlds: each commit is associated with a user via its public key, not just via the Author field, and tags are signed with GPG.
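A hypothetical sketch of that idea as a server-side push log. `PushLog`, `record_push` and `pusher_of` are invented names, and the key fingerprints are placeholders. The design choice worth noting is that the first pusher wins, so re-pushing an existing commit somewhere else doesn't overwrite who introduced it:

```python
from dataclasses import dataclass, field

@dataclass
class PushLog:
    """Hypothetical record of which SSH key first pushed each commit."""
    entries: dict = field(default_factory=dict)  # commit SHA -> key fingerprint

    def record_push(self, key_fingerprint, new_commits):
        for sha in new_commits:
            # setdefault keeps the first pusher; later re-pushes don't overwrite it.
            self.entries.setdefault(sha, key_fingerprint)

    def pusher_of(self, sha):
        return self.entries.get(sha)


# The Author field is free text, so it can claim anyone...
commits = {"e5": {"author": "Linus Torvalds"}}

log = PushLog()
log.record_push("SHA256:alice-key", ["c3", "d4"])
log.record_push("SHA256:mallory-key", ["e5"])
log.record_push("SHA256:alice-key", ["e5"])  # re-push does not change attribution

# ...but the push log shows whose key actually introduced the commit.
print(commits["e5"]["author"], "/", log.pusher_of("e5"))
```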
Personally, I find this really useful when I accidentally squash something incorrectly during a rebase and in the process of cleaning it up end up with changes attributed to the “wrong” person.
Obligatory self-promotion of my opposite joke project, git-upstage, which steals credit for someone else's work. (Squashes their branch to a single commit under your name and backdates it five minutes.)
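This is not git-upstage's actual code, just a guess at the shape of the trick it describes. The squashed commit can reuse the branch tip's tree (so the content is identical), take the branch's base as its only parent, and carry your identity with a timestamp 300 seconds in the past. With plain git this is roughly `git reset --soft <base>` followed by `git commit --date=...`. Here the object is built directly, with a placeholder base SHA and identity:

```python
import hashlib
import time

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def git_object_sha(obj_type, body):
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

def upstage_commit(base_sha, branch_tip_tree, you, now=None):
    """Hypothetical squash-and-backdate: one commit, your name, five minutes ago."""
    now = int(time.time()) if now is None else now
    ident = f"{you} {now - 300} +0000"  # backdated five minutes
    body = (f"tree {branch_tip_tree}\n"  # same tree as the branch tip: same content
            f"parent {base_sha}\n"       # single parent: everyone else's commits vanish
            f"author {ident}\n"
            f"committer {ident}\n"
            "\n"
            "Squashed work\n").encode()
    return git_object_sha("commit", body), body


sha, body = upstage_commit("1" * 40, EMPTY_TREE, "Me <me@example.com>", now=1700000300)
print(body.decode())
```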
I really dislike the term chosen for this feature. “Blame” assumes the code is broken or written improperly in some way. Most of the time I use it, I’m just trying to find out who wrote it so I can find the original commit and understand it in more context.
AFAIK original was "cvs annotate" [1]. Subversion introduced "svn blame" and "svn praise" aliases for "svn annotate" as some kind of joke. It's funny that git only has "git blame".
1. clone https://github.com/torvalds/linux into https://github.com/<YOURNAME>/linux.
2. push a fake "torvalds" commit into your repo.
3. check the SHA of the commit that you made.
4. the commit will be visible at the original repo URL with your SHA (https://github.com/torvalds/linux/commit/<SHA>), with no indication whatsoever that this is coming from a different repo
I have reported this problem to GitHub a while ago and they replied to me that this is a well known feature of the repo "network".