Git Blame-Someone-Else

warpech · on Sept 20, 2019

What is more, you can:

1. fork https://github.com/torvalds/linux into https://github.com/<YOURNAME>/linux.

2. push a fake "torvalds" commit into your repo.

3. note the SHA of the commit that you made.

4. the commit will be visible at the original repo URL with your SHA (https://github.com/torvalds/linux/commit/<SHA>), with no indication whatsoever that it comes from a different repo.

I reported this problem to GitHub a while ago, and they replied that this is a well-known feature of the repo "network".

smitop · on Sept 20, 2019

> the repo "network"

This is actually an optimization done by GitHub. It would take up a lot of space if GitHub copied the entire repo every time someone forked it, so they keep all the commits in the original repo. As a side effect, commits in forks are accessible from the original repo since commits from both repos are stored in the same place.

SilasX · on Sept 20, 2019

GP isn't saying GitHub should copy the entire repo, only that there should be some indication that the code you're looking at isn't the repo owner's (despite being committed in their name and on a repo they "control").

I don't see what optimization requires that. They already keep track of e.g. me pushing up someone else's commit after a rebase -- it indicates that I pushed but the commit originally came from someone else.

orf · on Sept 20, 2019

> I don't see what optimization requires that.

From a single commit ID you cannot tell which repo it came from. A "repo" is just a tree of commits.

deathanatos · on Sept 20, 2019

Even if you're deduplicating commits / data internally, you can tell that that commit is not present in that repo as it is not an ancestor to any ref in that repository.

(You might argue that determining which refs contain a commit is expensive, but GitHub already does this, so I'd argue that it's not that expensive.)
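The reachability check described above can be sketched locally. This is a minimal sketch, assuming a POSIX shell and a local `git`; the throwaway repository, file name and identity are made up, and a commit built with `git commit-tree` stands in for a commit that arrived through a shared object store. It walks every ref with `git merge-base --is-ancestor`; recent Git versions can answer the same question directly with `git for-each-ref --contains <commit>` or `git branch -a --contains <commit>`.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "reachable commit"
reachable=$(git rev-parse HEAD)

# A commit object that no branch or tag points to (stand-in for a fork's commit).
dangling=$(git commit-tree -p HEAD -m "not on any ref" "HEAD^{tree}")

# Print every ref whose history contains commit $1.
refs_containing() {
    git for-each-ref --format='%(refname)' | while read -r ref; do
        if git merge-base --is-ancestor "$1" "$ref"; then
            echo "$ref"
        fi
    done
}

reachable_refs=$(refs_containing "$reachable")
dangling_refs=$(refs_containing "$dangling")
echo "refs containing reachable commit: ${reachable_refs:-<none>}"
echo "refs containing dangling commit:  ${dangling_refs:-<none>}"
```

The dangling commit is still readable with `git cat-file`, which is the situation discussed in this thread: being able to fetch an object says nothing about which refs reach it.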

nitrogen · on Sept 20, 2019

Then it should be possible to store the originating repo along with the commit, so that commits aren't visible in a given "repo" until they are pushed or pulled into that repo.

orf · on Sept 20, 2019

Storing all commit origins at the scale of github is the expensive part.

SilasX · on Sept 20, 2019

Good thing Github is allowed to associate data with a commit ID, like they already do with rebased commits, as noted in the subsequent sentence.

heftig · on Sept 20, 2019

This information is part of the commit created by git and not "associated data" added by GitHub.

SilasX · on Sept 20, 2019

Really? Where is git storing it? I don't see any information about my rebases in the message, or who pushed it.

y4mi · on Sept 21, 2019

GitHub doesn't use plain bare repositories for its repo hosting, so it can do whatever it deems useful :)

If it did, you'd be spot on, though.

argd678 · on Sept 20, 2019

How do they know which branch in my fork is mine vs upstream? Or in the case where I modify a forked branch?

ajross · on Sept 20, 2019

It's sort of the reverse: they can't know, from a bare commit ID, what repo it "belongs" to without searching backward from every tag or branch in the repo. (Even that question is malformed: repos have histories and may have contained commits in the past that are no longer ancestors of existing branches or tags).

So they just fake it: they look in their database to find any commit with that SHA and put it up. And that database happens (for obvious performance reasons) to be shared between a repo and its forks.

emmelaich · on Sept 20, 2019

A branch is just a series of commits; if any one of the commits has a different hash (as this hack will do) then the commit and all following commits will have a different hash.

Including the id of the branch (the HEAD).
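The hash-chain point can be checked directly with Git's plumbing. A minimal sketch, assuming a POSIX shell and a local `git` (the repository, authors and pinned dates are invented for the demo): it rebuilds the first commit with a different author, then rebuilds the second commit on top of it with `git commit-tree`. Only the parent line of the second commit changes, yet its hash changes too; rebuilding it on the original parent reproduces the original hash exactly.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Original Author"
git config user.email "original@example.com"
# Pin both timestamps so that identical inputs produce identical hashes.
export GIT_AUTHOR_DATE="2019-09-20 12:00:00 +0000"
export GIT_COMMITTER_DATE="2019-09-20 12:00:00 +0000"

echo a > a.txt && git add a.txt && git commit -q -m "first"
echo b > b.txt && git add b.txt && git commit -q -m "second"
first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)

# Print a commit's message exactly as stored (everything after the header).
msg_of() { git cat-file commit "$1" | sed '1,/^$/d'; }

# Same tree and message as "first", different author: a new hash.
first_new=$(msg_of "$first" |
    GIT_AUTHOR_NAME="Someone Else" GIT_AUTHOR_EMAIL="else@example.com" \
    git commit-tree "$first^{tree}")

# Same tree and message as "second", but its parent is now first_new.
second_new=$(msg_of "$second" | git commit-tree -p "$first_new" "$second^{tree}")

# Control: rebuilding "second" on its original parent reproduces its hash.
second_same=$(msg_of "$second" | git commit-tree -p "$first" "$second^{tree}")

echo "second:      $second"
echo "second_same: $second_same"
echo "second_new:  $second_new"
```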

tsm · on Sept 20, 2019

It's simpler than that: a branch is just a pointer to one specific commit (with a specific SHA)
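The pointer view can be seen in a few commands. A minimal sketch, assuming a POSIX shell and a local `git` (throwaway repository, made-up identity): `git rev-parse` shows that the branch resolves to a single commit hash, and `git update-ref` moves the branch by rewriting that one value, while the commit it used to point to still exists as an object.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "c1"
c1=$(git rev-parse HEAD)
git commit -q --allow-empty -m "c2"
c2=$(git rev-parse HEAD)

# The branch is nothing more than a name holding c2's hash...
branch_tip=$(git rev-parse refs/heads/main)

# ...and moving it just overwrites that value; c2 itself is untouched.
git update-ref refs/heads/main "$c1"
moved_tip=$(git rev-parse refs/heads/main)

echo "main was $branch_tip, now $moved_tip"
```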

emmelaich · on Sept 21, 2019

True, but it's both.

Just as a link in a linked list is often the list and the node in the list.

odyssey7 · on Sept 20, 2019

I'd imagine this is why GitHub disallows private forks?

heftig · on Sept 20, 2019

Yes. If they did this, private commits might even leak into packfiles fetched from GitHub by git.

jimktrains2 · on Sept 20, 2019

I've never really understood Torvalds' reason for not cryptographically signing commits.

> Btw, there's a final reason, and probably the really real one. Signing each commit is totally stupid. It just means that you automate it, and you make the signature worth less. It also doesn't add any real value, since the way the git DAG-chain of SHA1's work, you only ever need _one_ signature to make all the commits reachable from that one be effectively covered by that one. So signing each commit is simply missing the point.

http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-t...

tux1968 · on Sept 20, 2019

Because each commit is in a cryptographically secure chain, when you sign a Git tag it vouches for the referenced commit and all the commits preceding it. This can be done at important moments such as each release.
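What a tag actually covers can be inspected without any keys. A minimal sketch, assuming a POSIX shell and a local `git`; the repository and the `v1.0` tag name are invented. An annotated tag is a separate object whose `object` line records exactly one commit hash, and because that commit's hash covers its parents, the tag pins the whole history. With a GPG key configured, `git tag -s` would sign this same object and `git verify-tag` would check it; the unsigned `git tag -a` is used here only so the sketch runs anywhere.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo v1 > f.txt && git add f.txt && git commit -q -m "release candidate"

# Annotated tag object; `git tag -s` would add a signature to this object.
git tag -a v1.0 -m "Release 1.0"

# The tag names exactly one commit hash on its "object" line.
tagged=$(git cat-file -p v1.0 | sed -n 's/^object //p')
head=$(git rev-parse HEAD)
git cat-file -p v1.0
```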

jimktrains2 · on Sept 20, 2019

Sure, but the case presented by the great-grandparent is a leaf commit of unknown provenance.

At the very least I don't think it's "totally stupid", even if I know it's not a panacea for all ills.

Roritharr · on Sept 20, 2019

I guess he's of the school that it doesn't matter who commits the code, it needs to be checked anyway, for bugs or being malicious.

gus_massa · on Sept 20, 2019

Cryptographically signing the commits makes rebasing impossible (or at least more difficult).

In some cases the rebase is very clean, and none of the modified files were changed by other commits. I guess in this case git could have a rule to keep a "link" to the old commit and accept the old signature as a signature of the new commit.

In some cases there are trivial changes, like indentation because someone else added an `if` around the code you are modifying. Sometimes part of the problem has been fixed. Sometimes one of the functions you use has an additional parameter. Sometimes the code has been moved to another file. In these cases it is difficult to automatically detect whether the new rebased commit is close enough to the old commit to carry the old signature over to it.
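For the clean-rebase case, Git already has a notion of two commits being the "same change": the patch ID, which hashes a commit's diff while ignoring whitespace and line numbers. A minimal sketch, assuming a POSIX shell and a local `git` (file contents, branch names and identity are made up): a commit is rebased past an unrelated insertion that shifts its line numbers, and `git patch-id --stable` reports the same ID before and after even though the commit hash changes. Whether a signature should carry over on that basis is the open question above; this only shows that the equality check exists.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > nums.txt
git add nums.txt && git commit -q -m "base"

# Feature branch: change line 9 only.
git checkout -q -b feature
sed 's/^9$/nine/' nums.txt > nums.tmp && mv nums.tmp nums.txt
git commit -q -am "spell out nine"
before=$(git rev-parse HEAD)

# Meanwhile main inserts a line at the top, shifting every line number by one.
git checkout -q main
{ echo header; cat nums.txt; } > nums.tmp && mv nums.tmp nums.txt
git commit -q -am "add header line"

# A clean rebase: same change, new base, new commit hash.
git checkout -q feature
git rebase -q main
after=$(git rev-parse HEAD)

# Patch ID of a commit's diff (whitespace and line numbers ignored).
pid() { git show --no-color "$1" | git patch-id --stable | cut -d' ' -f1; }
pid_before=$(pid "$before")
pid_after=$(pid "$after")
echo "commit:   $before -> $after"
echo "patch-id: $pid_before -> $pid_after"
```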

We can go into the big rebase/merge debate. Linus is in the rebase camp.

mh8h · on Sept 20, 2019

That's because you are not supposed to rebase other people's code on top of a changed base. That can effectively modify the behaviour of their code change. So it's good that the resulting commit won't be signed anymore. And if you are rebasing your own code, then you can sign it again.

gus_massa · on Sept 20, 2019

What about cherry-picking bug fixes to old versions?

jimktrains2 · on Sept 20, 2019

Cherry picking creates a new commit.

gus_massa · on Sept 20, 2019

The cherry-picked commit usually has the same author and date as the original commit. (Note that rebasing also creates a new commit.)

One of the latest commits backported to Linux 4.9.something https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

Cherrypicked from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

Note that the changes are identical, just add ` &&ret` twice, but the line numbers have changed. Also, the cherrypicked version has an additional `Signed-off-by: `.
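The backport pattern described above can be reproduced locally. A minimal sketch, assuming a POSIX shell and a local `git`; the branch names, identities and dates are invented. `git cherry-pick` keeps the original author name and author date but creates a new commit with a new hash, and `-x` appends a "(cherry picked from commit ...)" line pointing back at the original. (`-s`/`--signoff` would add a `Signed-off-by:` trailer like the extra one in the backport.)

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Stable Maintainer"
git config user.email "maintainer@example.com"

echo base > f.txt && git add f.txt && git commit -q -m "base"
git branch stable            # the "old version" branch that gets the backport

# The original fix on main, written by someone else at an earlier date.
echo fix >> f.txt
GIT_AUTHOR_NAME="Original Author" GIT_AUTHOR_EMAIL="author@example.com" \
GIT_AUTHOR_DATE="2019-09-01 10:00:00 +0000" \
    git commit -q -am "fix the bug"
fix=$(git rev-parse HEAD)

# The stable branch has its own history, so the backport gets a different parent.
git checkout -q stable
echo "stable only" > other.txt && git add other.txt && git commit -q -m "stable-only change"

# Backport the fix; -x records where it came from.
git cherry-pick -x "$fix" >/dev/null
picked=$(git rev-parse HEAD)

orig_author=$(git log -1 --format='%an <%ae> %ad' "$fix")
picked_author=$(git log -1 --format='%an <%ae> %ad' "$picked")
trailer=$(git log -1 --format=%B "$picked" | grep 'cherry picked from commit')
echo "original: $fix  $orig_author"
echo "picked:   $picked  $picked_author"
echo "$trailer"
```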

utopian3 · on Sept 20, 2019

Unless I am missing something, his point seems to be different. He doesn’t seem to care about the non-repudiation of a user.

Aperocky · on Sept 21, 2019

Oh, I've had that at one of the places I used to work. The git commit tree was signed, and once a team member left, no one could create branches any more because all of his commits were now insecure.

Yeah that was fun.

microcolonel · on Sept 20, 2019

Yeah, they would otherwise have to do a bit of work to keep separate maps of the objects in each fork.

p4bl0 · on Sept 20, 2019

It's funny, but it doesn't actually work, because the hash of the modified commit and all subsequent ones will necessarily change, right? So it would be very visible to everyone that a change has been made, as force-pushing an incompatible history of commits might break a lot of things.

alex_duf · on Sept 20, 2019

The following hashes do change (you can see that in the GIF in their README). I think you're meant to force-push after using the tool.

kbumsik · on Sept 20, 2019

> it doesn't actually work because the hash of the modified commit and all subsequent ones will necessarily change, right?

True, so it is just a joke.

dcminter · on Sept 20, 2019

Although your point is perfectly valid, I don't think the tool is intended for anything other than fun.

brlewis · on Sept 20, 2019

This might have a serious use. I have a private repo that I worked on with my daughter. If I open source it, ideally I'd keep the chronological history but scrub her email address out of it. It's OK that all the commit hashes would change. Would I want to adapt this joke tool to that purpose, or is there an existing tool for rewriting history that way?

klodolph · on Sept 20, 2019

The standard way to do that is with 'git filter-branch'. If your daughter’s email is megatron@example.com,

    git filter-branch --env-filter '
    old_email=megatron@example.com
    new_email=redacted
    if [ "$GIT_COMMITTER_EMAIL" = "$old_email" ] ; then
        export GIT_COMMITTER_EMAIL="$new_email"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$old_email" ] ; then
        export GIT_AUTHOR_EMAIL="$new_email"
    fi
    ' -- --all

This is “safe” in the sense that you can go back to the old version with the reflog if you screw things up.
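The `filter-branch` command above can be checked on a throwaway repository before running it on real history. A minimal sketch, assuming a POSIX shell and a local `git`; the parent/daughter identities are invented, and the `--env-filter` script is the one from this comment, unchanged. `FILTER_BRANCH_SQUELCH_WARNING=1` skips the warning and pause that recent Git versions print before running `filter-branch` (Git's own documentation now points to the separate `git filter-repo` tool for rewrites like this). Besides the reflog, `filter-branch` keeps the pre-rewrite refs under `refs/original/`, so the check below only looks at `--branches`.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Parent"
git config user.email "parent@example.com"

echo "parent's work" > a.txt && git add a.txt && git commit -q -m "parent commit"
echo "daughter's work" > b.txt && git add b.txt
GIT_AUTHOR_NAME="Daughter" GIT_AUTHOR_EMAIL="megatron@example.com" \
GIT_COMMITTER_NAME="Daughter" GIT_COMMITTER_EMAIL="megatron@example.com" \
    git commit -q -m "daughter commit"

# Recent Git warns and pauses before filter-branch unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

git filter-branch --env-filter '
old_email=megatron@example.com
new_email=redacted
if [ "$GIT_COMMITTER_EMAIL" = "$old_email" ] ; then
    export GIT_COMMITTER_EMAIL="$new_email"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$old_email" ] ; then
    export GIT_AUTHOR_EMAIL="$new_email"
fi
' -- --all >/dev/null 2>&1

# Count identities on rewritten branches only (refs/original/ keeps the old ones).
remaining=$(git log --branches --format='%ae%n%ce' | grep -c 'megatron@example.com' || true)
redacted=$(git log --branches --format='%ae%n%ce' | grep -c '^redacted$' || true)
echo "old address left on branches: $remaining"
echo "redacted identities:          $redacted"
```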

saidajigumi · on Sept 20, 2019

Seconded. The git blame-someone-else tool is just using git rebase and git commit --amend internally to alter one specified commit.

git filter-branch is perfect for this kind of wholesale revision. filter-branch is essential for tasks like open-sourcing repos that need some kind of cleanup, massaging repos generated by a VCS migration tool, etc. For example, years ago I participated in the move of a large CVS repo to git; significant filter-branch post-processing was required to create an acceptable baseline.

justtopostthis3 · on Sept 20, 2019

  git filter-branch --env-filter " \
    export GIT_AUTHOR_NAME=Dade\ Murphy \
           GIT_AUTHOR_EMAIL=zer0cool@example.com \
           GIT_COMMITTER_NAME=Dade\ Murphy \
           GIT_COMMITTER_EMAIL=zer0cool@example.com"

owenmarshall · on Sept 20, 2019

git filter-branch will solve this for you:

https://stackoverflow.com/questions/750172/how-to-change-the...

i_v · on Sept 20, 2019

GitHub also supports a special <username>@users.noreply.github.com address if you wanted her to retain semi-anonymous authorship as a GitHub user.

brlewis · on Sept 20, 2019

The repo is on gitlab, but I can always make up a noreply address.

jacobevelyn · on Sept 20, 2019

A few years ago I made a similar project with a slightly different twist: https://github.com/JacobEvelyn/git-self-blame

I did it as a learning exercise, and if anyone's interested I documented the source in a lot of detail to show everything I learned along the way.[1]

[1] https://github.com/JacobEvelyn/git-self-blame/blob/master/gi...

mohsen0 · on Sept 20, 2019

And that is why signing commits should be enforced.

Znafon · on Sept 20, 2019

You would already get a conflict, as the history of the repo changed, and signing all commits has some drawbacks, as Torvalds explained here: http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-t...

I'm not sure it's better.

inlined · on Sept 20, 2019

I think Torvalds’ stance is reasonable when considering a customer’s safety as guaranteed by an organization, e.g. "this build is signed as safe."

Commit signatures are useful in large organizations that need to worry about insider threats. If code that is reckless or malicious is found in a build, you want non-repudiation: the author should not be able to deny having written it. Without commit signatures, a malicious actor can cover their tracks.

And also, we should accept that we don’t treat all authors with the same scrutiny. Veterans’ code gets scrutinized less, so let’s actually trust that they’re the real author before signing a tag with their code.

hinkley · on Sept 20, 2019

How would merges work there?

I've had a coworker, "Tom", who was terrible with three way merges (why is it the people awful at merges want to do the most merges by insisting on feature branches for their code?)

I'm still not sure what he was doing but some of his merges ended up with the wrong name next to code. We started figuring this out about him when "George" was getting dressed down for a bug he introduced.

Two things drew me into this. First, I was getting tired of things being blamed on George. Everybody in this group had issues, nobody should have been pointing fingers at anybody else, especially this guy or his partner in crime, Tom. But equally important to me at that moment was that I was the primary on that code review, so now it's on me too.

A lot of code I look at becomes a bit of a blur, but I remembered this block of code particularly well, because it was the sort of tricky code that George sometimes cocks up but bless him if he didn't get it right on the first try. Only the code we were upset about wasn't the code I reviewed. His name was on it. The commit sequence lined up. What the hell.

An excruciatingly long git bisect later (git bisect is not built for some things, this included) and I track it down to a bad three way merge by Tom. He ended up with some bastardized version of left and right that had its own set of bugs, and George's name on the commit. I hadn't known you could do that with Git. It was quite upsetting.

tomcatfish · on Sept 20, 2019

Do you have any more information of any kind on this (like info you have run into since then)? This sounds very interesting and it also sounds like something I should be aware is possible to do (especially on accident).

Znafon · on Sept 20, 2019

This makes sense; all organizations are different, and it is true that all changes to the kernel tree are publicly ACK-ed before getting committed.

Maybe we could record the public key that pushed each commit to the repo, so we get the best of both worlds: each commit is associated with a user through its public key, not just the Author field, and tags are signed with GPG.

saagarjha · on Sept 20, 2019

Personally, I find this really useful when I accidentally squash something incorrectly during a rebase and in the process of cleaning it up end up with changes attributed to the “wrong” person.

spraak · on Sept 20, 2019

Check out `git reflog` to go back to before the mistake
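Recovering with the reflog looks roughly like this. A minimal sketch, assuming a POSIX shell and a local `git` (the repository and commits are made up): a commit is botched with `git commit --amend`, and `HEAD@{1}`, the position HEAD held before its most recent move, restores it. In a real repository, run `git reflog` first to find the right entry rather than assuming it is `HEAD@{1}`.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo v1 > f.txt && git add f.txt && git commit -q -m "good state"
good=$(git rev-parse HEAD)

# The mistake: amend the commit with the wrong content.
echo oops > f.txt
git commit -q -a --amend -m "botched amend"
botched=$(git rev-parse HEAD)

git reflog -n 2                      # shows the amend and the commit before it

# HEAD@{1} is where HEAD pointed before the amend moved it.
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
content=$(cat f.txt)
echo "restored $restored, f.txt contains: $content"
```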

SilasX · on Sept 20, 2019

Obligatory self-promotion of my opposite joke project, git-upstage, which steals credit for someone else's work. (Squashes their branch to a single commit under your name and backdates it five minutes.)

https://github.com/SilasX/git-upstage

(Inspired by the time someone typo'd "unstage" to "upstage" and I guessed what a git-upstage command would be.)

jay_kyburz · on Sept 20, 2019

This will come in handy when the Australian Government compels a programmer to put a backdoor in their company's software.

eljimmy · on Sept 20, 2019

I really dislike the term chosen for this feature. “Blame” assumes the code is broken or written improperly in some way. Most of the time I use it, I’m just trying to find out who wrote the code so I can find the original commit and understand it in more context.

Should have named it “git who”

AceJohnny2 · on Sept 20, 2019

SVN has the alias "svn praise".

I was disappointed that git didn't have it, so I created one myself. I'm glad git has trivial support for aliases.

chemodax · on Sept 20, 2019

AFAIK original was "cvs annotate" [1]. Subversion introduced "svn blame" and "svn praise" aliases for "svn annotate" as some kind of joke. It's funny that git only has "git blame".

[1] https://compbio.soe.ucsc.edu/cvsdoc/cvs-manual/cvs_74.html

thwarted · on Sept 20, 2019

You can always use "git annotate" or alias git commands in your gitconfig.
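Both suggestions above take one `git config` line each. A minimal sketch, assuming a POSIX shell and a local `git`; the alias names `praise` and `who` are just the ones floated in this thread, and they are set per-repository here (add `--global` to make them personal defaults). `git annotate` is built in and prints the same information in a slightly different layout.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'line one\nline two\n' > f.txt && git add f.txt && git commit -q -m "add f.txt"

# Friendlier names for blame, set only for this repository.
git config alias.praise blame
git config alias.who 'blame -e'      # -e shows author emails instead of names

praise_out=$(git praise f.txt)
blame_out=$(git blame f.txt)
who_out=$(git who f.txt)
annotate_out=$(git annotate f.txt)
printf '%s\n' "$who_out"
```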

panda88888 · on Sept 20, 2019

Or “git credit”

brantonb · on Sept 21, 2019

Xcode 10 changed their per-line annotation feature from Blame to Authors.

ElijahLynn · on Sept 20, 2019

Also, `git tell` the story.

some1else · on Sept 20, 2019

Mine is https://github.com/some1else

xivzgrev · on Sept 20, 2019

Loool

danilocesar · on Sept 20, 2019

isn't this just a wrapper around git rebase -i HASH^; git commit --amend --author "John Doe <john@example.com>"?

Also, as already noted, this overwrites all the history after the commit, making it useless.

Then people said it's a joke...

I know I will get downvoted for this comment, but how did this make it to the first page of HN?

pferde · on Sept 20, 2019

It's Friday, some levity is acceptable here and there.

danilocesar · on Sept 20, 2019

=)

Sounds fair.

b88d80170 · on Sept 20, 2019

agree. same question here.
