Git Blame-Someone-Else (github.com/jayphelps)
225 points by dcminter on Sept 20, 2019 | 66 comments



What is more, you can:

1. fork https://github.com/torvalds/linux into https://github.com/<YOURNAME>/linux.

2. push a fake "torvalds" commit into your repo.

3. check the SHA of the commit that you made.

4. the commit will be visible at the original repo URL with your SHA (https://github.com/torvalds/linux/commit/<SHA>), with no indication whatsoever that this is coming from a different repo

I have reported this problem to GitHub a while ago and they replied to me that this is a well known feature of the repo "network".
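Step 2 works because git records whatever author and committer identity it is given. Here is a minimal local sketch, assuming a throwaway repository and a placeholder identity (`Some Maintainer`, `maintainer@example.com` are made up); it does not touch GitHub at all:

```shell
set -eu
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > README
git add README

# Author and committer are plain text fields; git does not verify them.
# The name and address below are placeholders, not a real identity.
GIT_AUTHOR_NAME="Some Maintainer" GIT_AUTHOR_EMAIL="maintainer@example.com" \
GIT_COMMITTER_NAME="Some Maintainer" GIT_COMMITTER_EMAIL="maintainer@example.com" \
    git commit -q -m "A commit attributed to someone else"

author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
echo "author:    $author"
echo "committer: $committer"
```

Nothing in the resulting commit object says who actually ran the command, which is why the commit page alone cannot tell you.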


> the repo "network"

This is actually an optimization done by GitHub. It would take up a lot of space if GitHub copied the entire repo every time someone forked it, so they keep all the commits in the original repo. As a side effect, commits in forks are accessible from the original repo since commits from both repos are stored in the same place.


GP isn't saying GitHub should copy the entire repo, only that there should be some indication that the code you're looking at isn't the repo owner's (despite being committed in their name and on a repo they "control").

I don't see what optimization requires that. They already keep track of e.g. me pushing up someone else's commit after a rebase -- it indicates that I pushed but the commit originally came from someone else.


> I don't see what optimization requires that.

From a single commit ID you cannot tell which repo it came from. A "repo" is just a tree of commits.


Even if you're deduplicating commits / data internally, you can tell that that commit is not present in that repo as it is not an ancestor to any ref in that repository.

(You might argue that determining what refs contain a commit is potentially expensive, perhaps, but GitHub already does this, so I'd argue that it's not that expensive.)
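The "is it an ancestor of any ref" check can be tried locally. This is only a sketch in a throwaway repository, not a description of how GitHub does it; the dangling commit is created with `git commit-tree` purely for illustration:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice Example"
git config user.email alice@example.com

echo one > file
git add file
git commit -q -m "on a branch"
reachable=$(git rev-parse HEAD)

# Create a second commit object directly, without pointing any ref at it.
dangling=$(git commit-tree -p "$reachable" -m "not on any ref" "$(git rev-parse 'HEAD^{tree}')")

# Both objects exist in the object database and can be looked up by ID...
git cat-file -t "$reachable"
git cat-file -t "$dangling"

# ...but only one of them is contained in the history of some ref.
refs_with_reachable=$(git for-each-ref --contains "$reachable" --format='%(refname)')
refs_with_dangling=$(git for-each-ref --contains "$dangling" --format='%(refname)')
echo "refs containing reachable commit: ${refs_with_reachable:-none}"
echo "refs containing dangling commit:  ${refs_with_dangling:-none}"
```

Looking an object up by ID and asking which refs contain it are separate questions, and only the second one says anything about "belonging" to a repo.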


Then it should be possible to store the originating repo along with the commit so that commits aren't visible in a given "repo" until they are pushed or pulled into that repo


Storing all commit origins at the scale of github is the expensive part.


Good thing Github is allowed to associate data with a commit ID, like they already do with rebased commits, as noted in the subsequent sentence.


This information is part of the commit created by git and not "associated data" added by GitHub.


Really? Where is git storing it? I don't see any information about my rebases in the message, or who pushed it.


GitHub doesn't use plain bare repositories for their repo hosting, so they can do whatever they deem useful :)

if they did you'd be spot on though


How do they know which branch in my fork is mine vs upstream? Or in the case where I modify a forked branch?


It's sort of the reverse: they can't know, from a bare commit ID, what repo it "belongs" to without searching backward from every tag or branch in the repo. (Even that question is malformed: repos have histories and may have contained commits in the past that are no longer ancestors of existing branches or tags).

So they just fake it: they look in their database to find any commit with that SHA and put it up. And that database happens (for obvious performance reasons) to be shared between a repo and its forks.


A branch is just a series of commits; if any one of the commits has a different hash (as this hack will do) then the commit and all following commits will have a different hash.

Including the id of the branch (the HEAD).


It's simpler than that: a branch is just a pointer to one specific commit (with a specific SHA)
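A quick way to see the pointer behaviour, as a sketch in a throwaway repository (the identity and file names are placeholders):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice Example"
git config user.email alice@example.com

echo one > file
git add file
git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
first=$(git rev-parse HEAD)

# The branch resolves to exactly one commit ID.
before=$(git rev-parse "refs/heads/$branch")

echo two >> file
git commit -q -am "second"

# Committing moved the pointer. The first commit itself is unchanged and is
# now reached through the parent link of the new tip.
after=$(git rev-parse "refs/heads/$branch")
parent_of_tip=$(git rev-parse "$after^")
echo "before: $before"
echo "after:  $after"
echo "parent: $parent_of_tip"
```

The "series of commits" view comes from following parent links back from wherever the pointer currently sits.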


True, but it's both.

Just as a link in a linked list is often the list and the node in the list.


I'd imagine this is why GitHub disallows private forks?


Yes. If they did this, private commits might even leak into packfiles fetched from GitHub by git.


I've never really understood Torvalds' reason for not cryptographically signing commits.

> Btw, there's a final reason, and probably the really real one. Signing each commit is totally stupid. It just means that you automate it, and you make the signature worth less. It also doesn't add any real value, since the way the git DAG-chain of SHA1's work, you only ever need _one_ signature to make all the commits reachable from that one be effectively covered by that one. So signing each commit is simply missing the point.

http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-t...


Because each commit is in a cryptographically secure chain, when you sign a Git tag it vouches for the referenced commit and all the commits preceding it. This can be done at important moments such as each release.
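A rough sketch of the "one tag covers everything before it" idea. It assumes a throwaway repository and uses an annotated tag (`-a`) so it runs without a GPG key; with a key configured, `git tag -s` would put a signed tag in the same position:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice Example"
git config user.email alice@example.com

for n in 1 2 3; do
    echo "$n" >> file
    git add file
    git commit -q -m "commit $n"
done

# Annotated (unsigned) tag as a stand-in for a signed release tag.
git tag -a v1.0 -m "release 1.0"

# The tag names one commit, and that commit's ID depends on every ancestor.
tagged=$(git rev-parse 'v1.0^{commit}')
covered=$(git rev-list --count v1.0)
echo "tag v1.0 -> $tagged, covering $covered commits"
```

Changing any of the earlier commits would change the tagged commit's ID, so the tag would no longer point at the rewritten history.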


Sure, but the case presented by the great-grandparent is a leaf commit with unknown provenance.

At the very least I don't think it's "totally stupid", even if I know it's not a panacea for all ills.


I guess he's of the school that it doesn't matter who commits the code, it needs to be checked anyway, for bugs or being malicious.


Cryptographically signing the commits makes rebasing impossible (or at least more difficult).

In some cases the rebase is very clean, and none of the modified files have been changed by other commits. I guess in this case, git could have a rule to keep a "link" to the old commit and accept the old signature as a signature of the new commit.

In some cases there are trivial changes, like indentation because someone else added an `if` around the code you are modifying. Sometimes part of the problem has been fixed. Sometimes one of the functions you use has an additional parameter. Sometimes the code has been moved to another file. In these cases it is difficult to automatically detect if the new rebased commit is equal enough to the old commit to accept the new signature.

We can go into the big rebase/merge debate. Linus is in the rebase camp.


That's because you are not supposed to rebase other people's code on top of a changed base. That can effectively modify the behaviour of their code change. So it's good that the resulting commit won't be signed anymore. And if you are rebasing your own code, then you can sign it again.


What about cherrypicking bug fixes to old versions?


Cherry picking creates a new commit.


The cherry-picked commit usually has the same author and date as the original commit. (Note that rebasing also creates a new commit.)

One of the latest commits backported to Linux 4.9.something https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

Cherrypicked from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

Note that the changes are identical, just add ` &&ret` twice, but the line numbers have changed. Also, the cherrypicked version has an additional `Signed-off-by: `.
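The same effect can be reproduced in a throwaway repository. This is a sketch with made-up names (`stable`, `Original Author`, `Stable Maintainer`), not the kernel workflow itself:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Stable Maintainer"
git config user.email stable@example.com

echo base > file
git add file
git commit -q -m "base"
git branch stable            # the "old version" stays at the base commit

echo feature > other
git add other
git commit -q -m "unrelated feature"

echo fix >> file
git commit -q -am "fix bug" \
    --author="Original Author <author@example.com>" \
    --date="2019-09-20T12:00:00+00:00"
original=$(git rev-parse HEAD)

# Backport the fix onto the older branch.
git checkout -q stable
git cherry-pick "$original" >/dev/null
picked=$(git rev-parse HEAD)

# Different commit IDs, same author and author date.
git log -1 --format='%H  author=%an  date=%aI' "$original"
git log -1 --format='%H  author=%an  date=%aI' "$picked"
```

The committer fields of the new commit belong to whoever ran the cherry-pick, which is where the two commits visibly diverge.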


Unless I am missing something, his point seems to be different. He doesn’t seem to care about the non-repudiation of a user


Oh, I've had that at one of the places I used to work. The git commit tree was signed, and once a team member left, no one could create branches any more because all of his commits were now considered insecure.

Yeah that was fun.


Yeah, they would otherwise have to do a bit of work to keep separate maps of the objects in each fork.


It's funny, but it doesn't actually work, because the hash of the modified commit and all subsequent ones will necessarily change, right? So it would be very visible to everyone that a change has been made, as force-pushing an incompatible history of commits might break a lot of things.
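To see the hash change ripple forward, here is a sketch in a throwaway repository. It re-authors the middle of three commits by hand with `git commit --amend` and `git rebase --onto`; the names are placeholders and this is not the tool's actual script:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice Example"
git config user.email alice@example.com

for n in 1 2 3; do
    echo "$n" >> file
    git add file
    git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)
old_mid=$(git rev-parse HEAD~1)
old_tip=$(git rev-parse HEAD)

# Rewrite only the author of the middle commit...
git checkout -q --detach "$old_mid"
git commit -q --amend --no-edit --author="Someone Else <someone@example.com>"
new_mid=$(git rev-parse HEAD)

# ...then replay the commits that came after it onto the rewritten one.
git rebase -q --onto "$new_mid" "$old_mid" "$branch"
new_tip=$(git rev-parse HEAD)

echo "middle: $old_mid -> $new_mid"
echo "tip:    $old_tip -> $new_tip"
# File contents are identical; only the commit IDs differ.
git diff --quiet "$old_tip" "$new_tip" && echo "trees are identical"
```

The first commit keeps its ID because nothing before it changed; everything from the edited commit onward gets a new one.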


The following hashes do change (you can see that in the GIF in their README). I think you're meant to force-push after using the tool.


> it doesn't actually work because the hash of the modified commit and all subsequent ones will necessarily change, right?

True, so it is just a joke.


Although your point is perfectly valid, I don't think the tool is intended for anything other than fun.


This might have a serious use. I have a private repo that I worked on with my daughter. If I open source it, ideally I'd keep the chronological history but scrub her email address out of it. It's OK that all the commit hashes would change. Would I want to adapt this joke tool to that purpose, or is there an existing tool for rewriting history that way?


The standard way to do that is with 'git filter-branch'. If your daughter’s email is megatron@example.com,

    git filter-branch --env-filter '
    old_email=megatron@example.com
    new_email=redacted
    if [ "$GIT_COMMITTER_EMAIL" = "$old_email" ] ; then
        export GIT_COMMITTER_EMAIL="$new_email"
    fi
    if [ "$GIT_AUTHOR_EMAIL" = "$old_email" ] ; then
        export GIT_AUTHOR_EMAIL="$new_email"
    fi
    ' -- --all
This is “safe” in the sense that you can go back to the old version with the reflog if you screw things up.
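It is probably worth checking the result before publishing. A sketch, assuming a throwaway repository with placeholder identities and a shortened version of the env-filter above (author field only):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Parent Example"
git config user.email parent@example.com

echo a > file
git add file
git commit -q -m "first" --author="Kid Example <megatron@example.com>"
echo b >> file
git commit -q -am "second"

# FILTER_BRANCH_SQUELCH_WARNING only silences the startup notice in newer git.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "megatron@example.com" ]; then
    export GIT_AUTHOR_EMAIL="redacted"
fi' -- --all >/dev/null 2>&1

# Every distinct author/committer address reachable from branches and tags.
emails=$(git log --branches --tags --format='%ae%n%ce' | sort -u)
echo "$emails"

# filter-branch keeps the old history under refs/original/ as a backup,
# so the old address survives there until those refs are deleted.
backups=$(git for-each-ref --format='%(refname)' refs/original/)
echo "backup refs: $backups"
```

The check deliberately uses `--branches --tags` rather than `--all`, since `--all` would also walk the `refs/original/` backups. Those backups (and the reflog) still contain the old address locally, so pushing only the rewritten branches to a fresh remote is the safer route.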


Seconded. The git blame-someone-else tool is just using git rebase and git commit --amend internally to alter one specified commit.

git filter-branch is perfect for this kind of wholesale revision. filter-branch is essential for tasks like: open-sourcing repos that need some kind of cleanup, massaging repos generated by a VCS migration tool, etc. For example, years ago I participated in the move of a large CVS repo to git; there was significant filter-branch post-processing required to create an acceptable baseline.


  git filter-branch --env-filter " \
    export GIT_AUTHOR_NAME=Dade\ Murphy \
           GIT_AUTHOR_EMAIL=zer0cool@example.com \
           GIT_COMMITTER_NAME=Dade\ Murphy \
           GIT_COMMITTER_EMAIL=zer0cool@example.com"


git filter-branch will solve this for you:

https://stackoverflow.com/questions/750172/how-to-change-the...


GitHub also supports a special <username>@users.noreply.github.com address if you wanted her to retain semi-anonymous authorship as a GitHub user.


The repo is on gitlab, but I can always make up a noreply address.


A few years ago I made a similar project with a slightly different twist: https://github.com/JacobEvelyn/git-self-blame

I did it as a learning exercise, and if anyone's interested I documented the source in a lot of detail to show everything I learned along the way.[1]

[1] https://github.com/JacobEvelyn/git-self-blame/blob/master/gi...


And that is why signing commits should be enforced.


You would already get a conflict as the history of the repo changed, and signing all commits has some drawbacks, as Torvalds explained here: http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-t...

I'm not sure it's better.


I think Torvalds’ stance is reasonable when considering a customer’s safety as guaranteed by an organization. E.g. this build is signed as safe.

Commit signatures are useful in large organizations designed to worry about insider threats. If code that is reckless or malicious is found in a build, you want non-repudiation of the author. Lack of commit signatures allows a malicious actor to cover their tracks.

And also, we should accept that we don’t treat all authors with the same scrutiny. Veterans’ code gets scrutinized less, so let’s actually trust that they’re the real author before signing a tag with their code.


How would merges work there?

I've had a coworker, "Tom", who was terrible with three way merges (why is it the people awful at merges want to do the most merges by insisting on feature branches for their code?)

I'm still not sure what he was doing but some of his merges ended up with the wrong name next to code. We started figuring this out about him when "George" was getting dressed down for a bug he introduced.

Two things drew me into this. First, I was getting tired of things being blamed on George. Everybody in this group had issues, nobody should have been pointing fingers at anybody else, especially this guy or his partner in crime, Tom. But equally important to me at that moment was that I was the primary on that code review, so now it's on me too.

A lot of code I look at becomes a bit of a blur, but I remembered this block of code particularly well, because it was the sort of tricky code that George sometimes cocks up but bless him if he didn't get it right on the first try. Only the code we were upset about wasn't the code I reviewed. His name was on it. The commit sequence lined up. What the hell.

An excruciatingly long git bisect later (git bisect is not built for some things, this included) and I track it down to a bad three way merge by Tom. He ended up with some bastardized version of left and right that had its own set of bugs, and George's name on the commit. I hadn't known you could do that with Git. It was quite upsetting.


Do you have any more information of any kind on this (like info you have run into since then)? This sounds very interesting and it also sounds like something I should be aware is possible to do (especially on accident).


This makes sense, all organizations are different, and it is true that all changes to the kernel tree are publicly ACK-ed before getting committed.

Maybe we could make a note of the public key that pushed each commit to the repo so we get the best of both ways: each commit is associated with a user through its public key, not just the Author field, and tags are signed with GPG.


Personally, I find this really useful when I accidentally squash something incorrectly during a rebase and in the process of cleaning it up end up with changes attributed to the “wrong” person.


Check out `git reflog` to go back to before the mistake
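For anyone who hasn't used it, a minimal sketch of recovering with the reflog, in a throwaway repository where the "mistake" is simulated with a hard reset:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice Example"
git config user.email alice@example.com

echo one > file
git add file
git commit -q -m "first"
echo two >> file
git commit -q -am "second"
good=$(git rev-parse HEAD)

# Simulate the mistake: throw away the latest commit.
git reset -q --hard HEAD~1

# The reflog still records where HEAD was before the reset.
git reflog -n 2

# HEAD@{1} means "where HEAD pointed one move ago".
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
echo "restored: $restored"
```

After a real rebase there are usually more entries to skip, so it is worth reading the `git reflog` output and picking the entry by hand rather than assuming `HEAD@{1}`.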


Obligatory self-promotion of my opposite joke project, git-upstage, which steals credit for someone else's work. (Squashes their branch to a single commit under your name and backdates it five minutes.)

https://github.com/SilasX/git-upstage

(Inspired by the time someone typo'd "unstage" to "upstage" and I guessed what a git-upstage command would be.)


This will come in handy when the Australian Government compels a programmer to put a backdoor in their company's software.


I really dislike the term chosen for this feature. “Blame”, assumes the code is broken or written improperly in some way. Most of the time I use it I’m just trying to find out who wrote it so I can find the original commit to understand it in more context.

Should have named it “git who”


SVN has the alias "svn praise".

I was disappointed that git didn't have it, so I created myself one. I'm glad git has trivial support for aliases.


AFAIK original was "cvs annotate" [1]. Subversion introduced "svn blame" and "svn praise" aliases for "svn annotate" as some kind of joke. It's funny that git only has "git blame".

[1] https://compbio.soe.ucsc.edu/cvsdoc/cvs-manual/cvs_74.html


You can always use "git annotate" or alias git commands in your gitconfig.
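A sketch of the alias approach. The alias names `praise` and `who` are just examples, and the config is written to a throwaway repository here; adding `--global` would store it in your user gitconfig instead:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Alice Example"
git config user.email alice@example.com

echo "some line" > file
git add file
git commit -q -m "first"

# Stored in this repository's config only; add --global for all your repos.
git config alias.praise blame
git config alias.who blame

praise_output=$(git praise file)
who_output=$(git who file)
echo "$praise_output"

# The built-in alternative mentioned above.
git annotate file
```

Both aliases simply expand to `git blame`, so any options it accepts (for example `-L` for a line range) can be passed through.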


Or “git credit”


Xcode 10 changed their per-line annotation feature from Blame to Authors.


Also, `git tell` the story.



Loool


Isn't this just a wrapper on top of git rebase -i HASH^; git commit --amend --author "John Doe"?

Also, as already noted, this overwrites all the history after the commit, making it useless.

Then people said it's a joke...

I know I will get downvoted for this comment, but how did this make it to the front page of HN?


It's Friday, some levity is acceptable here and there.


=)

Sounds fair.


agree. same question here.


[deleted]



