Just curious, why is this possible with an unmerged PR? Just a weird setup on Gi...

judge2020 · on Oct 25, 2020

I think it's because GitHub wants to allow repo maintainers to merge in PRs without having to add separate remotes themselves, i.e. `git remote add` isn't required before `git merge`.

This basically means that any content can be injected into anyone's GH repo (since PRs can't be turned off), but really only in terms of being able to view it on the GitHub website. To give an example, pull 437 on torvalds/linux[0] hasn't been merged in, but if you go to the commit hash in the browser, suddenly init/main.c shows that commit and its changes, which condense the file into one line[1].

This very well could be abused - imagine framing (or just 'canceling') someone with [insert illegal content here] by PRing their repo with a commit with a forged author[2] then linking people to their repo with the commit tree showing the illegal content.

0: https://github.com/torvalds/linux/pull/437

1: https://github.com/torvalds/linux/blob/2793ae1df012c7c3f13ea...

2: https://stackoverflow.com/a/60900120/3878893
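
For anyone wondering how little the "forged author" step involves, here is a minimal local sketch. It assumes nothing beyond stock git; the names and example.com addresses are placeholders I made up, and nothing is pushed anywhere. The author field is just text that git records as given.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temp directory.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Committer identity for this sandbox only (placeholder values).
git config user.name "Real Committer"
git config user.email "committer@example.com"
# Keep the sketch independent of any global signing setup.
git config commit.gpgsign false

echo "hello" > file.txt
git add file.txt

# --author is recorded as given; git does not verify that it is really us.
git commit -q --author="Someone Else <someone.else@example.com>" -m "demo: arbitrary author"

# %an/%ae are the author fields, %cn/%ce the committer fields.
author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
echo "author:    $author"
echo "committer: $committer"
```

The two printed lines differ: the author is whatever was passed on the command line, while the committer comes from the local config.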

larose · on Oct 25, 2020

Indeed. And you don't even need to create a PR to "inject" commits into a GitHub repo. You just need to fork it and push to your fork. See https://mathieularose.com/github-commit-injection

mschuster91 · on Oct 25, 2020

yuck, yet another reason to get people to do commit signing - and enforce it by github not attributing unsigned commits.

steebchen · on Oct 25, 2020

Does commit signing really solve this? I believe you can restrict branches to only allow signed commits, but since these commits are not in any branch on that repository it looks like that wouldn't change anything. Correct me if I'm wrong, though.

mschuster91 · on Oct 25, 2020

That yes, but at least the github/gitlab/... UI could refuse to link unsigned commits to the userpage belonging to the email in the commit.

est31 · on Oct 25, 2020

It's due to how git works. In order for git tools to compare and otherwise work with two commits, both commits need to be in the same repo.

If "forking" a repo on github really cloned it in their infrastructure, they'd require far more data. So all forks of a github repo point to the same repo, only with different branches.

Note that git clone only clones the actually present branches of the upstream you point it to, but on the backend, all branches of all forks are present.
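
A rough local simulation of that last point, as a sketch only. The bare repo `backend.git` stands in for the shared server-side repository, and the `refs/forks/alice/...` namespace is a hypothetical layout invented for this example (GitHub's real internal ref layout is not something I know). The clone goes through a `file://` URL so that git uses the normal transfer protocol instead of copying the whole object directory.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# "backend.git" plays the shared server-side repository (hypothetical name).
git init -q --bare backend.git
git --git-dir=backend.git symbolic-ref HEAD refs/heads/main

# Scratch repo used only to create and push the commits.
git init -q scratch
cd scratch
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "upstream commit"
git push -q ../backend.git HEAD:refs/heads/main

git commit -q --allow-empty -m "commit that exists only in a fork"
fork_commit=$(git rev-parse HEAD)
# Hypothetical namespace for a fork's branch inside the shared repository.
git push -q ../backend.git HEAD:refs/forks/alice/heads/feature
cd ..

# A plain clone requests branches and tags only.
git clone -q "file://$work/backend.git" clone

# The backend can resolve the fork's commit by hash; the clone cannot.
if git --git-dir=backend.git cat-file -e "$fork_commit" 2>/dev/null; then in_backend=yes; else in_backend=no; fi
if git -C clone cat-file -e "$fork_commit" 2>/dev/null; then in_clone=yes; else in_clone=no; fi
echo "in backend: $in_backend, in clone: $in_clone"
```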

aiiie · on Oct 25, 2020

This isn’t simply because of how Git works. You can configure Git to look in multiple places for repo objects. For whatever reason, the GitHub devs either didn’t know this, or they didn’t want to implement their forking and pull request systems this way.

As someone else mentioned, this may be an intentional design to make it simpler to implement pulling down remote PRs from the destination repo.

est31 · on Oct 25, 2020

> You can configure Git to look in multiple places for repo objects.

What do you mean by multiple places for repo objects? Do you mean multiple remotes? The remotes are fully inside your local database if you run commands like git pull or git remote update, they are just not in your checkout. Commands like git show <commit hash> work on commit hashes in those remotes as well, even if it's not in one of your local branches.

Or do you mean configuring git to use multiple .git/objects directories? I haven't heard of that feature, can you give a link?

aiiie · on Oct 25, 2020

The feature’s called alternates. You can use it on-the-fly without modifying any repos by using the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable.

If you want the effect permanently, there’s the .git/objects/info/alternates file. For HTTP remotes, there’s apparently a .git/objects/info/http-alternates file as well (no idea how that works though). I’m assuming these files allow multiple alternates as the environment variable does.
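
Here is a small sketch of both variants, run against two throwaway repos (`a` and `b` are just directory names picked for the example). It only shows the local mechanism and says nothing about how any hosting service is set up. As far as I know, the alternates file takes one object-directory path per line, so it can list several.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Repo "a" stores a blob that repo "b" has never seen.
git init -q a
git init -q b
blob=$(echo "only stored in a" | git -C a hash-object -w --stdin)

# Without alternates, b cannot find the object.
if git -C b cat-file -e "$blob" 2>/dev/null; then before=found; else before=missing; fi

# One-off: point b at a's object directory through the environment variable.
via_env=$(GIT_ALTERNATE_OBJECT_DIRECTORIES="$work/a/.git/objects" git -C b cat-file -p "$blob")

# Permanent: list the object directory in b's alternates file.
mkdir -p b/.git/objects/info
echo "$work/a/.git/objects" > b/.git/objects/info/alternates
via_file=$(git -C b cat-file -p "$blob")

echo "before:   $before"
echo "via env:  $via_env"
echo "via file: $via_file"
```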

js2 · on Oct 25, 2020

I'm pretty sure GitHub does use alternates in the same way that GitLab does:

https://docs.gitlab.com/ee/development/git_object_deduplicat...

I vaguely recall seeing @peff comment about this on HN years ago but I can't find that comment now. Here's a GitLab employee claiming GitHub uses alternates:

https://news.ycombinator.com/item?id=22179208

The thing is, that both the dmca repo and its forks must have alternates files to the same underlying common repo, otherwise the PR ref in the dmca repo wouldn't be able to see the merge commit pushed to the fork. Pushing the merge must have duplicated all the youtube-dl commits into the common repo used by both the dmca repo and its forks because youtube-dl and dmca would have different common repos.

est31 · on Oct 25, 2020

Oh indeed, very interesting. Apparently the feature has existed since 2005, before GitHub (2008), so they could have used it from the start.

https://stackoverflow.com/a/36125713

RulerOf · on Oct 25, 2020

There's a lot of reasons it's possible, but the one that sticks out is that the repo owner needs to be able to modify the commit before the PR is merged. AFAIK, the way that's done is by incorporating the remote repo's commit history into the destination repo underneath a pr-specific branch, which naturally brings all of the commits themselves into the repo's git database.

fireattack · on Oct 25, 2020

Follow-up question: are these commits or the PR-specific branch accessible in the target repo's `git` (not GitHub)?

gizmo686 · on Oct 25, 2020

Yes.

You can get it with: git fetch <remote> refs/pull/<pr>/head

My git config has the alias:

  # quoted so that git does not treat the ';' as the start of a config comment
  pr = "!f() { git fetch $1 refs/pull/$2/head:pr/$1/$2; }; f"

which will create a local branch corresponding to the provided PR. This is useful for evaluating large PRs that would be difficult to fully evaluate with just the online UI.
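
To try the refspec without touching a real project, here is a self-contained sketch. The bare repo `upstream.git` is a local stand-in for the hosted repository, and the push to `refs/pull/1/head` only imitates what the host does when a PR is opened; both are assumptions made for the demo. The final fetch is the same command the alias runs for remote `origin` and PR number 1.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Local stand-in for the hosted repository (hypothetical name).
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/main

# The PR author's working copy.
git init -q author
cd author
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false
git commit -q --allow-empty -m "base"
git push -q ../upstream.git HEAD:refs/heads/main
git commit -q --allow-empty -m "proposed change"
pr_head=$(git rev-parse HEAD)
# Imitate the host publishing the tip of PR #1.
git push -q ../upstream.git HEAD:refs/pull/1/head
cd ..

# The reviewer clones normally, then fetches the PR ref into a local branch.
git clone -q "file://$work/upstream.git" reviewer
cd reviewer
git fetch -q origin refs/pull/1/head:pr/origin/1
fetched=$(git rev-parse pr/origin/1)
echo "PR head: $pr_head"
echo "fetched: $fetched"
```

Because the destination has no `refs/` prefix, git stores it under `refs/heads/`, so `pr/origin/1` ends up as an ordinary local branch.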

RulerOf · on Oct 25, 2020

Yes, but I don't know if you have to track the branch first or not in order to pull down the data into your local repo.

But this is exactly how merging a PR locally[0] works.

0: https://docs.github.com/en/free-pro-team@latest/github/colla...

gizmo686 · on Oct 25, 2020

It is very useful that commits become part of the target repository as soon as a PR is created. This allows people reviewing the PR to check it out on their local machines without needing to add the source repository as an additional remote.