I think it's because GitHub wants to let repo maintainers merge PRs without having to add separate remotes themselves, i.e. `git remote add` isn't required before `git merge`.
This basically means that anyone can inject arbitrary content into anyone else's GH repo (since PRs can't be turned off), though really only in the sense that it becomes viewable under that repo on the GitHub website. To give an example, pull 437 on torvalds/linux[0] hasn't been merged in, but if you go to the commit hash in the browser, suddenly init/main.c shows the relevant changes, from a commit that condenses the file into one line[1].
This very well could be abused - imagine framing (or just 'canceling') someone with [insert illegal content here] by PRing their repo with a commit with a forged author[2] then linking people to their repo with the commit tree showing the illegal content.
Does commit signing really solve this? I believe you can restrict branches to only allow signed commits, but since these commits are not in any branch on that repository it looks like that wouldn't change anything. Correct me if I'm wrong, though.
It's due to how git works. In order for git tools to compare and otherwise work with two commits, both commits need to be in the same repo.
If "forking" a repo on GitHub really cloned it on their infrastructure, they'd need far more storage. So all forks of a GitHub repo point to the same underlying repo, only with different branches.
Note that git clone only clones the actually present branches of the upstream you point it to, but on the backend, all branches of all forks are present.
This isn’t simply because of how Git works. You can configure Git to look in multiple places for repo objects. For whatever reason, the GitHub devs either didn’t know this, or they didn’t want to implement their forking and pull request systems this way.
As someone else mentioned, this may be an intentional design to make it simpler to implement pulling down remote PRs from the destination repo.
> You can configure Git to look in multiple places for repo objects.
What do you mean by multiple places for repo objects? Do you mean multiple remotes? The remotes are fully inside your local database if you run commands like git pull or git remote update, they are just not in your checkout. Commands like git show <commit hash> work on commit hashes in those remotes as well, even if it's not in one of your local branches.
Or do you mean configuring git to use multiple .git/objects directories? I haven't heard of that feature, can you give a link?
The feature’s called alternates. You can use it on-the-fly without modifying any repos by using the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable.
If you want the effect permanently, there’s the .git/objects/info/alternates file. For HTTP remotes, there’s apparently a .git/objects/info/http-alternates file as well (no idea how that works though). I’m assuming these files allow multiple alternates as the environment variable does.
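For anyone who wants to try alternates, here's a minimal sketch. It only shows the local mechanism and says nothing about how GitHub actually wires up its storage. The repo names `a` and `b` and the throwaway directory are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; the names "a" and "b" are hypothetical.
work=$(mktemp -d)
cd "$work"

# Repo "a" owns the only copy of the commit.
git init -q a
git -C a -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q --allow-empty -m "only stored in a"
commit=$(git -C a rev-parse HEAD)

# Repo "b" is empty and has no remote pointing at "a".
git init -q b

# On its own, "b" cannot find the commit.
if git -C b cat-file -e "$commit" 2>/dev/null; then
    echo "b unexpectedly has the commit"
else
    echo "b alone: commit not found"
fi

# One-off: lend "b" the object store of "a" via the environment variable.
GIT_ALTERNATE_OBJECT_DIRECTORIES="$work/a/.git/objects" \
    git -C b cat-file -t "$commit"

# Permanent: list the other object directory in the alternates file.
mkdir -p b/.git/objects/info
echo "$work/a/.git/objects" > b/.git/objects/info/alternates
git -C b cat-file -t "$commit"
```

Both `cat-file -t` calls print `commit`, even though nothing was ever fetched or copied into `b`.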
I vaguely recall seeing @peff comment about this on HN years ago but I can't find that comment now. Here's a GitLab employee claiming GitHub uses alternates:
The thing is, that both the dmca repo and its forks must have alternates files to the same underlying common repo, otherwise the PR ref in the dmca repo wouldn't be able to see the merge commit pushed to the fork. Pushing the merge must have duplicated all the youtube-dl commits into the common repo used by both the dmca repo and its forks because youtube-dl and dmca would have different common repos.
There are a lot of reasons it's possible, but the one that sticks out is that the repo owner needs to be able to modify the commit before the PR is merged. AFAIK, the way that's done is by incorporating the remote repo's commit history into the destination repo underneath a PR-specific branch, which naturally brings all of the commits themselves into the repo's git database.
which will create a local branch corresponding to the provided PR. This is useful for evaluating large PRs that would be difficult to fully evaluate with just the online UI.
It is very useful that commits become part of the target repository as soon as a PR is created. This allows people reviewing the PR to check it out on their local machines without needing to add the source repository as an additional remote.
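For reference, GitHub documents fetching a PR straight from the target repo with `git fetch origin pull/ID/head:BRANCHNAME`. Below is a rough local simulation of that, with no network involved. The bare repo `upstream.git`, the `contributor` and `reviewer` repos, PR number 1, and the branch name `pr-1` are all hypothetical, and the manual push to `refs/pull/1/head` only stands in for whatever GitHub does server-side when a PR is opened.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every repo name here is hypothetical.
work=$(mktemp -d)
cd "$work"

# Stand-in for the target repository hosted on GitHub.
git init -q --bare upstream.git

# The contributor makes a commit in a separate repo.
git init -q contributor
git -C contributor -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q --allow-empty -m "proposed change"
pr_commit=$(git -C contributor rev-parse HEAD)

# Assumption: imitate PR creation by placing the commit under refs/pull/1/head
# in the target repo. It is not on any branch there.
git -C contributor push -q "$work/upstream.git" HEAD:refs/pull/1/head

# The reviewer only knows about the target repo, not the contributor's.
git init -q reviewer
git -C reviewer remote add origin "$work/upstream.git"

# Fetch the PR head into a local branch named pr-1.
git -C reviewer fetch -q origin pull/1/head:pr-1
git -C reviewer rev-parse pr-1
```

The reviewer ends up with a local branch pointing at the contributor's commit without ever adding the contributor's repo as a remote.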