When I left, the diff number was in the 15 million range. Not all diffs get landed, but I would assume more than 60% do, so FB's repo is almost certainly above 10M commits.
Nothing hacked about it. They rewrote it completely, keeping just the interface for compatibility. Perforce scales very well, but it still has a single server at its core; at some point, no matter how much money Google threw at that machine (it used to be the beefiest single server they had), it just couldn't keep up.
http://hg.mozilla.org/try appears to have over 3M commits, and probably in excess of 100k heads (effectively git branches, although I don't think git has any proper term for a commit with no children that isn't referred to by a branch).
Strictly speaking, it's not actually the main project repository (which has closer to 600k commits), but rather the repository that contains effectively all of the pull requests from the past several years (more specifically, every change someone wanted to test in automation).
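If anyone wants to sanity-check numbers like that themselves, here's a rough sketch for counting commits and topological heads in a local Mercurial clone. It assumes hg is on your PATH and that you've already cloned the repo; the local path is a placeholder:

    # Count commits and topological heads (commits with no children) in a local hg clone.
    import subprocess

    REPO = "/path/to/try-clone"  # placeholder: wherever you cloned hg.mozilla.org/try

    def hg_count(revset: str) -> int:
        # "-T ." prints one character per matching revision, so the count is just len().
        out = subprocess.run(
            ["hg", "log", "-R", REPO, "-r", revset, "-T", "."],
            check=True, capture_output=True, text=True,
        ).stdout
        return len(out)

    print("commits:", hg_count("all()"))
    print("heads:", hg_count("heads(all())"))

(Cloning try in the first place is the slow part, of course.)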
The closed-source monorepos of Google (Perforce, IIRC), Facebook (Mercurial), and Microsoft (Git) are all going to be far larger than any open-source repository. Among open-source repositories, Linux is in the largest size class but not the largest (I believe Chromium is the largest open-source repo I've found).
No, because Piper doesn't care how big your files are, and devs never need to pull or clone the repo locally.
When they were still using actual Perforce, there was a team that would browbeat people who had more than a hundred clients. That is the only time I can remember running up against a limit of the SCM.
By "scalable" I assume you mean "centralized". As in the repository is hosted on a single machine, so you only have to worry about one machine meeting the hardware requirements for storing all that data. That scales better with repo size, but it scales worse with the size of your development team. I'm sure the Linux kernel has orders of magnitude more developers than a typical video game or engine.
Yes, at scale it has to be. Google has hundreds of terabytes of data in their monorepo; you can't check it all out! Historically, centralized was the norm: the previous popular VCS generations (CVS, Subversion) are all like that. DVCSs (git & co.) came to dominance only in the last decade or so.
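To make the "can't check it all out" point concrete: Piper developers work through clients that materialize only the files they actually touch. The closest open analogue in git is a partial clone plus sparse checkout; a rough sketch (the URL, directory names, and branch name are placeholders, not anything Google-specific):

    # Work on a slice of a huge repo without fetching every blob or checking out every file.
    import subprocess

    REPO_URL = "https://example.com/huge-monorepo.git"  # placeholder

    def run(*args: str) -> None:
        subprocess.run(args, check=True)

    # Partial clone: fetch commits and trees now, blobs lazily on demand.
    run("git", "clone", "--filter=blob:none", "--no-checkout", REPO_URL, "monorepo")
    # Sparse checkout: limit the working tree to the directories you actually work in.
    run("git", "-C", "monorepo", "sparse-checkout", "init", "--cone")
    run("git", "-C", "monorepo", "sparse-checkout", "set", "services/search", "tools/build")
    run("git", "-C", "monorepo", "checkout", "main")  # branch name is an assumption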
> That scales better with repo size, but it scales worse with the size of your development team.
Google switched to Piper around 2014, when they had ~50k employees. The Perforce monorepo worked pretty well for them until then. Scaling a monorepo that far has certain costs: lots of investment in tooling to make it all work, and it needs dedicated teams. But it can be scaled. And it offers certain benefits that are very difficult to harness in multi-repo setups; the ability to reason about and refactor code across the entire repo is the biggest one.
> I'm sure the Linux kernel has orders of magnitude more developers than a typical video game or engine
Linux kernel development is very different: it is decentralized across many different companies and individual contributors, hence the need for a DVCS like git. In a corporate environment like Google or game development, it is much easier to keep version control centralized and dedicate a team to maintaining it.
Epic Games' p4 depot has well over 1M changelists, though many of those numbers are taken up by developer changes that never get submitted, and many are automated merges.
I thought I read that Linux jettisons old history every few years for the sake of practicality, and that if you want the full history you have to look at special archive repos. Am I wrong? I wouldn't blame them; git is fast, but it's not that fast, and cloning becomes a bear after only a few hundred thousand commits (and I would be surprised if that's the only operation that scales poorly).
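FWIW, shallow clones are the usual workaround for that clone cost; a quick sketch using stock git options (the kernel URL is mainline, the depth and cutoff date are arbitrary choices):

    # Avoid downloading hundreds of thousands of commits when you only need recent history.
    import subprocess

    def run(*args: str) -> None:
        subprocess.run(args, check=True)

    # Shallow clone: just the tip commit and its tree, no history.
    run("git", "clone", "--depth", "1",
        "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git")

    # Or keep history back to a cutoff date instead of a fixed depth.
    # run("git", "clone", "--shallow-since", "2023-01-01", REPO_URL)  # URL elided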
Are there any examples of projects with 1M+ commits that use SVN, Mercurial, Perforce, or some other SCM?