When I left, the diff number was in the 15 million range. Not all diffs get landed, but I would assume more than 60% do, so FB's repo is almost certainly above 10M commits.
Nothing hacked about it. They rewrote it completely, keeping just the interface for compatibility. Perforce scales very well, but it still has a single server at its core; at some point, no matter how much money Google threw at that machine (it used to be the beefiest single server they had), it just couldn't keep up.
http://hg.mozilla.org/try appears to have over 3M commits, and probably in excess of 100k heads (effectively git branches, although I don't think git has any proper term for a commit with no children that isn't referred to by a branch).
Strictly speaking, it's not actually the main project repository (which has closer to 600k commits), but rather the repository that contains effectively all of the pull requests from the past several years (more specifically, every change someone wanted to test in automation).
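If anyone wants to sanity-check numbers like that themselves, here's a rough sketch for counting commits and topological heads in a local Mercurial clone. It assumes hg is on your PATH and that you've already cloned the repo; the local path is a placeholder:

    # Count commits and topological heads (commits with no children) in a local hg clone.
    import subprocess

    REPO = "/path/to/try-clone"  # placeholder: wherever you cloned hg.mozilla.org/try

    def hg_count(revset: str) -> int:
        # "-T ." prints one character per matching revision, so the count is just len().
        out = subprocess.run(
            ["hg", "log", "-R", REPO, "-r", revset, "-T", "."],
            check=True, capture_output=True, text=True,
        ).stdout
        return len(out)

    print("commits:", hg_count("all()"))
    print("heads:", hg_count("heads(all())"))

(Cloning try in the first place is the slow part, of course.)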
The closed-source monorepos of Google (Perforce, IIRC), Facebook (Mercurial), and Microsoft (Git) are all going to be far larger than any open-source repository. Among open-source repositories, Linux is in the largest size class but not the largest (I believe Chromium is the largest open-source repo I've found).
No, because Piper doesn't care how big your files are, and devs never need to pull or clone the repo locally.
When they were still using actual Perforce, there was a team that would browbeat people who had more than a hundred clients. That is the only time I can remember running up against a limit of the SCM.
By "scalable" I assume you mean "centralized". As in the repository is hosted on a single machine, so you only have to worry about one machine meeting the hardware requirements for storing all that data. That scales better with repo size, but it scales worse with the size of your development team. I'm sure the Linux kernel has orders of magnitude more developers than a typical video game or engine.
Yes, at scale it has to be. Google has hundreds of terabytes of data in their monorepo; you can't check it all out! Historically, centralized was the norm: the previous popular VCS generations (CVS, Subversion) are all like that. DVCSs (git & co.) came to dominance only in the last decade or so.
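To make the "can't check it all out" point concrete: Piper developers work through clients that materialize only the files they actually touch. The closest open analogue in git is a partial clone plus sparse checkout; a rough sketch (the URL, directory names, and branch name are placeholders, not anything Google-specific):

    # Work on a slice of a huge repo without fetching every blob or checking out every file.
    import subprocess

    REPO_URL = "https://example.com/huge-monorepo.git"  # placeholder

    def run(*args: str) -> None:
        subprocess.run(args, check=True)

    # Partial clone: fetch commits and trees now, blobs lazily on demand.
    run("git", "clone", "--filter=blob:none", "--no-checkout", REPO_URL, "monorepo")
    # Sparse checkout: limit the working tree to the directories you actually work in.
    run("git", "-C", "monorepo", "sparse-checkout", "init", "--cone")
    run("git", "-C", "monorepo", "sparse-checkout", "set", "services/search", "tools/build")
    run("git", "-C", "monorepo", "checkout", "main")  # branch name is an assumption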
> That scales better with repo size, but it scales worse with the size of your development team.
Google switched to Piper around 2014, when they had ~50k employees. The Perforce monorepo worked pretty well for them until then. Scaling a monorepo that far has certain costs: lots of investment in tooling to make it all work, and it needs dedicated teams. But it can be scaled. And it offers certain benefits that are very difficult to harness in multi-repo setups; the ability to reason about and refactor code across the entire repo is the biggest one.
> I'm sure the Linux kernel has orders of magnitude more developers than a typical video game or engine
Linux kernel development is very different: it is decentralized across many different companies and individual contributors, hence the need for a DVCS like git. In a corporate environment like Google or game development, it is much easier to keep version control centralized and dedicate a team to maintaining it.
Epic Games' p4 depot has well over 1M changelists, though many of those numbers are taken up by developer changes that never get submitted, and many are automated merges.
I thought I read that Linux jettisons old history every few years for the sake of practicality, and that if you want the full history you have to look at special archive repos. Am I wrong? I wouldn't blame them; git is fast, but it's not that fast, and cloning becomes a bear after only a few hundred thousand commits (and I would be surprised if that's the only operation that scales poorly).
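FWIW, shallow clones are the usual workaround for that clone cost; a quick sketch using stock git options (the kernel URL is mainline, the depth and cutoff date are arbitrary choices):

    # Avoid downloading hundreds of thousands of commits when you only need recent history.
    import subprocess

    def run(*args: str) -> None:
        subprocess.run(args, check=True)

    # Shallow clone: just the tip commit and its tree, no history.
    run("git", "clone", "--depth", "1",
        "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git")

    # Or keep history back to a cutoff date instead of a fixed depth.
    # run("git", "clone", "--shallow-since", "2023-01-01", REPO_URL)  # URL elided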
Are there any examples of projects with 1M+ commits that use SVN, Mercurial, Perforce, or some other SCM?