We Halved Go Monorepo CI Build Time (uber.com)
161 points by kiyanwang on June 26, 2022 | 234 comments


Putting aside questions of monorepos, I am always astounded by how much big companies invest in their build/change management/review systems. Shipping becomes a whole product in itself. This appears to be a serious piece of engineering, but it’s directed at Uber’s tooling, not Uber’s business problems or even Uber’s product.

I’m sure it’s fascinating for people who are into that, and I’m not saying there isn’t a substantive problem to solve here, but I’d personally never go anywhere near a problem like this. It’s too many steps removed from the value. I’d keep thinking “we are down a very deep rabbit hole. A side quest off a side quest. Should we be doing this?”

But I’ve always worked at smaller places, where things like toolchains and sales ops and marketing tech don’t take on huge lives of their own.


I've repeatedly seen both high quality and low quality tooling act as massive force multipliers even for small teams; the question is just which direction you want to multiply. They say an army marches on its stomach—well, a tech effort marches on its tools. I interned at a trading company with, at the time, maybe 100 developers that built almost everything in-house and yet was quite a bit more successful and productive than much larger teams I've worked on since. The larger teams had incentives and structure to "focus" on what's "important" for very narrow notions of important... and so they ended up with lower-quality internal systems, a bunch of day-to-day drag on development and, both technically and organizationally, far less adaptability. (Turns out, one of the major advantages of investing in your own tools is that it builds up deep institutional knowledge about tooling that is hard-to-impossible to maintain otherwise.)

Only valuing easily-measurable work is, frankly, a modern organizational disease. It's like searching for the keys under the streetlight—and why? As a substitute for human judgement or resolving disagreements? As a way to make work more legible for executives? It leads to unforced errors and systemic problems, but we can't seem to do anything about it.


Recently I showed a development team how to use a debug tool that can automatically collect debug snapshots from production that can be opened with one click in the IDE. It’ll jump to the line of code and show the state of everything — not just the stack but the heap too.

My demo is to pick a random crash and diagnose the root cause while talking. The mean time to resolution is low single digit minutes.

I showed this to an entire team, one person at a time, solving bugs as I went. I showed the junior devs, senior devs, and their manager.

No interest. None. Just… silence.

The tools are amazing, but the lack of motivation from the typical developers for learning to use them is even more amazing.


Pointer to the tool? It sounds awesome.


Azure App Insights can do Debug Snapshot collection, but it’s not the only such tool. It just makes the workflow quick and simple.

Ref: https://docs.microsoft.com/en-us/azure/azure-monitor/snapsho...

It works even better with DevOps source indexing added to the build pipeline:

https://docs.microsoft.com/en-us/azure/devops/pipelines/task...

A vaguely similar feature is the Azure App Service memory-leak diagnostics tooling, which takes memory dumps on certain triggers or at intervals.

You can open these in Visual Studio and it’ll show the heap deltas over time.


Seconded!


Playing devil's advocate here. It could just be that they already have the stack trace in logs somewhere, so how does this help you get there any faster? The bulk of the time to resolution will be figuring out what to do anyway.


These were all bugs that were nearly impossible to solve with just a stack trace. For example, there was a "format" exception on a web page with hundreds of uses of string formatting (to generate a report). Another example was a function call complaining about a null argument. Which instance? The page had 40+ calls to the same function, each with 6 arguments.

Most of these couldn't be reproduced either. As in, you'd get a crash once a day in a page that would otherwise work successfully thousands of times.

How would you fix a problem where there's a stack trace only from a release? The scenario is: you can't reproduce the errors, you don't get line numbers, you don't even get function argument values.

I could solve these in minutes using this tool. Could you match that without such tooling?


I’m guessing the organization incentives are against investigating/fixing random crashes. It might be unscheduled, or seen as “test” or “qa” or “ops” work. Try working with management to set up a way to reward the behavior you want to encourage.


> Only valuing easily-measurable work is ... a modern organizational disease


It is really just a matter of scale. At big companies it is easier to see the value of having people on these problems, so the teams have a clear reason to exist.

As for your "side quest off a side quest" observation, I think that is a matter of perspective, about what personally motivates you, and about how work is organized in society. I find that working at a small company makes it easier to have a feeling of focus and purpose -- everybody is focused on one thing. But I find that working at a large (profitable, well run) company makes it easier to feel like you can actually move the needle and have the resources to do large things (and yes, this includes a connection between something like a 'build tools' team and being able to ship a quality product more quickly).

It is easy to miss that every race car needs a pit crew, every pit crew needs equipment and parts, getting parts requires suppliers, parts need to be built, tested, etc., all those people need to eat, all that food needs to be grown, and so on. Organizing all of that "work" into small single-purpose companies, that buy each others' stuff, is one way to go. Another way to go is to organize it into larger companies with sub-teams/groups/departments. So a company's larger mission might be "win races" but, if it is large enough, smaller support efforts in support of the larger goal can make a lot of sense.


Tech companies pay for few things more than developer time. In a small company, spending one day to shave 1 minute off a 30-minute build of a codebase used by 10 developers is not going to unlock much developer time, but imagine multiplying it by a thousand or ten thousand developers. Suddenly you save 10 thousand minutes per build, or roughly 166 hours. If the average dev runs a build three times per day, your 1-minute saving is now saving about 500 hours of developer time per day, or 62 eight-hour work days per day (and there is the saying that in FAANG the real figure is well under 8 hours of work a day). These are continuous cost savings, so the value your 1-day investment created is equivalent to 62 fully paid developers. Isn't that amazing?
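
Spelled out with the same assumed numbers (10,000 developers, 1 minute saved per build, 3 builds per developer per day), the arithmetic is roughly:

    10,000 devs x 1 min saved x 3 builds/day = 30,000 min/day
    30,000 min/day / 60                      = ~500 hours/day
    500 hours/day / 8 hours per work day     = ~62 developer-days saved, every day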

I think tech companies should spend a lot more on speeding up their stack. I would love to do something like this (improving developer workflows in a giant organization) as my day job.


The counter argument here is that that 1 minute difference doesn’t correspond to a minute of extra work in the way that ‘62 days of work saved’ conveys. But the counter-counter argument is that sufficiently fast/good tooling is qualitatively different and increases productivity by enabling new workflows (eg build it vs think carefully about whether it will compile right because builds are slow; break up changes into smaller chunks vs write big hard-to-review changes because CI is slow) such that speeding up tools or maintaining performance as the code base grows are valuable.


I remember reading once that the larger your engineering org, the larger the percentage of your engineers should work on tooling. As the complexity of your system grows, the ability to know if your change causes an ill effect becomes more difficult. Large systems have large and complex tooling because it becomes a necessity for survival.


Do you remember where you read that? I'd love to read that article/book.


Investing in that is vital if you want to grow as an engineering company. I work at a company with 2000 engineers, and the infrastructure and our build/change management/review system is horrible. It takes me all day to check something in, the integration tests are flaky and waste a lot of time tracking down errors, etc. Investing in this would collectively save so much time, but the people in charge think we can't afford the investment so everyone suffers.


Sounds more like an organizational issue than a technical one.


I’d say that this is just internal platform / infrastructure support, aimed at making the whole organisation be more effective. It’s absolutely adding a lot of value to the organisation, it’s just not visible from the outside.

But saying that “it’s too many steps removed from the value” appears to be a misrepresentation of the value it adds.


The number of people dedicated to such projects is often a very small percentage of the overall engineering force, and they act as a leverage point. Invest in X staff, make all engineers at the company 1.X times more effective, therefore giving you a large return on investment.

> It’s too many steps removed from the value.

Don't be worried about being too many steps removed from the value. With that logic, CEOs, CTOs, etc are also too many steps removed from the value, yet they are valued very highly. Look at such infra work as multiplying the value of the entire company and in a way, being part of a "CEO team".


> It’s too many steps removed from the value.

And this kind of reasoning is why tooling in most shops absolutely sucks


YAGNI is always a valid strategy but what you are and are not going to need will change with scale. An inconsequential case can turn out to be something that impacts several teams.

At scale, rabbit holes can be quite relevant. At scale, you can no longer trust that your personal view of the org is complete enough.

There's still plenty of waste but just because you don't need something doesn't mean it isn't in another team's hot loop. Best to come in with trust first and incredulity second.


Do you also find the existence of accounting or janitorial departments in companies whose core business is neither accounting nor custodial services strange?


And yet accountants mostly adhere to GAAP no matter the industry, and the janitors are using the same tools their counterparts are.


I would find it pretty strange if the janitorial department were smelting iron ore to make their own trash bins.


Per DORA and the information in Accelerate, shipping code frequently and reliably contributes directly to good outcomes. It seems like a good investment.


I don't think the commenter is challenging this idea. Many companies do this with "off the shelf" CI/CD solutions without committing large amounts of money into bespoke tooling.


From my experience, the off the shelf tooling works poorly with monorepos. Everything is still optimised for the polyrepo microservice craze of 2017.

There are tools out there, but they're not free and they involve some level of vendor lock in. I've worked on some monorepo build scripting myself.


In my opinion, git in general works poorly for monorepos.

My ideal VC solution would have the branch control of git with the checkout model and hook configuration of SVN.

I am not familiar enough with the version control ecosystem to know if there’s another product that does it, but I doubt it since I kind of suspect it would violate the CAP theorem.


Personally, Uber doesn't come off to me as an engineering culture that values simplicity.

And in fact, in this very article, the last paragraph actually states that they are aiming to increase the complexity of the change validation process?!?

> Additionally, it helped our team focus on increasing the complexity and features of our change validation process.

Personally, I haven't been super impressed with any of the engineering projects I've heard come out of Uber, but what they're doing obviously works to some degree so I can't be one to judge.


Deck.gl is a fine map visualisation library.


> It’s too many steps removed from the value

Uber is, partly, a software business. They need to ship software. They get direct value from shipping good software efficiently.


People drive cars to work, school, etc.

Look at the infrastructure required for that and what we spend on it. It’s massive.


Best practices in software tend to not be confined to a specific domain, so the amount of money spent in the industry being targeted really isn't relevant to need for custom CI vis-a-vis off-the-shelf solutions.


Wait til the regulators come for your industry. Modern IT has to prove to Auditors amongst other things that no one rogue person can contribute malicious code. I've worked in companies that let people do their own thing but then let them deal with Audit. A much better company had a sophisticated custom SDLC process that meant Audit just checked you used the process.


Most of what this article seems to be saying is that Uber wasn't using all the features of Bazel (and their setup also predates some of them), and as they've started to use them, things are getting a lot faster. Prior to that they were hacking around it, e.g., by controlling when folks could land large changes.

The rest of the article is about their transition from their legacy infra like Jenkins to more modern and flexible things like buildkite (which is of course relevant for anyone in a similar transition).

If your company/project starts with bazel + remote cache + RBE you're pretty close to the end state they're trying to get to.

See also for OSS and paid solutions: https://bazel.build/community/remote-execution-services
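
As a rough illustration (a minimal sketch with made-up endpoints, not Uber's actual config), the remote-cache + RBE part is mostly a .bazelrc concern:

    # .bazelrc -- endpoints are placeholders, shown only to illustrate the shape
    build --remote_cache=grpcs://cache.internal.example.com
    build --remote_executor=grpcs://rbe.internal.example.com
    build --remote_timeout=3600
    build --jobs=200    # fan work out across remote executors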


It seems to me that an easier way to avoid this problem is to not have monorepos in the first place. Uber's architecture is, AFAIK, heavily based on microservices, so I wonder what's the advantage they get in using a monorepo.


The company I work for has recently transitioned from many small repos to a single monorepo. I can tell you that the development experience is MUCH better. 99% of issues required cross-repo coordination, which was a nightmare, and things could easily get out of sync.

With a monorepo, you don't need to think about which commits work with which other commits. One commit ID is a full description of every subcomponent.


I don't really understand how that creates fewer bottlenecks. It may save communication overhead because every dev can spot integration issues on their own, but at the cost of monstrously complex and time-consuming build-and-test pipelines. The microservice approach just requires some kind of service contract and meaningful release versions. That's always been a very manageable cost in my experience.


> just requires some kind of service contract and meaningful release versions.

I've just switched to a company with many tiny repos and IMO it's a huge hassle. There's no automated integration testing, and manual integration testing is a huge pain to set up.

I don't even know how I'd create integration tests that run as part of the CI. What version would I use? If you need to change both sides (very common), coordinating the commits and releases is very painful.

The "monstrously complex" build-and-test pipelines are a significant cost, but the alternative is higher release failure rate and moving slower overall IMO.


In my experience the Uber mono repo was a giant velocity killer. Go tooling just completely choked, and at the time they had tons of terrible hacks around go modules by having this weird system that required your mono repo to be in GOPATH, further breaking even more tooling.

No clue if that’s fixed today but it soured the idea of monorepos for me, and that’s not incorporating how often submitqueue would go down.


The sharing of service contracts and their shared dependencies is a difficult problem to solve by itself. There are companies trying to make money off of the problem (Buf). Monorepos solve the problem out of the box.


If 99% of issues required cross-repo coordination, then those repos by definition should not have been distinct from each other. Splitting code into separate repos only makes sense if changes can be shipped in isolation.

1 repo == 1 bounded context == 1 isolated unit of deployment.


No true scotsman would split up repos wrong.


Then any time you have fan-shaped dependencies, you need a monorepo.

Which, well, the Uber people clearly have since these big changes exist.


> 1 repo == 1 bounded context == 1 isolated unit of deployment.

The most obvious counterpoint: many systems of record need to scale command operations (writes) and query operations (reads) separately.

So you absolutely would have separate programs, ASGs etc for those two roles


How is this a counterpoint?


They wouldn’t have a single deployment but exist in one bounded context.


Oh, indeed, my mistake. A team can of course deploy its bounded context at whatever granularity it prefers. The important thing is just that that deployment schedule is unrelated to the deployment schedule of other teams/contexts.


Monorepos have the big advantage that cross-project changes can be made together in one PR.


Then why not call it just a repo and build a monolith?

I've seen people going crazy with this bullshit of monorepos to the point every single directory was a "package" when it could be just a plain module import.

If you want to do microservices, then putting everything back into a single repository and enforcing everyone to use the same version and every change to require upgrades and deploys across the board is totally backwards.

Either do a monolith and have that consistency, or do microservices and allow teams to follow their own rules as long as they keep APIs stable.

Monorepos + microservices is just a demonstration of everything that's wrong in the technical aspects of our industry. Just applying absolutely everything you read about without even considering whether it might be better or not for your specific use case.


The way you store your source code doesn't have much to do with how your system is deployed. It's the same way you can have a closet in which you store clothing for many purposes; clothing for cold weather, clothing for hot weather, clothing for formal occasions, clothing for specific recreational activities, et cetera. You can store code for different purposes in the same source tree. You can deploy something that is built from just one part of the source tree.


The way you store your source code is 100% tied to how your system is deployed. Repos are the thing that define atomic changes at the source level. If a change at your business level requires atomic changes at the source level between X Y and Z, and X Y and Z aren't in the same repository, then you done played yourself.


> You can store code for different purposes in the same source tree. You can deploy something that is built from just one part of the source tree.

At the risk of sounding memetic, the question is not so much "Can you?" but "Should you?". Should you store a monolithic application across multiple repositories? Should you store a distributed application in a monolithic repository?

I agree with @likortera that you should not. And that's because...

> The way you store your source code doesn't have much to do with how your system is deployed.

I disagree with this statement.

Your architecture (monolithic vs distributed) imposes certain assumptions on other aspects of your distribution pipeline. Your workflow (in this discussion "how you store your source code") should support these assumptions, not hinder them.

For example, one benefit of microservices is that cross-functional teams can develop independently of each other. And yet one cited advantage of monorepos is that everyone is on the same version of dependencies all the time. In short, your teams are not independent after all.

Note that I'm keeping the example extremely generic to illustrate this inconsistency (a conflict of interest, if you will) that I see people commit on this topic, because in my experience these questions are not purely technical but involve product/business factors as well. Maybe for most people (operative emphasis on "MAYBE", because who am I to judge you), the discussion they need to have first is whether or not they are using the right architecture for their product in the first place.

If you choose to have a microservices architecture, you have to live with the fact that your teams/services will operate at different cadences. If you feel the need to impose a One True Library Version All the Time, then go for a monolithic architecture, and store your code in the same way.


Two things.

One, teams that share a dependency version can still develop independently; they just share something in common. They already likely share other things in common: deployment target OS, cloud platform and services, shared authN/authZ frameworks, etc.

Two, a monorepo is just the SCM mechanism. As the parent was describing, it doesn't prescribe anything other than the code storage location and how branching, committing, etc. works. Yes, a lot of organizations prefer having a single version rule in their individual monorepos, but nothing about monorepos in general makes this a requirement. You can use multiple versions of the same dependency and still gain advantages from the single commit benefits, and even famous instances like Google's have exceptions where this is the case.


> a monorepo is just the SCM mechanism ... it doesn't prescribe anything other than the code storage location and how branching, committing, etc. works

That's not really the case in practice. When people decide to choose mono vs multi based on their benefits, it's become a workflow philosophy in itself. If you choose a monorepo approach but use multiple versions of the same (in-house) dependency across components, you are just opening yourself up for a world of confusion. Sure, you can do it, but should you? Why choose a monorepo structure if you won't take advantage of its benefits?

> teams that share a dependency version can still develop independently; they just share something in common

My point about team independence doesn't mean they should not share anything at all. But rather, they now _update_ together at the same pace because the "atomic commit" that updated a dependency also updated my team's usage of said dependency, for better and for worse. My team might have a reason not to update just yet.


Spot on!

One of the companies I worked for had the brilliant idea of putting EVERYTHING related to UI/frontend in a monorepo, where almost every file or two was its own package. I used to joke there were more package.json files than actual js files (it was almost true).

I spent months saying this thing was a terrible idea. Nonetheless the "frontend infrastructure" folks wanted to do some CV padding and play with Lerna and their SV friend's cloud CI service startup, so they went ahead with it.

Months later the big problems started. Among the main ones: they were pushing updates down every team's throat, breaking the products/features those teams were working on and disrupting their roadmaps; teams accusing each other of low test coverage; hacks and workarounds; shit tons of crazy CI scripts for all the corner cases; much longer deploy times; most dev environments getting a lot slower; deployment issues because now we had to deploy several different projects at the same time; and of course not being able to upgrade to the latest React because some team in the corner had an issue with it and didn't have the time at the moment to fix it.

How did they solve all of this? In the span of 3 to 4 months they left the company. All four of them. Leaving behind an incredible amount of technical debt and nearly every frontend team totally fucked up.

What irks me is that some of these guys are pretty popular "youtubers" and spend their days giving talks about how great their work with monorepos and "frontend infra" is. They don't mention the messes they've caused, of course.

Monorepos might be great if you're Google and have the resources and talent to do it right. For most companies out there, it is just creating a centralized problem that will eventually block everyone.

I'm a big proponent of monoliths, especially while you're not yet a >300-person company. But if you're splitting your teams and services, then agree on APIs, don't break them, and let each team follow their own pace, with their own tools, and with their own schedules and preferences. Otherwise stick to the good ol' monolith and just separate things into modules/imports/whatever.

At my current employer, the main application has a "plugins" system with a very flexible and stable API. Every team around is just building "plugins" that can be installed into the main monolith, depending on each customer needs. This works fantastically well for a company with more than 1k engineers. No monorepos, no coupling, no interdependencies between teams, no parallel deploys, and each team manages their own destiny more or less.


Microservices, monorepos, monoliths etc are all outcomes of applying different tradeoffs to the 4+1 Views of Architecture [1].

The development view need not be connected to the deployment view at all, so unnecessarily coupling them can lead to worse outcomes. On the other hand, having the ability to couple them initially _and decouple them again in future_ works wonders for scaling.

I am consistently amazed that people with opinions on software architecture do not seem to recognise this seminal paper on the topic, or have the ability to re-synthesise it into tactics.

[1]: https://www.cs.ubc.ca/~gregor/teaching/papers/4+1view-archit...


Why should the system for managing source code have anything to do with the architecture of the deployed code? This coupling seems arbitrary to me.


This is almost the only advantage. A related "advantage" is forced upgrades when dependencies change - since you can't have code depending on different versions of in-repo libraries and services, client code needs to adapt or die.

The more clients a library or service has, the more expensive this is, and it's an ongoing maintenance cost for every client. Since people changing the service don't feel the full pain of this maintenance cost, changes keep on happening, and eventually clients get culled because it's too expensive to keep on maintaining them all.

From the outside, this looks like the company abandoning a venerable but still-working product, and makes people scratch their heads, wondering why.


You can have a monorepo without having all the modules share the dependencies. For example you can have multiple go.mod files in a single monorepo.

People often conflate these two things because teams often actually want centralized dependencies (so that you're forced to update or die, as you said). If that doesn't work for you, you can choose to have modules (or groups of modules) keep their independent set of dependencies, all while keeping the code in the monorepo.
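
For example (a hypothetical layout, not any particular company's): since Go 1.18 you can keep a go.mod per module and tie the modules together with a go.work file at the repo root, so each module still declares its own dependencies while living in one tree:

    monorepo/
      go.work
      services/rides/go.mod
      services/eats/go.mod
      libs/logging/go.mod

    // go.work
    go 1.18

    use (
        ./services/rides
        ./services/eats
        ./libs/logging
    )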


This is the correct answer. Putting all integration changes in a single PR improves CI test coverage and is easier to verify in a code review.


This really raises the question of why GitHub (or anyone) isn't working on cross-repo PR support. Every time monorepos come up, cross-repo PRs are the reason given for why they're needed - why can't we get this supported well?


That's not just a GitHub issue; it would require serious UX work on the authoring side to make such "commits" that are actually commits in multiple repos at the same time. Effectively this is recreating a monorepo (maybe with something like git submodules?).


I don't think it's recreating a monorepo. What you're looking for is something like a DB transaction where all the merges in different repos succeed or everything gets rolled back.


Maybe I am missing something: to do that you need some kind of “meta-repository” linking commits between all the different sub repos in some order, no? If that happens for a significant portion of the commits in a repo, isn’t that basically equivalent to a mono repo with git submodules or something?


Well, a meta repo isn't a monorepo. In that situation you can still update individual repos. Analogous to running a DB operation without a transaction.

But that said, no, I don't think you need a meta repo. You need something - call it a change set. An MVP would be, say, merging commits in 4 repos: if any fail, revert them all; if they all succeed, deploy in a specified order.


Sourcegraph does a decent job of this with their “batch changes” feature


Large cross-project changes mean you have a monolithic project: one project split over multiple repos, not multiple projects. If your separate projects were really separate, they'd maintain API compatibility and upgrade paths (e.g. semantic versioning). SaaS and microservice architectures require this to maintain separation, but most organizations lack the discipline and slowly revert back to a monolith, as that is faster to develop (until it isn't).


Pros

* Breaking changes can be done at once. Very helpful for runtime deps.

* No chance that a repo is out of date.

* Upgrades are atomic (may be hard to test a system in a half state).


While you’re deploying, production is going to be in a half-upgraded state (or maybe half rolled-back!), so it’s pretty important to be able to test that.


This depends on the deployment model. I am curious, what do your release and deployment look like?


Release is a privileged tool pushing to Git. Deployment is incremental across a small but growing number of containers running a microservice, so both old and new code might be running concurrently for up to a couple of hours. Percentage experiments are pretty common, which means both old and new paths actually have to work in the same commit.


A well factored system should not have cross-project changes. Individual projects should be able to upgrade dependencies separately.


You can use this argument against anything you don't like.

'A well-designed system should not have X therefore you don't need Y.'

When 'A well-designed system should not have X' is just a matter of opinion, or doesn't acknowledge trade-offs that might make X the better option than Z, then this isn't a useful argument against Y.


yeaaah, I used to hold this view and drove a team - hard - to essentially pull apart a monorepo into different components with clear contract boundaries. It was by far one of my greatest errors in professional judgement.

at the risk of coming up with a contrived example: let's say you own a service that needs to deserialize a datetime in a request in a format you don't currently support. assuming you own the stack, you need to a) update your date library b) update your webserver stack c) possibly update an intermediate webserver stack that includes primitives like logging, telemetry, tracing, auth, service discovery and d) your actual service.

If a->d are all independent, separate components, you have to orchestrate those changes through 4 separate repositories. And god forbid something you did at the lowest point in the stack is completely unworkable higher up.

There's all sorts of rocket science you could do to orchestrate these changes, but it ends up being contrived and edgecasey.

Most of the pain from monorepos can also be addressed with a dash of rocket science (see: Bazel), but the end model tends to:

a) have an easier mental model for the user

b) allow for consolidation of infrastructure work (i.e., your build/CI/language tooling teams can focus their efforts on one place)

c) coordinate changes across the entire stack within one field of view

d) corral some of the worst, disparate instincts of a growing engineering org (i.e., tons of teams optimizing for local maxima without internalizing knock-on effects)

e) have fewer weird edge cases.


It's important to avoid deep dependency chains when doing this, yes. I usually recommend a "framework" multi-package repo and then multiple "application" / "service" repos.

I like having examples, even contrived ones, but I'm not sure I understood this one. Can you elaborate on what you mean? Is it about adding support for a new serialization format for dates in requests to a service? Why would this affect the webserver stack and logging/telemetry/tracing/auth primitives?

I find that a lot of organizations have really strange ideas about how to factor things into separate microservices and libraries. Usually I approach it by asking the following question: if this were an open-source library or service (e.g. like Elasticsearch), would you use it? If not, then it's probably not a great candidate for a separate thing - let's try and come up with something else.

One way to handle CI/CD is using standardized pipelines e.g. you tag your repo with a tag `app:node` or `lib:js` and the github org pipeline scanner will find it and assign the standard `app:node` or `lib:js` pipeline to it.

A way that I like better, but that most tools unfortunately don't support yet, is for the infra teams to publish libraries that are essentially functions taking some parameters and generating (standard) pipelines/configuration. Those can then be tracked together the same as other dependencies.
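
A sketch of what such a library could look like (all names are hypothetical; this is just the shape of the idea, not any real tool's API):

    // Package pipelines is a hypothetical infra-team library that generates
    // the org-standard CI pipeline for a Go service from a few parameters.
    package pipelines

    type Step struct {
        Label   string
        Command string
    }

    type Pipeline struct {
        Steps []Step
    }

    // GoService returns the standard pipeline for a Go service. Teams pin a
    // version of this library like any other dependency and upgrade on their
    // own schedule.
    func GoService(name string, withCoverage bool) Pipeline {
        testCmd := "go test ./..."
        if withCoverage {
            testCmd = "go test -coverprofile=cover.out ./..."
        }
        return Pipeline{Steps: []Step{
            {Label: "lint", Command: "golangci-lint run ./..."},
            {Label: "test", Command: testCmd},
            {Label: "build", Command: "go build ./cmd/" + name},
        }}
    }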


Splitting your code over many repositories massively increases the cost of refactoring, making it that less likely that your system will be "well factored".

Besides the "N pull requests" problem you now lose history whenever you move a file across repo boundaries, and you'll eventually have straggler projects staying on old versions for years - so switching to a new way of doing something essentially means supporting both versions forever. Code for common needs gets duplicated, or worse, split out into yet another repository and /then/ duplicated, because no one wants to check if any of 100 repositories rely on the buggy behaviour they want to fix.

I find this to be a level-headed explanation of the advantages of monorepos: https://danluu.com/monorepo/. I'm surprised to see so many comments summarily dismissing them, as if sanely managing thousands of smaller interdependent repositories doesn't require at least as much investment in custom tooling.


> Splitting your code over many repositories massively increases the cost of refactoring, making it that less likely that your system will be "well factored"

The project should stay a full monolith until the factoring is more clear.

> I find this to be this a level headed explanation of the advantages of monorepos: https://danluu.com/monorepo/.

There are two issues I have with this article. One is that when it describes drawbacks of multiple repositories, it's not specific enough, e.g.:

> That sounds like it ought to be straightforward, but in practice, most solutions are cumbersome and involve a lot of overhead.

The other is that it assumes you have Google's scale of resources to throw at the problem. If that's the case, you can make anything work, monorepos or polyrepos. The issue is that small-to-medium sized organizations will not be prepared to invest the amount of resources needed to keep a monorepo working well, as the org will largely need to rely on existing available (OSS) tools, which often have poor monorepo support. (Bazelifying everything has a significant cost, and Bazel rules are often not generic enough to work with, e.g., the variety of JS ecosystem tools.)


Only if you break your code up on boundaries that don't reflect reality.

If one business change requires code changes to ~every module in your source tree, then of course microservices, and therefore separate repos per service or whatever, make no sense at all.


One of the examples given was a go compiler upgrade for all of their go code. It seems reasonable that some companies would want to have a single team investigate and perform this upgrade all at once.

Sure, each project team could upgrade independently, but the company that’s chosen a mono-repo for their go code is likely to desire to have a single team tackle this upgrade.


In theory.

In practice, you often end up with larger changes which are mostly janitorial: bumping a version of a log library across the infrastructure, or a version of a serialization system. These simplify dependency convergence and are best handled as cross-project changes.

The other problem is that over time, good factoring tend to deteriorate. A large software project will invariably have people with different brains working on it, and they'll have different needs for the factoring. So the project naturally pushes itself toward situations where cross-project changes become a necessary thing.


We already have tools that handle this for us for OSS libraries, such as dependabot etc.

Good factoring can probably be quantified too: dependency chains should be shallow and graph connectivity should be low. The more dependants a module has, the more stable its API contract should be.


Well, what almost everyone here talking about monorepos is actually building is a distributed monolith. That's why they need changes across all clients in the same PR.


can you provide more info here?

i’m finding it hard to think of a system at a large company that would have no internal libraries used by multiple projects, that wouldn’t require ever making a cross project change.


Changes will be handled just like external library updates.


i suspect your internal model of a monorepo is fairly disjoint with an actual implementation. you’re arguing for non-monorepo based version concepts, without giving space to why having everyone locked to “latest” has large cost savings long term for maintenance and security. without those constraints, a monorepo is a bad choice.


I've worked with both models, and the monorepo model has been far more demanding in terms of time and resources to maintain as you push past the limits of the available tools one by one. Standard build tools go out of the window almost immediately (lots of work to wrap everything with Bazel), then standard collaboration tools (GH) and so on. A dedicated tooling team becomes a must.

What helps keep polyrepos sane:

1. A healthy methodology for dependency management

Evolve APIs with deprecations. Decide on a healthy amount of time for deprecations to live. Set up alerts when deprecations reach a certain age. Set up dashboards (e.g. Grafana) to track dependencies. Help other teams update, and think about how to make updates less painful (a small deprecation example follows the list below).

2. Define good boundaries and API contracts to adhere to. The most important thing for a good API contract is stability.

3. Don't prematurely split into microservices. Better ideas for stable API contracts emerge the longer you can wait.
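
For point 1, Go already has a lightweight convention that pkg.go.dev, staticcheck and most IDEs understand: a doc-comment paragraph starting with "Deprecated:". A small sketch (names are hypothetical):

    // Package httpclient illustrates the deprecation-comment convention.
    package httpclient

    type Client struct{ addr string }

    // NewClientV2 is the supported constructor.
    func NewClientV2(addr string) *Client { return &Client{addr: addr} }

    // NewClient is kept for existing callers during the deprecation window.
    //
    // Deprecated: use NewClientV2 instead. Scheduled for removal once the
    // agreed deprecation period (point 1 above) has passed.
    func NewClient(addr string) *Client { return NewClientV2(addr) }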


The issue comes when business requirements don’t allow you to arbitrarily decide that your API should be deprecated. If it’s depended on by 10 teams who are doing higher impact customer facing work than you are, at some point you’re likely to hit a “no” when it comes to disrupting them.


How is that different in a monorepo? Would you be tasked in implementing the changes for those teams yourself? If so, why not just do that as PRs in their repos?


Good articulation of why things don’t quite play out that way within a company here https://rushjs.io/pages/intro/why_mono/


For that particular article, I would respond with the following:

1. I don't see how this is a problem specific to polyrepos. Have an "open PRs" link in the onboarding handbook that gives you a view of pull requests from all repos in the organization. GitHub automatically shows you notifications from all repos. If engineers still choose to focus on one or two repos after that, I'm not sure why.

- Have a (Grafana) dashboard where you can see the latest / newest stuff. Use standard GH tools you use for OSS, such as follows etc to keep up.

2. Don't prematurely split into multiple repos. "No monorepo" doesn't mean not having poly-package repos. It means thinking about what the sensible (library or service) API boundary is - treating your projects as you would treat library / service development. In this case a separate repo with lib3, lib2 and lib1 sounds like a good way to go - at most one repo per orthogonal internal framework (e.g. core-react-components). Repo dependency chains should be as shallow as possible, and differentiating between public and internal packages is important.

3. Help other teams upgrade. If you are responsible for repo A, once you publish a new version tagged appropriately with semver, use the dashboard to look at your dependants and work with them (or rather, for them) to upgrade. Think of your dependants as internal customers, and make sure you add enough value for them to justify the upgrade effort. Cultivate a culture that values updates.

4. There are other alternatives to `npm link` e.g. see `yalc` https://github.com/wclr/yalc

Another pet peeve of mine is that the real issues get lost when you try to generalize. The article attempts to do this but that makes it hard to evaluate its claims. The best way to evaluate (alternative) solutions is to take a more concrete example repo.

For scalability of this model I'll just point to the OSS community: individual maintainers often maintain several dozen active repositories, but they also have an API contract worthy of a documentation website, a versioning scheme and planned deprecations, and they typically avoid cross-project dependencies.


> Individual projects should be able to upgrade dependencies separately.

No, that way lies sorrow and despair. There should only be one version of any dependency being used in your company. Any deviations should require like VP-level approvals or something.


It also helps keep people in sync with module and library versions. Without a monorepo, oftentimes what happens is that everybody ends up on different versions of libraries and modules. For every security update to a library, you'll have to campaign for everyone to pull that fix instead of everyone getting all fixes on every pull. It also forces dealing with incompatibilities between libraries early on when they happen instead of later when they are more entrenched and difficult to fix.


What I've seen in practice in smaller companies is actually the other way around: everyone is afraid to update any library version because who knows what other team's work it might break. Theoretically, everyone must have extensive tests to cover everything, but in practice it's hard.

But when your team owns a repo, then at least the damage is contained within your team.

Google solves that problem with heavy NIH syndrome (it's hard to get promoted by utilizing an external third-party lib; better to develop your own), and by writing tests, yes. And for those few third-party libraries that Google still depends on, updating them is a big PITA.


It's not just NIH, after a certain size every external lib is a liability. You don't want to wait for some OSS maintainer to unblock your multi billion dollar business' issue.


If that were the case, you'd expect to see forking of 3rd party libraries rather than writing of new ones.


Most of the time you probably only use a small portion of the 3rd party library functionality so forking the whole thing and ripping out everything you don't use or maintaining it doesn't make much sense.


That can be handled with alerts and deprecations just like any other library dependencies


no it can't, an alert won't make a PM allocate time to "tech debt", and these upgrades are rarely free in terms of time


If you have an issue handling tech debt due to PMs, you'll have that problem with other tech debt too; typically solved by having a better process to handle it


The problem isn't per se monorepo v multi-repo, it's ensuring that your lines of communication between components, ie your APIs, are shared reliably and coherently.

When teams take the stability & versioning of their APIs seriously, the need to use monorepos to share that info is greatly reduced. A multi-repo approach is perfectly feasible when all components are working to established APIs, which also alleviates the issues mentioned in the article.


I worked at Uber. The main arguments that I kept hearing were:

- easier to share and import common packages

- proto and thrift files are kept close to services and clients are updated globally automatically

- dependencies and go versions are managed globally and all services get the same security updates

- standardized build processes make it easier to manage large deployments

And honestly, after the initial repo download there were no visible downsides


How does not having a monorepo solve this problem? They would still need to compile and run the same code in order to run these tests between 1000 small repositories vs 1 large one, presumably they see a lot of value in end-to-end tests.

Is the suggestion just "an easier way to avoid this problem is to not have so many tests?"


If the change is specific to a single module, and the APIs exposed to other modules don't change (which can also be enforced with unit tests), you don't need to run the tests for every single repo for every change. Of course, one may argue that one could run module-specific tests in a monorepo as well. But that's not easy to do on a per-PR basis, so you just end up running the entire CI suite for every PR.


So you'd stop running tests for all of the calling code? It sounds like you have a higher level of confidence than I do that non-API changes can never break any downstream repositories in subtle ways.


that’s not actually what happens at Uber. They use bazel for the go monorepo which can accurately determine what modules are affected by a given change and only runs those tests.

They also use mostly homegrown CI tools alongside phabricator for code review, or at least they did while I was there.


I would argue it's impossible to know what you really changed without the integration tests.

But also, why is it any different? Why are changes at a repo level that much easier to track than, say, inspecting changed files and running tests according to what changed?


We’re going to have monorepos until everyone tires of solving the problems they didn’t have before switching to this decade’s shiny object for workflow.


They used to have many small repos according to https://eng.uber.com/go-monorepo-bazel/ so perhaps there are advantages that outweigh slow CI.



First yes, but it's not difficult to imagine Uber has outgrown their first architecture.


Uber was the opposite. In ~2017 after years of hyper growth they had over 10,000 repos supporting nearly as many micro-services. The move to monorepos (technically two, one for jvm stuff, one for go) came as a solution to the maintenance issues of this sprawl. I was there when a mandate came down to move your service into the monorepo.


I don’t understand how it’s possible to have close to 10,000 services. I worked on a project with 17 and found that to be very awkward at times (local development with k8s didn’t seem to be a solved problem, and we depended heavily on complex tests and blue/green deploys to be sure changes actually worked in production). But nearly 10,000 is insane.

How do you orchestrate that? I guess this is the kind of situation where you really do need container orchestration.


It's really easy to innocently slip into a culture of "new project? new service!" and at Uber's scale, that could easily hit 10k.


Yeah, I’m starting to realize my experience in software is even smaller scale than I thought.


Repos aren't one-to-one with running services. Lots of these could easily be bundled as modules to be included at build-time. A lot of them could also just be API services that provide a contract via REST or whatever. If you have enough internal users, you just have to maintain an SLA comparable to a public API.


Both google and Facebook have monorepos, and I think apple does too. I’m not sure about the others, but I wouldn’t be surprised if they did as well


Exactly. And probably what Apple, Google and Facebook do is just the exact opposite to what the other 99% of companies out there need.

How anyone can think that what these companies do, with the engineering power they have, must also be good for their 30-employee startup just boggles my mind. I'm not talking about Uber - I have no idea about them and what they do. But I've worked for smaller startups that just blindly followed whatever they read Google or Facebook do and immediately assumed it's the best thing for them too.

It's ridiculous.


Google also has projects like this: https://github.com/google/breakpad

It supports 5 platforms, but uses 4 completely different build systems, including 2 custom ones (3 if you count depot_tools). There is very little overlap between the platform versions, meaning it's effectively 5 different projects smashed together into a single folder, and pretty much no way to use them in a cross platform project without some serious work. There isn't even a basic abstraction over the similar callback APIs between the platforms, although that's not a huge deal because the effort to write a basic abstraction layer is nothing compared to the effort of getting to a point where you can actually use it in a cross-platform project.

It's also funny that one of the build systems is GYP, which is basically a reinvention of CMake, except it's only used for the Windows build even though it can generate projects for the other platforms. Also, the VS project generator for GYP has been broken for a while (simple typo, trying to import OrderedDict from the wrong module. There's a PR to fix it, hasn't been merged for some reason), so it doesn't even work. Beyond that, it's also broken because GYP forces treating all warnings as errors, with a whitelist of warnings, yet the latest version (since yesterday at least) fails to build (tested on VS2019) because there's a warning that isn't in the whitelist.

You could try to fork it and fix these issues, but depot_tools doesn't provide a way to change the clone URL for repos, meaning you need to dig through the source code and wrap it in your own script that interacts with the internal APIs to do a simple clone (hint: fetch.py has a 'run' method that you can call with a custom constructed 'spec' object, which is a dictionary where you can inject your own url; just look at the hard-coded spec object for breakpad as a starting point). If you don't use depot_tools, then you need to manually clone all of the dependencies in the project since they're not even set up as git submodules.

There's also no versioning scheme whatsoever. Depot_tools seems to automatically checkout the latest version of everything (including itself).

I spent the past week wrestling with this monstrosity. Ended up successfully writing a Conan package for it that builds for Windows and Linux (there's one on Conan center, but it only supports Linux). I have 3 more platforms to go, but I think it'll be a better idea to just scrap everything and refactor into something more reasonable using CMake.

Instead of Breakpad, they also have a newer one called Crashpad, which is meant to improve reliability on Mac OS. Unfortunately, it depends on Chromium, so it won't work for my purposes.

...so all I'm saying is, maybe don't use Google as a role model for your project infrastructure.

/end rant


Crashpad doesn't depend on Chromium, in fact it uses mini_chromium (a mostly copy pasta of base) to avoid having that large dep:

https://chromium.googlesource.com/chromium/mini_chromium/

https://chromium.googlesource.com/crashpad/crashpad/+/refs/h...

What's the issue you're having with Crashpad? Indeed the breakpad project is a mess by modern standards.


To be honest, I haven't looked into Crashpad that much because (from a quick glance) it seems like it requires a secondary process for crash handling, whereas with Breakpad you can do it in the same process. That, and the chromium thing is what turned me away from it. I guess I should probably look into it a bit more, especially the size of mini_chromium to see if it's reasonable for my needs before I go forking Breakpad.


In addition to the other sibling comment, crashpad is built with GN, which is basically Bazel and overall much easier to reason about than GYP.


I've come to really disagree with this approach. It's extremely hard to undo an entrenched system. My org has a gnarly monolith that works well enough that we can't justify spending any money refactoring it as it enters it's 12th year of service.

"There is nothing more permanent than a temporary solution"


I worked at a company that went through this transition from thousands of microrepos to a monorepo. The main driver was application security. When everybody did their own thing in their own repo, rolling a new version of a library with a critical security fix was very expensive. It was impossible to complete in a reasonable amount of time, because you'd frequently find legacy services that were on some previous major version that required rewriting code or upgrading other libraries to implement the fix.

A monorepo means every service is on the same set of versions of every dependency at a given point in time. It's much easier to reason about what is fixed, and move everything to a known good version.


More fancy words on somebody's CVs?

I worked at a similarly large and almost as popular SF-based company and I can assure you every innovation there was 100% motivated by somebody's desire to put it on their CV. Nothing of what was built there made any sense for the business; it was just people playing with toys. And the worst part was that the people introducing this stuff (such as the one pushing for Lerna and a "UI" monorepo) would leave the company after a year, leaving a mountain of tech debt behind. But hey, they had Lerna on their CV now.


In some ways, do you blame anyone? The industry by and large has not optimized for retention.


The article clearly states that they can commit a cross-cutting change (a change that can affect thousands of binaries downstream) within 15 minutes.

I haven't done any calculation of how long that kind of change would take with a polyrepo setup.

If they can do that, their story can only be better for single isolated changes. From the outside, Uber's monorepo development experience seems vastly better than that of many small startups, whatever their development methodology (monorepo, polyrepo, continuous deployment, waterfall, or GitFlow (a type of waterfall)).


> so I wonder what's the advantage they get in using a monorepo.

Much easier to apply horizontal and vertical changes which extend outside individual services e.g. update the interface of a service, or fix a bad pattern.


Some alternative solutions to similar issues:

Zuul (https://zuul-ci.org/) was created for openstack to solve the issue of optimistic merges / PR queue testing.

When you use Buildkite with your own containers on AWS ECS, you can use EFS to do a git clone with a reference repository. (https://git-scm.com/docs/git-clone#Documentation/git-clone.t...) Essentially what they do with a packed base repo, but you only end up sending what you need, not more.
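
For anyone unfamiliar, the reference-clone trick looks roughly like this (paths and URLs are placeholders):

    # keep a periodically updated bare mirror on the shared EFS volume
    git clone --mirror git@github.example.com:org/monorepo.git /mnt/efs/monorepo.git

    # CI checkouts then borrow objects from the mirror instead of refetching them
    git clone --reference /mnt/efs/monorepo.git git@github.example.com:org/monorepo.git workdir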

The binary cache is available in other flavours too. If you don't use go, then sccache (https://github.com/mozilla/sccache) may be useful.


Why the described CTC algorithm instead of querying the Bazel graph to deduce which targets to build given the changed files? e.g. https://github.com/bazel-contrib/target-determinator
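
For concreteness, a minimal version of "ask Bazel what depends on the packages a change touched" looks something like this (target patterns are placeholders; tools like target-determinator do this more robustly by comparing the build graph between two commits):

    bazel query "rdeps(//..., //libs/logging/...)" --output label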


Yeah, I was wondering that. Isn't half the point of Bazel that it can know things like this without any extra effort?


That is what they do, click through the link in the article: https://github.com/bazelbuild/bazel/issues/7962#issuecomment...


It's amazing to see how much engineering effort a big company can throw at this, while for the majority of CI users, just getting a 2x faster CI machine achieves the same outcome at much lower cost.

[0] https://buildjet.com/for-github-actions/blog/a-performance-r...

edited: wrong link


Speaking from experience working on CI at a large company, I'm sure they've "just got a 2x CI machine" about 6 times. At some point you can't just burn more money and you need to optimise


Computers are usually way cheaper than people. The difference would just be Cap Ex vs Op Ex. That being said, they must be burning zillions of cycles rebuilding code that hasn't changed by using a monorepo.


Maybe I misread the article, but I thought they used tools to ensure they only rebuilt parts of the dependency graph, not wasting zillions of cycles?


That’s how I understood it as well.


I've been thinking about this for a while, and it seems to apply to a lot of things; serving an HTTP request on cheap EC2 instances won't come close to doing it on a dedicated server with great single-thread performance.

So even though you can more easily horizontally scale and handle infinite requests, the latency of each request will be much poorer than if you were just running on better hardware.


Yeah in my experience "get a faster machine" is so easy it was always done years ago and is no longer a possible improvement.


FWIW, there's a cap on how much perf you can extract from an instance. We use R6 instances (32 vCPUs, 64 GB RAM) for our builders; we can't really 2x that price point again, lol.


These companies often have thousands of machines in their CI fleets. It really can be cheaper to pay engineers to optimize rather than just buying more or bigger instances.


What makes you think they haven't already done so? If they're running Jenkins and/or Buildkite, they're managing their own runners, so they're not jumping from GitHub Actions runners to 8/16-core machines.


Alternatively don't use monorepos, and take advantage of binary libraries.

Naturally there is the caveat that using binary libraries isn't a thing in Go.


I don't see how your suggestion helps in the "I changed the standard library and there are 10,000 downstream tests to rerun" use case, which is what they blogged about.

You either:

* Just don't test such changes at change time; break the testing effort up and push it to downstream projects under the guise of "dependency management".

* Structure your engineering effort to avoid such changes, for example avoid having a base library shared between teams.

Neither is ideal in a corporate environment, because both harm velocity. We accept these in the "open source world" because we don't have better options, but the same does not hold for corps.


> Alternatively don't use monorepos

And now you get into the fun situation of having to handle ecosystemic asynchronous updates between your multiple repositories.


These are orthogonal concepts. If you don't have a monorepo, your CI can still build all downstream packages and redeploy changes on every update. If you have a monorepo, you can also ignore dependency changes and keep various services behind, in a state where their behaviour would be broken if they were redeployed from main. A monorepo makes synchronized updates slightly easier, but it neither requires nor forces them.


One commit in a monorepo now becomes N commits, one in each of the other repos.


Only if you need to update all N other projects every time, which is unusual.

Our services aren't even particularly well-factored, and when we change a core library we update a median of 2, a mode of 1, and a mean of 3-4 projects (out of ~25).


> Only if you need to update all N other projects every time, which is unusual.

This isn't true in my experience at large companies. For this to be true, you would either need to have absolutely no shared core libraries, or core libraries that are so stable and unchanging that they never receive updates. Updates to those core libraries usually result in a cascade of changes across a ton of repos, tons of tests that need to be rerun, and a slew of now incompatible versions.

All of these problems go away with a monorepo. Monorepos are absolutely the right solution for 98% of companies out there because they will never reach the engineering scale where monorepos start to struggle. For the 2% of companies that do reach that engineering scale, these kinds of blog posts get written.

The mistake is reading blog posts like this and thinking it's necessary or even relevant to any more than 2% of the companies out there. The vast majority of shops will never need anything like this.


You seem to have ruled out the obvious choice of "downstream projects just don't update." We update the projects we need the new library behavior in. There's no reason to touch the other 20 until there's some new behavior they do need.

I don't really see this as a mono vs. multi repo issue either. In a mono repo you can still choose which projects update and the more you choose the bigger the diff and more involved the review process. In a multi repo case you can build analogous tooling to generate downstream MRs automatically.

If you need to touch every downstream project regularly just to keep things functional, you don't have a mono-repo, you have a monolith. (Which can also be fine, really! But it's not the same thing at all.)

(Yes, I know with 25 projects we're nowhere near Uber scale. But based on my experience at larger companies, this seems to scale more or less linearly - in terms of downstream project count, library count, and developer time available to dedicate to such things. If anything, developer time available is what seems to scale super-linearly.)


The "downstream projects just don't update" option is one I'd consider completely unacceptable. What if the update is a security patch? What happens when they inevitably do want a new feature, and have to fast-forward through months or years of interface updates all at once? What happens when a developer wants to work on a feature that cuts across systems, and now has to re-learn the "old" way of doing things?


> What if the update is a security patch?

Even a security fix is unlikely to affect every downstream user of a library except in egregious cases. But, if so, yes you update them all. (I'd say this is much less frequent than, I don't know, OpenSSL or log4j having a bug that makes us do this. The specific concern of an internal library having such a broad vulnerability is negligible in influencing our CI design.)

> What happens when they inevitably do want a new feature, and have to fast-forward through months or years of interface updates all at once?

You do it.

> What happens when a developer wants to work on a feature that cuts across systems, and now has to re-learn the "old" way of doing things?

You do it.

I didn't say "it has no associated costs." But you need to weigh those costs against the other costs of a monorepo, and other costs in your CI/CD design generally. "Take longer to update a really old project once a year" is a much lower cost for us than "have a dedicated CI team to wrangle the tooling we need for automatic downstream pushes / a monorepo."


Did you miss the "every time" qualifier? Everyone will have some core libraries, but most work will be very limited in scope. Outside of large companies you can see this effect in nixpkgs for example. See how few entries have a rebuild count larger than 10 https://github.com/NixOS/nixpkgs/pulls?q=is%3Apr+is%3Aclosed


> which is unusual

It's unusual because a multi-repo environment makes it difficult to do, not because it's not necessary.


It’s not necessary. As proof: we don’t do it, yet our services run.


I never understood the hand-waving around multirepos. Parallel, distributed updates are extremely hard to get right in software, but, for some magical reason, not when checking in code across N separate repos with interdependencies.


A much better scenario, in my experience, given that not everyone is on the latest version anyway.


We know how to do that. Versioning, deprecations etc. We do that with all external dependencies.


All of which adds more effort, more synchronisation, more delay, more processes, and more ceremony.

> We do that with all external dependencies.

Indeed. Why you’d want to also suffer that for internal dependencies when there is no reason to, I can’t fathom.


I think this article clearly shows that the effort of handling a monorepo is way higher.

If you have a good process for external dependencies, you can apply it to internal ones. If you don't have a good process for external dependencies, a monorepo won't help (unless you make all your dependencies internal, which is even more expensive).


> If you have a good process for external dependencies you can apply that to internal.

Yeah... but a good process for external dependencies usually requires a lot of ceremony and can take days to update one dependency. If the same applies to internal dependencies, we will just be slower at making changes for no good reason.

This is more of a cultural thing. In fact, we create complicated solutions to mitigate this problem, only to go full circle after a while: people implemented service meshes via proxy sidecars, which added more layers (compared to just embedding this in the RPC library), but even that is easier than persuading downstream project owners to update their dependency in a timely manner, so it took off. And now they are chasing "proxyless" for "performance". Oh well.


Why would a complex dependency update be faster in a monorepo? In my experience the biggest savings are for simple updates that happen across multiple repos - when the update itself is complex, the extra time to make multiple PRs is dwarfed by the actual update's complexity, and it helps if the update can be separated into several projects where the version can be bumped separately.


You don't "update" the dependency. You are always building and releasing at head with all your internal deps (external deps are still versioned of course).

> the extra time to make multiple PRs is dwarfed by the actual update's complexity

Writing code is never the majority of the work. All the operational stuff is more complex without forced upgrades: coordinating the dependency updates, automating integration testing, pushing clients to migrate off old thick clients, determining what version of a specific repo is deployed, running a bisect to root cause issues, etc.


A lot of things could be done in a series of "simple updates that happen across multiple repos". Different style of doing the job I guess.

Also, having a rather global view of the impact of my code before actually making the change helps a lot, especially if you are working on low-ish level libraries. Technically this is not bound to the monorepo, but people usually criticize the complexity introduced by having such an ability in a monorepo, so :)


The process for rolling out breaking API changes is the same for monorepos as it is for multi-repos since, during a deploy, multiple versions of each service will be running simultaneously. The only advantage of a mono-repo is the atomic commit across multiple services. It's definitely possible through a combination of convention and tooling to do something similar with a multi-repo, but as of yet this is a less explored paradigm.
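

A minimal, entirely hypothetical Go sketch of that point: while old and new instances overlap during a deploy, the wire format has to tolerate both shapes (the usual expand/migrate/contract dance), regardless of whether the code lives in one repo or many. Field names here are invented:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // RideRequest accepts both the old and the new field during a rollout,
    // since old and new service versions run side by side for a while.
    type RideRequest struct {
        PickupLocation string `json:"pickup_location,omitempty"` // new name
        Pickup         string `json:"pickup,omitempty"`          // deprecated, kept during rollout
    }

    // EffectivePickup prefers the new field and falls back to the old one.
    func (r RideRequest) EffectivePickup() string {
        if r.PickupLocation != "" {
            return r.PickupLocation
        }
        return r.Pickup
    }

    func main() {
        oldClient := []byte(`{"pickup":"1455 Market St"}`)
        newClient := []byte(`{"pickup_location":"1455 Market St"}`)
        for _, raw := range [][]byte{oldClient, newClient} {
            var req RideRequest
            _ = json.Unmarshal(raw, &req)
            fmt.Println(req.EffectivePickup())
        }
    }

Only once every running instance sends the new field can the deprecated one be dropped - and that schedule is set by the deploy process, not by the repo layout.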


As a Systems Engineer / SRE you’re damn right it adds more effort and delay. And it’s not because there’s “no reason” but because when you’re operating a production service at scale you need to put in the effort to make sure dependency changes won’t cause a massive outage or lose customer data. Automated tests and CI/CD only go so far and are not a cure-all.


Binary libraries are only relevant if a lot of time is spent compiling the library code. There's no data in the article that suggests this is the case.


Monorepos are not mutually exclusive with binary dependencies.


It's quite common to hear comments like this from people who have never worked at the scale at which the annoyances of a monorepo are eclipsed by its advantages. I'm not saying anything in particular about you, but it's quite hard to get to that scale in the first place - say, global top 50 in big tech - so it's natural most folks can't understand why company X "still" runs a monorepo.

At that scale, anything reducing complexity is a bonus. At that scale, monorepos are great for that.


Well, in those 30 years, I am certain that some well known international medical research laboratories, mobile operators and nuclear physics research centers fit the bill.


You can do both.


> CTC creates a Merkle-style tree...

You call it a Merkle-style tree despite the fact that it is obviously a DAG?

--

Edit:

> Large changes are also more prone to outages. If we land them outside the working hours, there would be limited resources to mitigate potential outages. To prevent this, we ended up asking engineers to get up early to deploy these changes

Huh, landing changes to HEAD and releasing to production are not decoupled?


> You call it Merkle-style tree despite the fact it is obviously a DAG?

All trees are DAGs so that's not really adding much.


But not all DAGs are trees and the one in the picture (and generally all graphs of dependencies) isn't.


It's a multirooted tree.

And describing it as a Merkle-style tree conveys additional information about the implementation - presumably the tree stores the hash ("computed from all of its source files and inputs"), which just saying "it's a DAG" doesn't tell you.


Calling it a Merkle style DAG would have solved the issue.

Or just avoid the fancy names and call it what it is: they walk the dependency graph to identify build targets that have a changed dependency.


DAG doesn't seem like a useful identifier here, since a DAG encompasses many different kinds of structures, and in this case the specific data structure is closer to a Merkle tree than to a generic DAG.


Their point is that they attach data to nodes (data(node) = hash(node + (data(c) for each child c of node))) and can therefore compare hashes at the roots[1] to quickly determine which dependencies have changed and whether changes clash.

Perhaps you’re complaining about it not being a tree because there can be multiple roots which seems fair enough.

[1] I think there are many ‘roots’ as roots are going to be leaf build targets like various executables or test results
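

For readers wondering what the hashing buys in practice, here's a minimal Go sketch of the general idea (not Uber's CTC; target names are invented): each node's digest covers its own inputs plus its children's digests, so a change in a shared dependency changes the digest of every build target above it, and anything whose digest is unchanged can be skipped.

    package main

    import (
        "crypto/sha256"
        "fmt"
        "sort"
    )

    // target is a node in the build graph; deps point at other targets,
    // so shared dependencies make the structure a DAG rather than a tree.
    type target struct {
        name   string
        inputs []string // stand-in for source file contents/hashes
        deps   []*target
    }

    // digest computes a Merkle-style hash: own inputs plus child digests.
    // The memo ensures shared subgraphs are hashed only once.
    func digest(t *target, memo map[*target][32]byte) [32]byte {
        if d, ok := memo[t]; ok {
            return d
        }
        h := sha256.New()
        h.Write([]byte(t.name))
        for _, in := range t.inputs {
            h.Write([]byte(in))
        }
        // Hash children in a stable order so the digest is deterministic.
        children := append([]*target(nil), t.deps...)
        sort.Slice(children, func(i, j int) bool { return children[i].name < children[j].name })
        for _, c := range children {
            d := digest(c, memo)
            h.Write(d[:])
        }
        var out [32]byte
        copy(out[:], h.Sum(nil))
        memo[t] = out
        return out
    }

    func main() {
        lib := &target{name: "//lib/rpc", inputs: []string{"client.go v1"}}
        svcA := &target{name: "//svc/a", inputs: []string{"main.go"}, deps: []*target{lib}}
        svcB := &target{name: "//svc/b", inputs: []string{"main.go"}, deps: []*target{lib}}

        before := digest(svcA, map[*target][32]byte{})
        lib.inputs[0] = "client.go v2" // a change in the shared library...
        after := digest(svcA, map[*target][32]byte{})
        fmt.Println("svc/a affected:", before != after)
        _ = svcB // ...would change svc/b's digest too, so both get rebuilt/retested
    }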


They have a paper on that release process

https://eng.uber.com/research/keeping-master-green-at-scale/


If you read a history of Uber getting to monorepos [1], [2], it is obvious that they did it as an attempt to make their shared spaghetti code a bit more structured. Now they are writing articles about fighting the consequences of that move. For me the moral is: never go monorepo. It is at least counterproductive, but more likely destructive, even for large, capable teams.

[1] https://eng.uber.com/ios-monorepo/

[2] https://eng.uber.com/android-engineering-code-monorepo/


Build time has always been essential, even more so since the wide adoption of TDD.

Picking a language with fast build speed allows you to be more flexible and to test and deploy quicker.

They had Go, which builds very fast, but they failed to educate their developers to maintain a healthy build pipeline and avoid things getting too slow.

This kind of developer will cost your company millions; make sure to educate them properly ;)

I'd never work for a company with slow build speed. That's part of the reason why I refuse to touch languages like Rust and C++, due to their insanely slow build speed. I refuse to live in a world like that.


Recently switched from a Go shop to Elixir. Ooof, the compilation feels like molasses.


Have you run xref at all? I also work at an Elixir shop with excruciating compile times. There are many files where, if you save them, 80+ others are recompiled. I was able to fix a ton of them, but they just crawled back up again. There's an underlying issue that we're still working on, but I can say with high confidence that the culprit is cyclic dependencies.


Is it just me, or are most of these "monorepo" articles about how a company switches to a monorepo and then has to fight all the problems that come with it?

Use multiple repos. It doesn't hurt, you know, and you can have proper granularity.

The costs of a monorepo seem to be much higher than just dealing with inter-dependencies. And you can always group projects if it makes sense.


Obviously the articles will be about solving the problems. Nobody wants to write a "yep, the thing that worked still works well for like 99% of cases (we solved the 1% in other ways)". This of course produces bias for people reading the posts, because it's not trivial to tell if the solved problem was about to bring the company down or was just an annoyance that they solved in a cool way they want to share.


I've never seen an article from a large company saying "we use multiple repos and it's great". Seeing as all large companies seem to end up with a monorepo, I'm guessing in practice it is better.

I'd be happy to see a counter-example, but I've not seen one yet.


Amazon I think is the notable exception to the monorepo preference at large companies. But they have no lack of their own custom tooling to make multiple repos work.


Nobody is going to write a blogpost "we use multiple repos and it's ok for us". And that is the majority of companies.

Heck, even most individual developers use multiple repos.

Monorepos are usually OK (even at team/project granularity) until you outgrow them.

Build systems like one-repo-per-project much better than they like monorepos (shipping a fix doesn't require rebuilding "the whole world", for example).


At sufficient scale, the problems of a monorepo tend to be more of a fixed overhead cost whereas those same problems at scale in a multi-repo approach tend to impact all engineers across the company. The fixed overhead can be addressed by hiring a dedicated team to manage the monorepo problems.


Not like there aren't plenty of problems with (and articles about) working with multiple repos. Neither is a silver bullet; you just get to pick which set of problems you prefer to solve (or which ones you think you can get solved, given the human factors).


> Is it just me or most of those "monorepo" articles are about how the company switches to monorepo then just has to fight all the problems that come up with it?

It seems to me that Git, as a VCS, wasn't designed for "monorepos" in the first place. These companies should either use something else or not do monorepos.


Real-world experience at the scale Uber works at directly contradicts this, in my opinion, naive approach.


How do go monorepos work? Does each library/binary have its own directory with its own go.mod file? And then you need a build system more powerful than go build to descend into each directory and build (if necessary)?

Does this play nice with gopls (go language server) allowing you to jump around to definitions?


AFAIK Uber uses Bazel to build everything, so it's not the standard Go toolchain, but it uses parts of it. And yes, as of recently it integrates decently with gopls: https://github.com/bazelbuild/rules_go/wiki/Editor-setup


Monorepos have one go.mod file. Bazel or similar build systems optimized for monorepos are usually used.
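

To make that concrete (module path and layout here are invented, not Uber's): a single go.mod sits at the repo root, binaries and libraries are just packages underneath it, and the build system decides what actually needs rebuilding. gopls generally copes fine, because every import resolves by path within the one module.

    // Sketch of a single-module Go monorepo layout:
    //
    //   go.mod
    //   cmd/rides/main.go        <- binaries under cmd/<name>
    //   cmd/payments/main.go
    //   pkg/rpc/client.go        <- shared libraries, imported by path
    //   pkg/orm/orm.go
    //
    // go.mod (module path and versions are illustrative):
    module github.com/example/monorepo

    go 1.18

    // Only external dependencies are versioned here; internal code is
    // imported as e.g. github.com/example/monorepo/pkg/rpc, so there is
    // nothing to release or bump between internal packages.
    require google.golang.org/grpc v1.47.0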


Doesn't bazel already calculate a minimum build set based on changed files?


Phabricator is no longer maintained, from what I remember, and is used most of the time with SVN. I'm curious why they opted for it in the first place, instead of a more modern platform.


The open-source version of Phabricator is no longer maintained, but that only happened a year ago. Facebook's internal Phabricator is still maintained, and there is Phorge, which is attempting to fork and maintain the open-source version. Uber may opt to just maintain Phabricator themselves.

>is used most of the time with svn

It supports mercurial, git, and svn.

>I 'm curious why they opted for it in the first place, instead of a more modern platform.

It is a modern platform and works well.


Facebook's "phabricator" is very different from OSS, to the extent that at this point probably most of what's left is terminology and architectural boundaries. The diff (PR) UI was completely rewritten; I think repo browsing too. Most of the storage got refactored to use common storage code.


This sounds more and more like:

"We solved an issue we created ourselves."

Which sometimes is of course what you have to do.


Not an Uber engineer, but I worked at another company with a lot of ex-facebook senior engineers in the founding team. They were already familiar with the Phabricator workflow, so they stuck to what they knew. My guess is that the early Uber team also included a lot of ex-facebook engineers.

I was new to Phabricator, but I picked it up quickly. Even though it might not constantly pump out new features, Phabricator gets the job done.


I'm currently working with Phabricator too, but in our case it was chosen a long time ago, before other platforms were popular


> Phabricator is no longer maintained

Only very recently. Obviously after Uber adopted it.

> from what I remember and is used most of the time with svn

Nope.

> I'm curious why they opted for it in the first place, instead of a more modern platform.

Because it's quite good? It's not some archaic thing like SVN for which there's an obviously better newer option.


FWIW, Mozilla also uses Phabricator for Firefox development.


This is a poor article with little applicability outside of Uber. There's a reason it's gotten no comments the previous three times it was posted.


I would disagree; they showed the problem they had, how they analysed the data to understand the problem, how they used that data to identify possible solutions, how they selected the solution, and how they implemented and tested the solution. There are other organisations apart from Uber that have similar issues (although they are probably small in number), and there is a lot to learn from understanding their approach.

Having said that it is much harder to learn from others' mistakes than your own. Learning from your own success is hard too. Learning from others' success is the hardest by far. At least we have a glimpse (albeit indirectly) in this article of the mistakes they made.


I guess I should also say, if you're looking for a practical how to document, this is not that (nor does it need to be).


If you use monorepos and at the same time you use microservices, what you are doing is called a distributed monolith.


It is a common misconception that application design and/or deployment is tied to the repo or repos. They are completely orthogonal. Microservices can be done in a mono- or multi-repo. A distributed monolith can be done in a mono- or multi-repo. Either can also be done without version control.


Microservices are meant to decouple teams at both the deployment (runtime) level _and_ the source (repo) level. Microservices + monorepo is at least at some level self-subversive.


I agree with the spirit of this observation, but at the scale we work with at Uber, there comes a point at which "too much" decoupling starts biting you in the ass. It sounds kind of bizarre, actually, but this is a particular characteristic of a microservice architecture after it has reached a certain number of elements in the network and a certain amount of complexity.

I can't imagine deploying a piece of functionality that needs to interact with 20 other services if I can't have some zero-cost assurances like Go's typing and having all the IDL available at development time. This is of course even better if at compile time I can check that all contracts are still honoured. A monorepo is a really "cheap" way of getting all of this close to free.
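

A tiny illustration of the compile-time contract point (all names below are invented; this is not Uber's actual IDL setup): when the producer's interface lives in the same repo as its consumers, a one-line assertion turns contract drift into an immediate build failure instead of an integration-time surprise.

    package main

    import "fmt"

    // In a monorepo this interface would live in the producing service's
    // package (generated from its IDL); it's inlined here so the sketch
    // compiles on its own.
    type PaymentBackend interface {
        ChargeRider(riderID string, amountCents int64) error
    }

    type paymentsServer struct{}

    func (paymentsServer) ChargeRider(riderID string, amountCents int64) error {
        fmt.Printf("charging %s %d cents\n", riderID, amountCents)
        return nil
    }

    // Compile-time contract check: if the producer changes the interface,
    // every consumer with a line like this fails to build right away.
    var _ PaymentBackend = paymentsServer{}

    func main() {
        var b PaymentBackend = paymentsServer{}
        _ = b.ChargeRider("rider-123", 1500)
    }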


Your example is the clearest indication in the whole discussion that Uber has a distributed monolith.


> I can't imagine deploying a functionality that needs to interact with 20 other services . . .

If a product feature X can't be delivered without changes to 20 services, the architecture is too granular.


Agreed, but I didn't mention changes, I mentioned interaction. Reading data in this case.


You're right that monorepos make it easy for contracts that define wire protocols to stay "in sync" between services, at the code level.

But pretty much the whole point of defining distinct services which live on different ends of a wire is that they don't need to stay in sync at the code level!

As you say, a service implements a contract, written as an IDL or a JSON schema or informal convention or whatever, in order to express a promise to consumers. That promise needs to be kept as long as any consumer relies on it. If your search service publishes a protobuf that has a service definition called e.g. SearchV1, then your search service is absolutely obliged to keep supporting that SearchV1 service until it's no longer used by anyone.

If this isn't the case, and you can deploy changes to a service that violate the wire-protocol contract you established with your consumers, then that is a problem. And it's easy to do. I find that many programmers don't fully understand the weight and implications of the interface they define between client and server.


I'm saying using BOTH monorepo AND microservices is a clear signal you're building a distributed monolith.

What you say is also true


Ah I see, I could definitely see that being a useful heuristic. Also, it seems like the origins of microservices come from places like Google that use a monorepo. The proof is in the pudding in the end, I suppose.


This is just terminology. Services are still deployed and scaled independently of each other and have a codebase that can be maintained by a two-pizza team.


TL;DR: How a company-wide monorepo/monodeployment multiplied CI times by 10, but we managed to halve that.


Well, it's not that easy. Let's say you have this layout:

- applications: front, backend, background-worker

- libs: database-orm

In a multi-repo layout, if you want to make a change in database-orm, you'll make your PR in its repo, test, and make a release with your changes.

Nice and easy, right? Well, you're not done. Now you have to make a PR to update the dependency in each repo using this library. If you're lucky, nothing breaks and it's quite quick.

But it's not always so easy: you notice that you actually broke something down the line in the backend. You have to fix it (in your library) and do it all over again. You can also have libraries depending on other libraries, multiplying the effort when you mess something up.

The monorepo handles that: you update one library, see what you broke down the line, and fix all of it quicker. Changes are also easier to follow, since one modification impacting several applications or libraries can be made in a single commit.

You can tell me that libraries should have clean interfaces and that applications should be independent of the actual internals, but that's rarely the case. It's a tradeoff, and lots of companies are going this route.
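

To make that concrete with Go modules (module paths and versions below are made up): in the multi-repo world, every database-orm release means a version bump in each consumer's go.mod, while in the monorepo the library change and all consumer fixes land in one commit.

    // backend/go.mod in the multi-repo setup: front, backend, and
    // background-worker each need a PR like this for every release.
    module github.com/example/backend

    go 1.18

    require github.com/example/database-orm v1.4.0 // was v1.3.2 before the fix

    // In the monorepo there is no version to bump: backend imports
    // something like github.com/example/monorepo/libs/database-orm by
    // path, so the library change and the consumer fixes ship together.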


I can say that, from what I have seen during the last 30 years, I would rather bet on multi-repos and binary libraries than on monorepos.

The processes you describe are great for making teams think twice about how to approach a change, instead of a "let's change everything and see how it goes" attitude.


That time spent thinking about how to plan a change goes away when you have a monorepo. You literally don't pay that cost, as you can work out the change fully locally.

I built a tool to bring up every microservice locally so you could test every change together, so the separate-repository problem went away. It coordinates Vagrant LXC environments.

The version I built at my employer was integrated with Chef and Ansible, handled cloning and pulling dependencies too, and could build in parallel and deploy to local HAProxy load balancers.

But my open source version is barebones by comparison. In the future I shall try to build developer tools as open source rather than trapping the work at my employer.

https://github.com/samsquire/platform-up


>That time spent thinking of how to plan a change goes away when you have a monorepository

Not entirely. Since the fleet isn't atomically updated to the next version you have to be careful about multiple versions being compatible with each other.


Upgrading software and Postgres databases is a difficult problem in general.

The text itself can be changed, but it takes at least 15 minutes for that change to become a deployment. We still cannot generically mutate running code into other running code. Ksplice and other live kernel patching, and Chrome's binary patching, should be generalised and productised.

Patching methods at runtime is another thing that is possible (Ruby, Python and Erlang), but I am not aware of a general framework for deploying mutations to servers at runtime.


That's what's called "dependency management", and it should be built into your CI/CD. In the majority of cases (non-breaking changes), after a PR is merged in one repo you create a feature branch in the second one with the new dependency already in place, and all you have to do is apply the feature changes you are working on.

But yes - GitFlow or environment-branch HELL is so common in the industry that it's hard to talk about better approaches.


Or you have a monorepo: you make the change in your backend, then you make the change in your dependencies, and you have one commit, one PR, one test run, and one release.

The proposed alternatives to the monorepo taking 10x longer to build seem to be "spend 10x longer on the manual developer parts, with 10x more process, more tools, more rules, more work".


Sounds good, doesn't work.

A change in one place == X possible conflicts.

Changes in ten places over a longer period of time (a much bigger PR) == XXX possible conflicts.

It all depends on the size of the team and project.

Though it's hard to find a good point at which to switch from one approach to the other.


You assert that it doesn't work; however, we are in a thread where a major company like Uber has switched to it, which makes your assertion ring hollow.

I do not understand your point about conflicts as 3 commits across 3 projects with 3 different CI pathways is going to cause more conflicts than 1 commit across 1 repo. In my experience managing one code change or project across 3 repos is a 10X difficulty increaser in terms of repo management, conflicts, etc. It's not just 3X harder, it's 10X harder to me. The number of times I've seen a spelling mistake/naming difference/etc in 1 out of 3 repos because the PRs were done separately and no one noticed is too damn high.

The simplicity of having it all together strongly outweighs the benefits of multi repo in most situations IMO. The number of projects/companies/etc that would benefit from some highly engineered microservice-based multi-repo monster is probably less than 100 in my country, and 1000 worldwide.


The issue is pretty simple. If you have 100 people working in the same directory, the chance of them conflicting with each other is A LOT BIGGER than with 100 people working in 10 different directories. It's simple probability.

This is the issue with monorepos -> over time, the more people work on them, the higher the chance of CI/CD failing due to conflicts. And so far pretty much NO ONE in the world has fully resolved the conflicts issue.

Companies introduce merge queues, which cripple productivity, because each subsequent merge "can" (but doesn't have to) break all previous PRs.

Instead of having a very easy feature-branch pipeline, you have to build some abomination that becomes a bottleneck as soon as the company starts growing.

--- Like I said in another comment - the majority of IT still lives in GitFlow/EnvFlow hell. Some of us learnt from those mistakes and do better now.


What you are describing as the good parts sounds to me like what you get from a traditional monolith.

Why build libraries and separate services if an update on some part of the codebase needs changes across all the platform?

This trend of microservices + monorepos sound to me like taking the worst part of everything.

If you do microservices, then let each team have their own stack, their own tools, their own libraries and be independent and just agree on the APIs.

If you want everything super consistent and share as much code as possible, etc then just do a traditional (well architected) monolith, maybe deploy it with different configurations for different scaling needs, etc.

Monorepos are to me a symptom of worse problems.

I worked for a company where they had a rule that "every repository should be a monorepo". They didn't even understand everything that was wrong with that rule.


IMHO, with monorepos you can achieve all the benefits of a monolith along with the benefits of a microservice architecture.


Except you're now calling functions across the network. And no more complete tracebacks. And you need to deploy 10 things at the same time or everything falls apart.

Yes, you get the benefits, but also all the drawbacks. Those drawbacks might make sense for extremely large teams. I want to cry every time I see a 20-engineer team wasting the company's money on this.


Yes. So if you have to do microservices, better to do them from a monorepo.


> But it's not always so easy : you notice that you actually broke something down the line in the backend. You have to fix it (in your library), and do it all over again. You can also have libraries depending on other libraries, multiplying the effort when you messed up something.

Since this is Go, it's trivial to point in-development branches of downstream projects at an in-development branch of a dependency before merging it, for those projects' testing processes. (This still doesn't cover 100% of cases of course, but nothing does. The better answer is not to over-factor in the first place.)
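

For anyone who hasn't done this, the usual mechanics (module paths below are made up) are either `go get dep@branch`, which pins a pseudo-version for that branch, or a temporary replace directive while iterating against a local checkout:

    // Downstream project's go.mod while testing the library's branch:
    module github.com/example/backend

    go 1.18

    require github.com/example/database-orm v1.3.2

    // Temporary, removed before merging. Alternatively, running
    //   go get github.com/example/database-orm@my-feature-branch
    // pins a pseudo-version for that branch instead.
    replace github.com/example/database-orm => ../database-orm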



