It's always enlightening to have articles like this one shed light on how companies operate at scale. It goes without saying that many of the problems Stripe faced with their monorepo aren't applicable to smaller businesses, but there are still bits and pieces that apply to many of us.
I've been working on an ephemeral/preview environment operator for Kubernetes (https://github.com/pier-oliviert/sequencer), and I agree with a lot of what OP said.
I think dev boxes are really the way to go, especially with all the components that make up an application nowadays. But the latency/synchronization issue is a hard topic, and it's full of tradeoffs.
A developer's laptop always ends up being a bespoke environment (yes, Nix/Docker can help with that), and so there's always a confidence boost when you get your changes up on a standalone environment. It gives you proof that "hey, things are working like I expected them to".
My main gripe with the dev box approach is that a cloud instance with compute resources similar to a developer's MacBook is hella expensive. Even ignoring compute, a 1 TB EBS volume with performance equivalent to a MacBook's SSD will probably cost more than the MacBook every month.
Wouldn't this be a reasonable alternative? Asking because I don't have experience with this.
1. New shared builds update container images for applications that comprise the environment
2. Rather than a "devbox", devs use something like Docker Compose to run the images locally. Presumably this would be configured identically to the proposed devbox, except with something like a volume pointing to local code (see the sketch below).
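A minimal sketch of what I mean - the service names, image, and paths here are all hypothetical:

    # docker-compose.yml - hypothetical sketch of the setup described above
    services:
      api:
        image: registry.example.com/api:latest   # shared build, updated by CI
        volumes:
          - ./api:/app                            # local code mounted over the image's copy
        environment:
          DATABASE_URL: postgres://dev:dev@db:5432/app
        depends_on:
          - db
      db:
        image: postgres:16
        environment:
          POSTGRES_USER: dev
          POSTGRES_PASSWORD: dev
          POSTGRES_DB: app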
I'm interested in learning more about this. It seems like a way to get things done locally without involving too many cloud services. Is this how most people do it?
We do this, but with k3s instead of docker compose (it’s a wonderful single-box substitute for full k8s), and a developer starts by building the relevant container images locally. If everything works, it takes about 3 minutes to get to a working environment with about a dozen services.
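For reference, the bootstrap is roughly the following - the image name and manifest directory are made up:

    # install k3s as a single-node cluster
    curl -sfL https://get.k3s.io | sh -
    # build the image locally and hand it to k3s's containerd
    docker build -t api:dev ./api
    docker save api:dev | sudo k3s ctr images import -
    # apply the same manifests we'd use in a real cluster
    sudo k3s kubectl apply -f deploy/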
We steer clear of proprietary cloud services. In place of S3, we use minio.
The one sore spot I haven’t been able to solve yet is interactive debugging. k8s pushes you toward a model where all services need a pile of environment variables to run, so setting these up is pretty painful. In practice, though, all rapid iteration happens inside unit tests, so not having interactive debugging isn’t much of a productivity drag.
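The closest I've come is dumping the environment from a running pod and replaying it locally - a rough sketch, where the deployment and binary names are hypothetical and values containing spaces would need quoting:

    # capture the runtime environment of a running pod
    kubectl exec deploy/api -- env > api.env
    # replay it for a locally built binary
    env $(cat api.env | xargs) ./api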
At least on a MacBook, Docker is still a compromise in many ways, since it has to run in a Linux VM (I live in the SF tech bubble, where I've only ever been issued a MacBook).
Even on my personal Linux desktop, I don't love developing in containers. It is very tedious to context switch between my local environment and the in-container environment, and I don't even consider myself the type with a super personalized setup.
So I don't consider local docker that much of an improvement over a remote devbox.
> It is very tedious to context switch between my local environment and the in-container environment
I think, with my proposed setup, you'd still do development on your local machine. The containers would only be there for the dependencies, and as a shell to execute your code. The container hosting the application under development would use a volume to point to the files on your local machine, but the container itself (with all its permissions and configuration) would match or nearly match what you plan for the production environment.
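Compose's override mechanism fits this pattern: keep a prod-like base file and layer the dev-only volume on top. A minimal sketch, with hypothetical names:

    # compose.override.yml - merged on top of compose.yml automatically by `docker compose up`
    services:
      api:
        volumes:
          - ./api:/app   # dev-only: local sources mounted into the otherwise prod-like container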
I manage a dev environment for a small, inexperienced (but eager) team, and I have a similar setup. I'll do a write-up at some point if I have time. It can work, and does for me, but there are some funny consequences: the setup can end up mediating the relationship between a developer's computer and his code, which is a terrible place to be.
It’s $250/month for a c6g 2xl 1tb ebs on demand pricing go reserved instances. Given they use AWS and are a major customer, you can expect excellent pricing above the above public pricing quote.
Considering the cost of a developer's time (and you can do shenanigans to drive the instance cost even lower), this all feels totally reasonable.
If you truly need that kind of perf (and at Amazon, we had plenty of dev desktops running on EBS without that kind of performance), then you should really opt for an instance type with local storage.
I've been deep into implementing macOS CI workers on AWS, where that isn't an option (or rather, it is an option, but it is unsupported, and Amazon only buys Macs with the smallest possible SSD for a given configuration). So your options are to pay an arm and a leg for fast EBS, or pay an arm and a leg for a Pro or Max instance with a larger internal SSD.
The article didn't actually say what "Stripe's cloud environment" was, besides "outside of the production environment". I assumed the company had their own hardware but your assumption is more probable.
I find the devbox approach very frustrating because the JetBrains IDEs are leaps and bounds ahead of everything else in terms of code intelligence, but only work well locally. VSCode is very slightly more capable than plain text editor + sync or terminal-based editor over SSH, but only slightly.
It's darkly amusing how we have all these black-magic LLM coding assistants but we can't be reasonably assured of even 2000s level type-aware autocomplete.
I work in Go. My company keeps trying to push us onto their VSCode-based remote environment. "Find all references" doesn't, "go to definition" works maybe 30% of the time, and the Go LSP daemon needs to be force-killed dozens of times in a working session, taking several minutes to recover each time. The autocomplete suggestions are about 3x as likely to be from the VSCode fuzzy-matching thing or Copilot slop as actually existing symbols that type-check in context.
JetBrains Projector and Gateway, meanwhile, lock up or outright crash several times an hour, and text input and scrolling are not smooth.
Right, dev boxes do not need to do double duty as a personal computer plus development target, which allows them to more closely resemble the machine your code will actually run on. They also can be replaced easily, which can be helpful if you ever suspect something is wrong with the box itself - if the new one acts the same way, it wasn't the dev box.
I don't recall latency being a big problem in practice. In an organization like this, it's best to keep branches up to date with respect to master anyway, so the diffs from switching between branches should be small. There was a lot of work done to make all this quite performant and nice to use. The slowest part was always CI.
I feel like we're not getting the right lessons from this. It feels like we're focusing on HOW we can do something versus pausing for a brief moment to consider if we SHOULD in the first place.
To me the root issue is that the complexity of production environments has expanded to the point of driving complexity in developer environments just to deploy or test - and this compounds with the expanding complexity of developer environments just to develop, e.g. webpack.
For very large well resourced organizations like Stripe that actually operate at scale that complexity may very well be unavoidable. But most organizations are not Stripe. They should consider decreasing complexity instead of investing in complex tooling to wrangle it.
I'd go as far as to suggest both monorepos and dev-boxes are complex toolchains that many organizations should consider avoiding.
> I'd go as far as to suggest both monorepos and dev-boxes are complex toolchains that many organizations should consider avoiding.
I'm not sure "monorepo" means the same thing to you as it does to me? To me, it just means "keep all the code in one repo, instead of trying to split things up into different repos."
To me, it's the thing that is the simple solution; it just means "a repo". The reason it gets a name is that it's unusual for large orgs with enormous codebases to have everything in one repo - unusual for them to do the simple thing that works fine for a small org with a normal codebase.
What is it you're suggesting a simple organization should do instead of a "monorepo"?
> To me, it just means "keep all the code in one repo, instead of trying to split things up into different repos."
To me, and perhaps more from a DevOps-like perspective, monorepo means "one repo, many diverse deployment environments and artifacts, often across multiple programming languages".
I'm advocating against the Google/Stripe situation of a singular massive repo with complex build tools, like Bazel, to make it function. I think small organizations sometimes get lured by ego and bad cost/benefit analysis into implementing such an architecture, and in my experience it can tank entire product orgs (obviously not for Stripe, Google, etc.).
A monorepo doesn't require multiple programming languages or Bazel. But once multiple programming languages are involved, the complexity exists regardless of the chosen tooling. With multiple repos, that complexity is just pushed elsewhere like the CI system.
The argument would be that for a simple organization, dividing things into independently releasable components is less simple than just having one app. I think that's what most simple organizations do, no? Why do you need the complexity of independently releasable components for your simple organization? Now you have to track compatibility between things - ensure which version of one independently releasable thing works with which version of another - isn't that added complexity? Why not just have one application; isn't that simpler? You don't need to worry about incompatibilities between your separately releasable things: every commit that passes CI on your single repo means all the parts are compatible (sans untested bugs).
Usually it stops being "simpler" at a level of organizational complexity or code size where it becomes a mess. The "monorepo" is the attempt to do what everyone was just doing anyway for simple orgs with simple codebases, but keep doing it at huge sizes.
The monorepo vs many-repos discussion often hits upon so many implied factors, but it's only really about how source code is stored.
It doesn't necessarily indicate much about the deployment model. You can have many separately releasable things in one repo, and you can have one independently releasable thing based on the sources of many repos.
Monorepos enable, but don't require, source-level co-evolution. Or maybe a better way to put it would be: many projects can have a shared history. Many-repos require independent source-level evolution. In the open source world there is no real choice: every project wants to be independent. The authors of a given project can do what they wish with it.
One weird thing to think about is that monorepos can accommodate many-repo style workflows. You can still develop projects completely independently within a single repo. Of course you can store separate projects on separate revisions, which would be weird. An even weirder approach would be having all projects in a given revision, but with totally independent builds, no single-version policy, no requirement for atomic compatibility, et cetera. These are all things that are often imposed for monorepos, but that are also not requirements. Basically, you can treat each project as independent even if their sources are stored together. I don't think there are any reasons to actually do this, of course.
If you're living in the same dysfunctional world I am, then maybe your organization split things into repos that are separately releasable, but are conceptually so strongly coupled that you now need to create changes on 3 repos to make a change.
I think this is the key phrase in what you've written. Quite often I've seen teams insist on separating things that cannot be released independently due to some form of coupling.
You end up with people talking about a particular "release" but not really knowing 100% what's in it and then discovering later that something is missing or included by mistake.
IMHO it's much easier to keep it all in a single repo and use the SHA value as a single source of truth when discussing what's in it. I don't really work on huge codebases though so your mileage may vary.
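One cheap way to operationalize that, with a hypothetical registry name:

    # stamp every image with the commit it was built from
    SHA=$(git rev-parse --short HEAD)
    docker build -t registry.example.com/app:$SHA .
    docker push registry.example.com/app:$SHA
    # "what's in this release?" is then just: git log $SHA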
> You end up with people talking about a particular "release" but not really knowing 100% what's in it and then discovering later that something is missing or included by mistake.
If your devs couldn’t be bothered to pin versions that was never a tooling problem. You don’t need a 500GB Git repository with every vendored component to know what’s in your code.
Equally, if your team is going to store 500 GB of vendored components, it doesn't matter whether that's all in one place or smeared across many repos. You still have the same issue.
Absolutely. I've worked at tech behemoths and smaller companies, and the dev experience was significantly better when all development was local. I even worked on initiatives to move development away from the cloud, and although other devs were skeptical, they ended up loving it.
I think we don't have good solutions for scaling down prod.
Our relatively simple prod architecture has 5 containers & a hosted database (so 6 containers when run locally), and any less would impact our product goals.
I still find running prod locally valuable - it's the most common way anyone does development here - but containers are fairly heavyweight when you want to run everything on one machine. It's also impossible if you have parts that need special accelerators to get good latency, etc.
If you're willing to build everything from scratch, you can have a framework that seamlessly lets you build conceptual services and then separate out the physical deployment concerns, like Google has and sometimes even uses. But for the rest of us, cobbling together a bunch of different technologies, that's a luxury we can't really afford.