Under Deconstruction: The State of Shopify’s Monolith

lmarcos · on Sept 17, 2020

Great article. Main takeaway: microservices are not the only option when managing big codebases. In a parallel universe I imagine that the coolest trend in software development right now is a tool for monoliths: all code in a single repo, independently deployable components, contracts at the boundaries and mockable dependent components where needed. As opposed to our universe in which building microservices is the unofficial way to go.

gen220 · on Sept 17, 2020

What you describe is microservices developed in a monorepo, and a lot of companies (including the one I currently work at) have gone this route.

Some people might disagree, but imo the cult of microservices does not require 1 repo per microservice.

The tools you describe are build-graph management tools (Bazel, Pants, Buck, etc.) and RPC tools (gRPC + protobufs, Cap'n Proto) and they are indeed pretty cool, albeit to a niche crowd.

jurre · on Sept 17, 2020

I think the key difference here is that there is no network in between components in a componentized monolith; each component runs the entire “monorepo”.

dodobirdlord · on Sept 17, 2020

Whether there’s actually network between components is something a platform team can handle based on their best judgement. Having collections of containers that always run together is a common pattern.

pbourke · on Sept 17, 2020

Certainly, but such a system is not a monolith. A core trait of the monolith is that there are no network calls between components.

inopinatus · on Sept 18, 2020

This negative-space definition of "monolith" is unhelpful to the point of irrelevance. It's unreasonable, in the sense that adopting it gives us nothing to reason about, as with the comment above. By such a standard the last monolithic in-service system was a Burroughs mainframe ca. 1975. I've got statically linked binaries that would fail this definition.

Even the plainest Rails application depends on network traffic, including to communicate with parts of itself. It cannot function without an operating system, which is also talking to parts of itself via network protocols, and this runs on a server whose internal buses are themselves a distributed system.

It's networks, all the way down, and a heads-in-the-sand attitude doesn't help us reason about performance, reliability, scalability, maintainability et cetera.

Put this in a "Falsehoods programmers believe ...": calling a stateless function in a stack-based language to compute an immutable result won't lead to a network call.

Monolithic applications are defined by something they are, not something they don't do, and what they are is a single unit of code for development and deployment purposes that includes everything necessary to fulfil an entire system's purpose. The issue of intentionally crossing a network boundary, and when, and why, is an important topic in comparative systems architecture, but it's analytically orthogonal.

thebean11 · on Sept 17, 2020

Is there really that big of an advantage to avoiding the network boundary though?

nthj · on Sept 17, 2020

Absolutely:

* Avoid network and JSON serialization overhead

* Perform larger refactorings or renamings without considering deployment staggering or API versioning

* Testing locally is far easier

* Debugging in production is far easier

* Useful error stack traces are included for free

* Avoid (probable in my experience, at least in larger security software organizations) dependency on SecOps to make network changes to support a refactoring or introducing new components

If an organization is pursuing or will pursue a FedRAMP certification, as I understand it, that organization must propose and receive approval every time data may hit a network. Avoiding the network in that case may be the difference between a 50-line MR that's merged before lunch and a multi-week process involving multiple people.

gen220 · on Sept 17, 2020

FWIW, I think that gRPC/protobufs have pretty compelling answers to each of the historically-valid complaints you've listed here.

- cpu cycle overhead: this is valid if the overhead is very high or very important. otherwise, most companies would love to trade off cpu cycles for dev productivity.

- refactorings/renamings without deployment staggering: protobufs were specifically designed with this in mind, insofar as they support deprecating fields and whatnot. However, writing a deprecatable API is a skill, even with protos. If you have many clients and want to redo everything from scratch, you will have problems.

- "testing locally" (which I take to mean integration testing locally) is the only one that requires some imagination to solve, assuming all your traffic is guarded by short-term-lease certs issued by vault or something similar. But even this is quite achievable.

- error stack traces included for free: may I introduce you to context.abort(). It's not a stack trace by default, but you can actually wrap the stack trace into the message if you so care to. opentracing isn't quite free, in a performance sense, but in a required-eng-time-to-set-up-and-maintain sense, it is pretty cheap.

- dependency on secops to make network changes: I've never encountered this, but I bet you that a good platform team can provide a system where application teams effectively don't need to worry about this. It's impossible to overcome this challenge in an existing company that's used to doing things this way, though.

mpweiher · on Sept 18, 2020

> cpu cycle overhead

The original poster's point was CPU and network overhead. A local procedure/function call or message-send takes on the order of one or up to a few nanoseconds. Depending on how you organize things, an IPC is going to be in the microsecond or even millisecond range. That's a lot of orders of magnitude. It's also latency that you just aren't going to get back, no matter what extra resources you throw at it. [1][2]

In the early noughties, a very SOA/microservice-y BBC backend system that I re-architected as a monolith became around 1000x faster. [3]

In addition, in-process calls are essentially 100% reliable. Network calls, and various processes attached to them, not so much (see [1], again). The BBC system not only became a lot faster, it also became roughly 100 times more reliable, and that's probably low-balling it a bit. It essentially didn't fail for internal reasons after we learned about Java VM parameters. And it was less code, did more, and was easier to develop for.

[1] https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115...

[3] https://link.springer.com/chapter/10.1007%2F978-1-4614-9299-...
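
To put rough numbers on the gap described in this comment, here is a small Ruby sketch of the arithmetic. The three latencies are round figures picked from the ranges mentioned (about a nanosecond for a local call, a microsecond to a millisecond for IPC); they are assumptions for illustration, not measurements.

```ruby
# Round illustrative latencies in nanoseconds, taken from the ranges quoted
# above; these are assumptions for the arithmetic, not benchmark results.
LOCAL_CALL_NS = 1
IPC_FAST_NS   = 1_000       # 1 microsecond
IPC_SLOW_NS   = 1_000_000   # 1 millisecond

# Number of powers of ten separating two latencies.
def orders_of_magnitude(slow_ns, fast_ns)
  Math.log10(slow_ns.to_f / fast_ns).round
end

puts orders_of_magnitude(IPC_FAST_NS, LOCAL_CALL_NS) # 3
puts orders_of_magnitude(IPC_SLOW_NS, LOCAL_CALL_NS) # 6
```

Under these assumptions the gap is three to six orders of magnitude per call, before any retries or queueing are considered.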

gen220 · on Sept 18, 2020

Ah gotcha, thank you for locking-in on the issue. You're absolutely right that network hops introduce overhead (I was intending to wrap i/o blocking on network calls under the banner of cpu cycles, adjacent to serialization)

Like any other design decision, there's a trade-off here. (see my other comments in this tree, about how many 9's in reliability/latency you're targeting).

If you're working in an environment where sub-5ms latency to the 4th or 5th 9 is critical, inter-machine communication is not for your application, period.

Reliability, as an orthogonal concern, is one that has improved incredibly since the early aughts. The "transport" and error-handling layer of open-source RPC frameworks has improved by orders of magnitude. I'd recommend taking a long look at the experiences of companies built on gRPC.

It's much easier to build a reliable SOA-esque system today than it was even 5 years ago. It's been an area of rapid progress.

mpweiher · on Sept 21, 2020

Yes, obviously these are trade-offs.

However, I find the way you framed these trade-offs decidedly...odd, in terms of "who needs that kind of super-high performance and reliability????", as if achieving that were only possible through herculean effort that just isn't worth it for most applications.

The fact of the matter is that a local message-send is also a helluva lot easier than any kind of IPC. Also easier to deploy, as it comes in the same binary so is already there and easier to monitor (no monitoring needed).

So the trade-off is more appropriately framed as follows: why on earth would you want to spend significant extra effort in coding, deployment and monitoring, for the dubious "benefit" of frittering away 3-6 orders of magnitude of performance and perfect reliability?

Of course there can be benefits that outweigh all these negatives in effort and performance/reliability, but those benefits have to be pretty amazing to be worth it.

gen220 · on Sept 22, 2020

> as if achieving that were only possible through herculean effort

I encourage you to reread my comments, I'm not suggesting anywhere that high-performance requires exceptional effort.

In fact, I'm actively admitting that for applications where high-performance is required, IPCs/RPCs are not an option.

> just isn't worth it for most applications

Performance is valuable, but it's one dimension of value.

My premise is that, given the maturity of RPC frameworks and network tooling in 2020, most already-networked applications can afford to trade the performance hit of additional hops on the backend.

Whether what you get in exchange for that performance hit is valuable?

That is mostly a function of the quality of your eng platform.

> a local message-send is also a helluva lot easier [on the programmer?] than any kind of IPC

This strongly depends on your engineering org, although it seems like this is the point that's hardest to imagine for some people.

If you're on a team that depends on the availability of data maintained by N other teams (given the maturity of RPC frameworks and network tooling in 2020, again), it is much easier to apply SLOs and SLAs to an interface that's gated by an RPC service.

> spend significant extra effort in coding, deployment and monitoring

The extra effort here is made completely negligible by the existence of a decent platform team.

FWIW, I wouldn't be able to imagine it if I hadn't experienced it myself.

> benefits have to be pretty amazing to be worth it

I still think you're overestimating some of the costs (see above). FWIW, I've worked in an RPC-oriented environment for years now, and reliability has never been a concern. Our platform team is pretty good, but we are not a Google-esque company (200 engineers, including eng managers)

The performance trade-off has been demonstrably worthwhile, because we've used it to purchase a degree of team independence that would not have been otherwise possible.

mpweiher · on Sept 22, 2020

>In fact, I'm actively admitting that for applications where high-performance is required, IPCs/RPCs are not an option.

But you're framing it as "...for applications where high-performance is required", as if taking the performance, expressiveness and reliability hits should obviously be the default, unless you have very special circumstances.

My point is, and continues to be, that it's the other way around: you should go for simplicity, reliability and performance unless you have, and can demonstrate you have, very special requirements.

lmm · on Sept 18, 2020

Thrift or protobuf is a huge step up from the alternatives, but you still have a lot of overhead. Generics are limited and you're essentially forced to "defunctionalise the continuation" everywhere: any time you want to pass a callback around you have to turn it into a command object instead.

gen220 · on Sept 18, 2020

I don't disagree with you, this actually sounds like the beginning of a super interesting conversation.

Can you share some examples of the generics problem and "defunctionalizing the continuation"?

Does google's `any` package help with the generics problem you describe? (Acknowledging that it's obviously clunky)

lmm · on Sept 20, 2020

> Can you share some examples of the generics problem and "defunctionalizing the continuation"?

Well, the generics problem is that you don't have generics. So you just can't define a lot of general-purpose functions in gRPC, and have to make a specific version of them instead. Even something like "query for objects like this and then apply this transform to the results" just can't be done, because there's no way to pass the transformation over the wire, so you have to come up with a data structure to represent all the transformations that you want to do instead. "Defunctionalizing the continuation" is the technique for doing that; https://www.cis.upenn.edu/~plclub/blog/2020-05-15-Defunction... is an example, but it's a manual process that requires creative effort each time.

> Does google's `any` package help with the generics problem you describe? (Acknowledging that it's obviously clunky)

Not really, because you don't have the type information at compile time. Erased generics are fine in a well-typed language, but with just an any type you can't even express something like a function that takes two values of the same type.
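
A rough Ruby sketch of the defunctionalization idea described in this comment, assuming a made-up in-process "service" and JSON as the wire format. None of the names (`TRANSFORMS`, `apply_transform`, the `op`/`args` message shape) come from gRPC or protobuf. Since a lambda cannot be serialized, the caller sends a small data description of the transform and the receiver interprets it.

```ruby
require "json"

# A callback cannot cross the wire: Ruby refuses to serialize a Proc.
begin
  Marshal.dump(->(x) { x * 2 })
rescue TypeError => e
  puts "cannot serialize a lambda: #{e.class}"
end

# Defunctionalized form: every transform the caller may request is listed
# here as a named case. TRANSFORMS and apply_transform are hypothetical.
TRANSFORMS = {
  "scale"  => ->(value, args) { value * args.fetch("factor") },
  "offset" => ->(value, args) { value + args.fetch("amount") }
}.freeze

# Receiver side: decode the command object and interpret it.
def apply_transform(wire_message, values)
  command = JSON.parse(wire_message)
  handler = TRANSFORMS.fetch(command.fetch("op"))
  values.map { |v| handler.call(v, command.fetch("args")) }
end

# Caller side: describe the transform as data instead of passing a block.
message = JSON.generate({ "op" => "scale", "args" => { "factor" => 2 } })
p apply_transform(message, [1, 2, 3]) # [2, 4, 6]
```

Supporting a new kind of transform means adding a new case that both sides agree on, which is the manual, per-use-case effort the comment points at.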

vosper · on Sept 18, 2020

People who are downvoting the parent comment: I’d love to know why? I won’t claim expertise here, but it doesn’t strike me as clearly incorrect.

closeparen · on Sept 17, 2020

How are you getting around API versioning with independently deployable components?

sokoloff · on Sept 18, 2020

If you call a piece of functionality from your own single deployable that you are refactoring, it’s much more like refactoring a function call than if it were an independent micro-service across a network.

heavenlyblue · on Sept 17, 2020

What is a network?

Spivak · on Sept 18, 2020

Any application boundary that requires that you serialize your calls/requests to the other service/component in some form.

Any form of IPC basically.

gen220 · on Sept 17, 2020

I think there used to be, before "off-the-shelf" RPC frameworks, service discovery, and the like were mature. There still are, for very small companies.

In 2020, if you have an eng count of >50: you use gRPC, some sort of service discovery solution (consul, envoy, whatever), and you basically never have to think about the costs of network hops. Opentracing is also pretty mature these days, although in my experience it's never been necessary when I can read the source of the services my code depends on.

Network boundaries are really useful for enforcing interface boundaries, because we should trust N>50 programmers to correctly implement bounded contexts as much as we trust PG&E to maintain the power grid.

That being said, if you have a small, crack team, bounded contexts will take you all the way there and you don't need network boundaries to enforce them.

twunde · on Sept 17, 2020

It depends on your speed requirements and whether calls are being sent async or not. Also keep in mind that even with internal apis, an api call is usually multiple network boundaries (service1 --> service2 (potential DNS lookup) --> WAF/security proxy --> Firewall --> Load balancer --> SSL handshake --> server/container firewall --> server/container). Then you get into whether the service you're calling calls other apis etc. You can quickly burn 50ms or more with multiple hops. If you're trying to return responses within 200ms you now have very little margin.

gen220 · on Sept 17, 2020

Acknowledging that there are indeed many hops, I think it might be a bit disingenuous to say 50ms is easy to burn, depending on which percentile we're talking about.

IIRC, a typical round trip service call at my current place of work (gRPC with protobufs, vault/ssl for verification, consul for dns, etc) carries a p99 minimum latency (i.e. returning a constant) of around 2ms.

A cold roundtrip obviously takes longer (because DNS, ssl, etc).

It depends on how many 9's you want within 10ms, but there are various simple tricks (transparent to the application developer) that a platform team can apply to get you there.

As a sidenote on calling other APIs, my anecdata suggests that most companies' microservice call graphs are at most 3-4 services deep, with the vast majority being 1-2 services deep.

This doesn't show the call graph, but it does demonstrate how many companies end up building a handful of services that gatekeep the core data models, and the rest simply compose over those services: https://twitter.com/adrianco/status/441883572618948608/photo...
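
As a back-of-the-envelope check on the numbers in this exchange, the sketch below computes how much of a 200ms response budget a chain of service calls would consume. The 200ms budget, the 50ms overhead and the roughly 2ms per-call figure are taken from the comments above; treating every hop as sequential and equal in cost, and adding p99 figures together, are simplifying assumptions.

```ruby
# Figures quoted in the thread; the model (sequential, equal-cost hops) is an
# assumption made only for this arithmetic.
RESPONSE_BUDGET_MS = 200.0
PER_HOP_P99_MS     = 2.0

# Percentage of the response budget consumed by a chain of sequential calls.
def budget_used_percent(call_depth, per_hop_ms = PER_HOP_P99_MS)
  (call_depth * per_hop_ms / RESPONSE_BUDGET_MS * 100).round(1)
end

[1, 2, 3, 4].each do |depth|
  puts "depth #{depth}: #{budget_used_percent(depth)}% of budget"
end

# The 50ms scenario from the parent comment, as a single lump of overhead.
puts "50 ms overhead: #{budget_used_percent(1, 50.0)}% of budget"
```

Which of the two scenarios applies depends on the per-hop cost an organization actually observes, which is the point the two commenters disagree on.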

jurre · on Sept 17, 2020

It depends, but it means you don’t have to serialize/deserialize data, deal with slow connections, retries, network failures, circuit breakers etc

djohnston · on Sept 17, 2020

I agree 100%. It gives you the boundaries but also the whole world maps to a single revision in VCS

didibus · on Sept 17, 2020

Ignoring some of the deployment- and dependency-related aspects of microservices vs. monoliths, one aspect that has me convinced against my own ideals is that a micro-service has a "strong" boundary in that it is actually difficult and effortful for a developer to cross it.

This in turn has a positive effect in maintaining proper boundaries and putting the right amount of thought into the interfaces and responsibility of each component.

exterm · on Sept 17, 2020

It's all tradeoffs. You get a stronger boundary, but you also get a distributed system.

Also, the first try of drawing boundaries will always be varying degrees of wrong. If you have very strong boundaries at this stage, iterating on them, moving responsibilities around, can be harder.

Also, with the right tooling it's definitely possible to harden monolith internal boundaries to a comparable level.

I can see though how many smaller companies would not be in a position to build that tooling.

Anyway... there is no either / or here, as I've explained in another comment. What if you have components within a monolith, but each component has its own database, for example? What if test suites are completely isolated, so that tests for component A can not access code in component B?

You can get pretty strong boundaries with a few comparatively simple tricks.
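
One of the simpler tricks available in plain Ruby is `private_constant`, which hides a component's internals behind a small public entry point. This is only a sketch with made-up names (`Billing`, `TaxCalculator`, `Billing.quote`) and an arbitrary 20% rate; it is not the tooling Shopify uses, and it does not cover separate databases or isolated test suites.

```ruby
# A hypothetical "Billing" component with one public entry point.
module Billing
  # Internal detail; other components should not reference it directly.
  class TaxCalculator
    def self.tax_for(amount_cents)
      (amount_cents * 0.2).round
    end
  end
  private_constant :TaxCalculator

  # Public API of the component.
  def self.quote(amount_cents)
    amount_cents + TaxCalculator.tax_for(amount_cents)
  end
end

puts Billing.quote(1000) # 1200

# Reaching past the public API from outside the module raises NameError.
begin
  Billing::TaxCalculator.tax_for(1000)
rescue NameError => e
  puts "boundary enforced: #{e.message}"
end
```

Ruby's reflection can still reach private constants, so this raises the cost of crossing a boundary rather than making it impossible.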

lmm · on Sept 18, 2020

A decent module system can achieve the same thing without all of the network overhead. Even just something like maven multi-module projects makes it hard enough to accidentally cross the boundaries.

yeswecatan · on Sept 18, 2020

Agreed 100%.

hliyan · on Sept 17, 2020

I often find myself saying "never do at runtime what could be done at compile time".

exterm · on Sept 17, 2020

you don't work with Ruby eh? :D

WJW · on Sept 17, 2020

The lack of compile time in the ruby world really makes it difficult to do a lot of work there. :P

There's a nice Ruby trick btw where you put significant precalculations in constants. Since the value of a constant gets computed during program startup, it still allows you to do work "up front" instead of during a web request.
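
A minimal sketch of that trick, with a hypothetical rate table standing in for the expensive precalculation (`ShippingRates` and its formula are invented for the example). The constant is assigned once when the file is loaded, and each later call only does a hash lookup.

```ruby
# Hypothetical example: precompute a table once, when this file is loaded.
module ShippingRates
  BUILD_COUNT = [0] # records how many times the table gets built

  def self.build_table
    BUILD_COUNT[0] += 1
    # Stand-in for an expensive computation.
    (1..1_000).to_h { |weight_kg| [weight_kg, 5 + weight_kg * 2] }.freeze
  end

  # Evaluated at load time, not per request.
  RATE_BY_WEIGHT = build_table

  # Per-request work is just a lookup.
  def self.rate_for(weight_kg)
    RATE_BY_WEIGHT.fetch(weight_kg)
  end
end

3.times { ShippingRates.rate_for(10) }
puts ShippingRates.rate_for(10)          # 25
puts ShippingRates::BUILD_COUNT.first    # 1
```

The table is built once no matter how many lookups follow, so the cost moves from request time to process startup.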

mhoad · on Sept 17, 2020

I never knew that but that IS a cool trick.

skipants · on Sept 17, 2020

I just want to caveat this as it is not a Ruby construct, it's part of Ruby web servers. Because they are long-living Ruby processes they are only loading files once (I suppose it's similar to compiling). This means it runs all globally-scoped code, which class definitions and constants (generally) are. That's actually what Ruby bootloaders like Spring and Zeus are doing on your dev machine to speed up the load time when you use Rails commands. They cache all that globally run stuff in their own process. It's also why they run into a bunch of issues when you have logic in your constant definitions.

owyn · on Sept 17, 2020

Yep, that's a good trick. At a previous PHP shop we had a large amount of static XML configuration (well it was generated, but not that often). Converting it all to PHP arrays and including it was significantly faster than parsing the XML on each request, and then PHP would cache that result too. Re-running the XML->PHP tool just caused it to re-include/cache these giant arrays of static config. It worked great. I mean, arguments about whether that was a good design or not aside...

(edit to reply since I can't reply to a reply to a reply)

Yep, it is very common in lisp/smalltalk environments to dump the state of the world to disk and re-load it later. This is one of those tricks that gets relearned every generation. :)

For bonus credit apply this analogy to docker images. :)

JohnBooty · on Sept 17, 2020

I did this in PHP once as well. I had to code a coupon lookup site where people entered coupon codes and they were verified against a database. I forget how many coupons there were... pretty sure it was less than 100,000.

Anyway, I coded it up in my local dev environment. Unfortunately, it turned out that I'd been misled and the actual deployment environment didn't have a database server available.

In desperation and facing a deadline, I dumped all the lookup values into an array in a PHP file. As you said, it was really quite performant. The first request after starting the server was a bit slow (but not too bad... still < 10 seconds I think) and after that things were golden.

I felt a bit dirty, but things worked and we got paid.

im3w1l · on Sept 17, 2020

I heard Emacs did that but went one step further: they dump the memory of the interpreter post-init* and just load it into memory when starting.

* Some early step in the init process. Many things are still interpreted at init.

andreareina · on Sept 18, 2020

"emacs unexec" is the search term if you're looking for it.

https://news.ycombinator.com/item?id=21394916

https://lwn.net/Articles/707615/ "The Emacs dumper dispute"

https://lwn.net/Articles/673724/ "Removing support for Emacs unexec from Glibc"

shawnz · on Sept 18, 2020

What? Don't global variables exist in pretty much every language on the planet? This isn't a "trick", it is a bad practice which should be avoided in any language.

Imagine describing globals in C, Python, or JavaScript, or static fields in Java or C# as a "neat trick"...

aantix · on Sept 17, 2020

In Ruby, the class definition is code as well.

imhoguy · on Sept 18, 2020

Where are they not code? I think you meant that a Ruby class is defined at runtime with sequential imperative or functional code. Ha! You can build a class from Ruby code in a string too. A lot of choice.
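
A small illustration of both points in plain Ruby: the class body is ordinary code that runs when the class is defined, and a class can also be built from source held in a string. The class names are invented for the example, and `eval` on a string is shown only to demonstrate the mechanism.

```ruby
# The class body is executed top to bottom at definition time.
class Greeter
  LANGUAGES = %w[en fr]

  # Ordinary loop inside the class body, defining one method per language.
  LANGUAGES.each do |lang|
    define_method("greet_#{lang}") { lang == "en" ? "hello" : "bonjour" }
  end
end

puts Greeter.new.greet_fr # bonjour

# The same kind of definition, built from a string of Ruby source.
source = <<~RUBY
  class Counter
    def initialize; @count = 0; end
    def increment; @count += 1; end
  end
RUBY
eval(source)

counter = Counter.new
counter.increment
puts counter.increment # 2
```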

ChiefOBrien · on Sept 17, 2020

Reminds me of OSGi. Great idea, poor adoption, mostly due to the complexity of the problem domain. Microservices are a lot worse in that regard, yet remain a whole lot more popular, sadly.

jackbravo · on Sept 18, 2020

That's what tools like https://github.com/nrwl/nx try to facilitate, mainly by separating the code along boundaries like backend, frontend, and common libraries.

I haven't really used nx myself :-p. But I would love to use something similar for other frameworks or languages.

JamesSwift · on Sept 17, 2020

And I will call that parallel universe: monorepo-verse

etaioinshrdlu · on Sept 17, 2020

This is basically what I do and it's great :)

jakobmartz3 · on Sept 17, 2020

interesting thought!

straws · on Sept 17, 2020

A number of years ago, I worked on a team (~20 engineers in total) that successfully carved off two relatively independent portions of a large Rails app using engines. I'm happy to see that Shopify is also using that strategy.

I'm curious to know more about what sorts of challenges they have around managing dependencies across engines. I think what we were doing was fairly vanilla Rails, and we didn't have the opportunity to run into those sorts of issues.

exterm · on Sept 17, 2020

The answer to that question could probably fill another blog post :D

Long story short, Rails and dependency inversion equals lots of friction. The whole framework is built on the assumption that it's OK to access everything from everywhere, and over the years we've built lots of tooling on top of those assumptions.

E.g. we heavily use https://github.com/Shopify/identity_cache with active record associations that cross component boundaries.

We also have a GraphQL implementation that is pretty closely coupled to the active record layer and _really_ wants to reach into all the components directly.

All of those problems can be overcome, but this is definitely an area where we have to work against "established" Rails culture, and our own assumptions from the past.

straws · on Sept 17, 2020

I hope to hear more in the future!

Do you envision any extension points to the way engines are implemented that could better enforce boundaries? In our engines, there was nothing that referenced another engine's resources, leaving the main application to handle route mapping and ActiveRecord associations between app models and engine's models.

I feel like the use-case for engines has long been around supporting framework-like functionality (Devise, Spree, etc), but I wonder if there are changes to be made that better support modularization for large apps.

exterm · on Sept 17, 2020

> extension points to the way engines are implemented that could better enforce boundaries

Can you expand on that? I'm not sure I follow.

sandGorgon · on Sept 17, 2020

What's the difference between "componentization+engines" and microservices?

From a deployment perspective are your engines deployed and scaled independently?

exterm · on Sept 17, 2020

components are

- same database

- same runtime

- same deployment

- same repository

That said, I don't think this is an either/or. It's a spectrum. You can have components within the same runtime and repository that have separate databases, or components that are using the same database but live in separate repos, etc.

From one monolithic app towards fully separated microservices is a spectrum, and I think developers should be enabled to move freely around that spectrum.

sandGorgon · on Sept 17, 2020

I think components are the better option, because they allow for separation of concerns without introducing deployment complexity or, worse, political complexity.

I call them Micro-SDKs.

JohnBooty · on Sept 17, 2020

I worked on a large Rails monolith a few years back with a similarly-sized team and we took the "components+engines" approach too.... and it was a bit of a nightmare, honestly. It sort of felt like the worst of both worlds, relative to monoliths or microservices.

I strongly suspect, but cannot prove, that we would have been better off simply transitioning to "macroservices" -- breaking the monolith up into several (as opposed to dozens of) reasonably sized pieces.

• We were encouraged to componentize everything. When I left, we were up to a few dozen components, and the number was climbing rapidly. I'm not sure if the approach itself was the problem, or if the flux during the transition period was the real pain point.

• We had no real enforcement of interfaces between components. It was so easy to break things in other people's components.

• Theoretically that breakage would be caught by tests. But to catch that breakage, you needed to run the complete test suite (30-60 minutes) rather than simply testing your own component

• Essentially, it felt like we were suffering all the disadvantages of microservices, with the exception of coordinating deployments; from a devops perspective it was still just a single monolithic deployment

• We still had many of the problems associated with monoliths, such as slow deployments, long test suite times, and extremely high per-instance RAM usage

• Various small tooling and debugging issues related to using Rails but going too far "off the Rails"

I'm looking forward to digging into the linked article and learning how Shopify solved those issues. They seem to have quite a bit of engineering firepower at their disposal. Our management did not allow us to dedicate a lot of resources to internal engineering concerns like this.

(We essentially had one guy figuring it all out himself, and due to internal politics he was forbidden from considering a microservices or "macro services / multiple monoliths" approach. He was talented and did the best he could, considering)

exterm · on Sept 17, 2020

> When I left, we were up to a few dozen components, and the number was climbing rapidly.

I should have included this in the blog post: The number of components _needs_ to be kept small. Shopify's main monolith is 2.8 million lines of code in 37 components, and I'd actually like to get that number _down_.

I like to compare this to the main navigation that we present to our merchants. It's useful if it has 8 entries. It's not useful if it has 400.

In a way, components are the main navigation to our code base. A developer should be able to look at what's in our "components" folder and get a general impression of what the system's capabilities are.

JohnBooty · on Sept 18, 2020

That's an excellent (and hard-earned, I'm sure!) insight. Thank you.

    I like to compare this to the main navigation that we 
    present to our merchants. It's useful if it has 8 
    entries. It's not useful if it has 400.

Yeah, we essentially wound up with a "junk drawer" of components. I could see a lot of companies, like ours, making that mistake -- turning all the things into components.

As you said in the article, one of the benefits of components for you was that it truly forced you to think about a proper separation of concerns. In hindsight, that's an area where we really missed the mark for a variety of reasons, some methodology-related.

We practiced a rather strict version of Scrum. Management paid a lot of attention to our velocity from week to week: we needed to rack up those story points.

But, outside of the tiny team dedicated to the component effort, there were no story points to be had for supporting that effort. Therefore we were in fact incentivized not to support it. I remember one sprint where I did some refactoring work in order to achieve a better separation of concerns. It negatively affected our velocity for the week and that was noticed.

So, we were receiving a schizophrenic message from management. We were all to support the component effort.... but on our own time, apparently?

kogus · on Sept 17, 2020

This is tangential, but I want to applaud the mindset that produces the phrase "the opportunity to run into those sorts of issues".

straws · on Sept 17, 2020

I meant it more as a testament to how far you can get with a Rails app before needing to consider using the power tools :^)

joelbluminator · on Sept 17, 2020

2.8 million lines, 100 billion business. Rails can scale.

mandelbrotwurst · on Sept 17, 2020

100 billion? That seems like a lot of businesses per capita!

shwoopdiwoop · on Sept 17, 2020

Fairly certain GP referred to the market cap, not the number of businesses on Shopify's platform.

khendron · on Sept 17, 2020

He might be referring to Shopify GMV (Gross Merchandise Volume, the value of commerce facilitated by the platform), which is probably approaching $100B per year.

csomar · on Sept 17, 2020

No, Shopify's market cap is $100bn, which is much higher than I expected. So I looked up their revenue, which is $1.6bn, and they have an income deficit. So...

mandelbrotwurst · on Sept 17, 2020

Ah, yeah I thought it said businesses plural.

tgarv · on Sept 17, 2020

I think "100 billion business" means that the business (Shopify) is valued at $100 billion. (I'm not sure if that's true, that's just how I interpreted it)

jtsiskin · on Sept 17, 2020

“business”, not “businesses”

whycombagator · on Sept 17, 2020

Not sure that LOC is a meaningful metric/measure of scale, unless you mean scale of codebase itself

Thaxll · on Sept 17, 2020

Well, apparently not, given all the work they have to do to make it scale. Using Java from day 1 would have solved a lot of issues.

We'll see in a couple of years after breaking down all those app if they stick to rail to do everything.

joelbluminator · on Sept 17, 2020

Yes, you're right. If only they sprinkled some Beans over everything all their scale problems would just disappear. Thousands of happy developers would have all worked on one big happy Spring application with no problems whatsoever.

whycombagator · on Sept 17, 2020

> Well, apparently not, given all the work they have to do to make it scale. Using Java from day 1 would have solved a lot of issues.

It is a lot of work, but scaling anything is. What issues, specifically, are you alluding to that Java would solve?

> We'll see in a couple of years, after breaking down all those apps, if they stick to Rails to do everything.

I'd bet they currently don't use Rails/Ruby for everything. It's pretty rare for large companies to use just 1 language/framework for all things.

Thaxll · on Sept 17, 2020

> It is a lot of work, but scaling anything is. What issues, specifically, are you alluding to that Java would solve?

Ruby is slow, Java is fast. When you have to modify the runtime of a language because it's too slow: https://engineering.shopify.com/blogs/engineering/optimizing...

Reading all those blog posts from Shopify shows that they spend a lot of time fighting a slow language.

It reminds me of Facebook and their Hack stuff. It's pretty much the same as what Shopify is getting into: they have something slow and really big and no way to get out of it, so they just pour money into making it fast, even if it means only the syntax resembles the original language.

Some companies faced the same problem: quickly release something to iterate fast (Rails / Python), but then when it gets too big you're in real trouble and stuck with it. Twitter, YouTube, Facebook all had that problem.

crispyporkbites · on Sept 17, 2020

Name a successful web company that hasn’t had that problem

nurettin · on Sept 17, 2020

Everyone knows that only the best enterprise programmers apply to Java positions, and when they somehow manage to pry themselves through the screening, they will help you bicker indecisively for hours when adding even the smallest functionality, because they love bringing everything to the table at once instead of even considering producing value, in order to prove themselves as a valuable and knowledgeable part of the team and not the work they do. So it's a win from the start. Especially if you have 400 people in your team with 54 well-documented gatherings under their belts.

nr2x · on Sept 18, 2020

I only code web apps in assembly.

octernion · on Sept 17, 2020

we are actually doing precisely the same thing at instacart (breaking our 1+ million lines of code monolith into discrete components, which we call "domains"), and typing the boundaries and as much of the internals of these domains as possible with sorbet types.

this has the benefit of ruby dynamism (fast development within domains, you can use all the nice railsy tooling, activerecord, and all the libraries we've built over the years), with type safety at the boundaries (we've also put in timeouts, thread separation, and error handling at the boundaries).

the additional benefit for using sorbet is that it makes making typed RPC calls (over twirp or graphql) much easier as you can introspect the boundaries trivially.

really cool to see other companies evolving similarly given the same starting conditions!
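For readers who want something concrete, here is a rough sketch of a boundary like the one described above (typed inputs and outputs, a timeout, thread separation, and error translation). It is written in Python rather than Ruby with Sorbet, and every name in it (`PriceRequest`, `PriceResponse`, `BoundaryError`, `get_price`, the price table) is hypothetical, not Instacart's actual code.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRequest:
    item_id: int
    quantity: int


@dataclass(frozen=True)
class PriceResponse:
    total_cents: int


class BoundaryError(Exception):
    """Raised when a call across the domain boundary fails or times out."""


# Hypothetical domain internals: untyped and free to stay dynamic.
def _pricing_internal(item_id, quantity):
    unit_price_cents = {1: 250, 2: 1000}[item_id]
    return unit_price_cents * quantity


# Calls into the domain run on their own worker threads.
_executor = ThreadPoolExecutor(max_workers=4)


def get_price(request: PriceRequest, timeout_s: float = 1.0) -> PriceResponse:
    """The only entry point other domains are meant to call."""
    if not isinstance(request, PriceRequest):
        raise BoundaryError(f"expected PriceRequest, got {type(request).__name__}")
    future = _executor.submit(_pricing_internal, request.item_id, request.quantity)
    try:
        return PriceResponse(total_cents=future.result(timeout=timeout_s))
    except FutureTimeout:
        raise BoundaryError("pricing domain timed out") from None
    except Exception as exc:
        # Internal failures are translated so callers only see BoundaryError.
        raise BoundaryError(f"pricing domain failed: {exc!r}") from exc
```

Callers would use `get_price(PriceRequest(item_id=1, quantity=3))` and handle only `BoundaryError`; whatever happens inside the domain stays behind that one function.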

exterm · on Sept 17, 2020

There are quite a few people talking about this kind of stuff on https://rubymod.slack.com. I can send invites, just DM me on twitter https://twitter.com/_exterm

geospeck · on Sept 17, 2020

> I can send invites, just DM me on twitter

Seems like DM is closed. Thanks for the great article!

exterm · on Sept 17, 2020

oops - sorry. Opened for now.

octernion · on Sept 17, 2020

just sent you a note, thank you!

dragosmocrii · on Sept 17, 2020

Slightly off topic, but does anyone know if this "component based" development is what umbrella applications are in Elixir?

exterm · on Sept 17, 2020

It's certainly related. In very general terms, I would say splitting a Rails app into multiple engines is the same pattern as umbrella applications.

However, there are more interesting specifics here about things like all engines sharing a database, but having exclusive ownership of tables, as well as splitting HTTP routing over multiple engines etc.

Arubis · on Sept 17, 2020

I think you'll also find a lot of conceptual overlap with Phoenix Contexts; they'll generally all start as part of the same monolith/app but are sufficiently discrete that you can separate them out more easily than the Rails situation in TFA.

ravenstine · on Sept 17, 2020

Am I the only one who has a distaste for this phrase "component based development"? It just seems like a fancy way of saying object oriented programming without an overarching design pattern.

aidos · on Sept 17, 2020

Sounds like the “components” described above are much larger than classes.

octernion · on Sept 17, 2020

that's correct, at least for us a domain encapsulates many response types and dozens of different APIs that wrap various datastores, business logic, etc.

octernion · on Sept 17, 2020

we've actually taken the pattern of making the classes relatively stateless, and explicitly passing around typed state through these explicit apis. it's not really the same design pattern and imo conceptually different.

IshKebab · on Sept 17, 2020

My god I can't imagine a million lines of untyped code. Must be hell. Presumably you spend all day writing tests?

octernion · on Sept 17, 2020

hah, it's not hell but it's not entirely pleasant either. a _lot_ of that is tests, which is essentially how contracts and safety are enforced in ruby (at least prior to types).

sandGorgon · on Sept 17, 2020

This is a brilliant brilliant article.

Does anyone know how Shopify created its Architecture Guild and grew it? The author talks about "should have done it earlier".

exterm · on Sept 17, 2020

As the author, I would know :)

Thank you for the praise.

Ours kind of organically grew over time, but as I've been keeping it alive for the last few years I have a pretty good idea of how I would start it fresh.

You probably have some people in the company who either know much more about architecture than others, or are working on projects that are more interesting in terms of architecture. Find one of them, convince them to give a 15 min talk.

Announce the talk widely within the company, tell people to come to the new "architecture guild" slack channel you created to get the details / invites.

Schedule an hour to give plenty of time for discussions after the talk.

Repeat biweekly.

sandGorgon · on Sept 17, 2020

Thanks for replying.

How would you do it in a remote-first world? A Zoom talk?

How does this go beyond that one talk? Would you incorporate aspects of this into official rewards/recognition?

Or is gratification good enough? Getting a Zoom audience is gonna be hard.

exterm · on Sept 18, 2020

Shopify has been a fully remote company for a few months now. https://financialpost.com/technology/shopify-is-joining-twit...

We're not using Zoom but Google Meet. But yes, these happen completely online now.

I find that people that are doing interesting stuff often _want_ to talk about it. However, a big part of Shopify culture is "do things, tell people" - it is definitely encouraged to spend time spreading context.

It's not directly part of any rewards framework, but one metric that goes into promotions is the area of impact. By giving a talk to the guild, you can have impact on a group that's larger than your team, potentially the whole organization. It counts.

But another reward is the positive feedback, interesting discussions and new connections that you make through this.

treis · on Sept 17, 2020

Have y'all seen any issues around autoloading of classes/modules in development? I've been working on a Rails app composed of a handful of engines and I've noticed that every so often classes aren't loaded. Rails 6 seems to be a lot better about it than 5 was.

mhoad · on Sept 17, 2020

Rails 6 has a totally new code loader that was built specifically to address those issues called Zeitwerk. Some details here if you're interested https://blog.bigbinary.com/2019/10/08/rails-6-introduces-new...

gregkerzhner · on Sept 17, 2020

Interesting article. We use a similar approach for our mobile apps to allow multiple teams to develop their own modules independently.

Can anyone speak to what the advantages and disadvantages to such an approach are as opposed to going full Kubernetes / Microservices? Is it that deploys are riskier and you can't scale separate pieces independently?

kawsper · on Sept 17, 2020

Does anyone know if the Storefront rendering described here[0] is running Rails or something else?

[0] https://engineering.shopify.com/blogs/engineering/how-shopif...

rafaelfranca · on Sept 17, 2020

The application is a Rack application reusing some of the components of Rails, but it is not a conventional Rails application given it doesn't need most of the framework.

banq · on Sept 18, 2020

DDD aggregates: loose coupling with high cohesion!

ryanmarsh · on Sept 17, 2020

There’s so much truth in this. It’s full of lessons I tell clients at the outset of similar endeavors yet they often do not heed until they experience the pain first hand.

banq · on Sept 18, 2020

At Shopify, they actually applied DDD bounded contexts and aggregates, but maybe they don't know DDD!

meesterdude · on Sept 17, 2020

Interesting read. I've seen a component based rails architecture work wonders for cleaning up a codebase and allowing for the benefits of a SOA encapsulation while still keeping everything under a monolithic architecture (and avoiding the networking nasties). Not such a fan of sorbet though, but hopefully something better comes along.

ToJans · on Sept 18, 2020

I can imagine that this has been a huge effort, and kudos to the team, but this is a solved problem; there are ample methodologies to resolve the big ball of mud.

IMHO the Shopify team could have saved a lot of time by getting some schooling about strategic DDD, and consulting one or more DDD experts to draw a first version of their context map.

yeswecatan · on Sept 18, 2020

Do you know any DDD experts that offer consulting services?

mochii · on Sept 17, 2020

Very interesting read! Thank you for sharing.

throwaway691999 · on Sept 17, 2020

I think it's kind of bad that we have this trend to use "walls" to enforce modularity. This whole thing about using "walls" to enforce "developer behavior" is, in my humble opinion, the wrong direction.

If you think about it, almost all lack of modularity comes from shared mutable variables. Segregate mutability away from the core logic of your system and the smallest function in your architecture will become as modular as a microservice.

Really, any function that is stateless can be moved anywhere at any time and used anywhere without fear of it creating a permanent foothold in the architectural complexity of the system. So if the code is getting structured to the point where you become afraid of moving things... do this rather than build classes and walls around all your subroutines.

Remember, as long as a function doesn't mutate shared state, you know it has zero impact on any part of the system other than its output... you can replace it, copy it, or use it anywhere... this is really all you need to do to improve the modularity of your system.

>Again and again we pondered: How should components call each other?

I think this is what's tripping most people up. They think DI, IoC, and OOP patterns are how you improve modularity. They're not. Pure functions over immutable data are what improve the modularity of your program. The more of them you have and the smaller they are, the more modular your program will be. Segregate IO and mutations into tiny auxiliary functions away from your core logic, which is composed of pure functions.

>Circular dependencies are situations where for example component A depends on component B but component B also depends on component A.

I've never seen circular dependencies happen with pure functions. It's rare in practice. I think it occurs with objects because when you want one method of an object you have to instantiate that object, which has a bunch of other methods and in turn dependencies that could be circular to the current object you're trying to call it from. In essence this kind of thing tends to happen because when you call a method you're actually calling upon a group of methods and state within a class, and upon all those dependencies as well, increasing the chances of a circular dependency.

Still, I've seen this issue occur with namespacing when you import files. Walls aren't going to prevent this from happening. You need to structure your dependencies as a tree.
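As a minimal sketch of the "pure core, IO at the edges" idea above, assuming a made-up order format (the field names `price_cents`, `qty`, and `discount_pct` and all function names are hypothetical):

```python
import json
import sys


def order_total(line_items: list[tuple[int, int]]) -> int:
    """Pure: sums price_cents * quantity without touching outside state."""
    return sum(price * qty for price, qty in line_items)


def apply_discount(total_cents: int, percent: int) -> int:
    """Pure: the result depends only on the arguments."""
    return total_cents - (total_cents * percent) // 100


def summarize(order: dict) -> dict:
    """Pure core: builds a new dict instead of mutating the input."""
    total = order_total([(i["price_cents"], i["qty"]) for i in order["items"]])
    return {
        "id": order["id"],
        "total_cents": apply_discount(total, order["discount_pct"]),
    }


def main() -> None:
    """Thin shell: all IO lives here, away from the core logic."""
    order = json.load(sys.stdin)
    json.dump(summarize(order), sys.stdout)
```

Only `main` reads or writes anything. The three functions above it can be moved, copied, or tested in isolation because nothing they do is visible except their return values.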

lmm · on Sept 18, 2020

> Really, any function that is stateless can be moved anywhere at any time and used anywhere without fear of it creating a permanent foothold in the architectural complexity of the system.

That's not really true. A pure function can still be coupled to a particular internal data representation. It can still assume particular invariants that you may not want to maintain. Namespacing functions together with the data structures they operate on is still a good idea, and helps with keeping a coherent model at each level - e.g. if your business logic is calling a function that's about the specific mechanics of encoding data for Redis, you're probably using the wrong abstraction.

Pushing mutability to the edges is good and useful but it's not the be-all and end-all of decoupling. Enforced walls are a much better idea than spending your discipline budget on maintaining decoupling by hand. A lot of the time a pure function can actually be decoupled completely from the datatypes it's operating on by using parametricity (and maybe a standard typeclass that the datatype it operates on conforms to), but you may not notice that unless you've got some module boundaries that nudge you to think about that kind of thing.
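To illustrate the parametricity point, here is a small sketch in Python, where a `Protocol` stands in loosely for a typeclass. The names `top_n` and `Comparable` are made up for the example. Because the function only sees its items through the `key` callback, it cannot become coupled to how any particular item is represented.

```python
from typing import Any, Callable, Iterable, Protocol, TypeVar


class Comparable(Protocol):
    """Anything that supports `<`; a rough stand-in for an Ord typeclass."""

    def __lt__(self, other: Any) -> bool: ...


T = TypeVar("T")
K = TypeVar("K", bound=Comparable)


def top_n(items: Iterable[T], key: Callable[[T], K], n: int) -> list[T]:
    """Return the n items with the largest key, largest first.

    Nothing here inspects T directly, so the same function works for
    strings, dicts, or domain objects.
    """
    return sorted(items, key=key, reverse=True)[:n]
```

For example, `top_n(["a", "ccc", "bb"], key=len, n=2)` and `top_n(orders, key=lambda o: o["total"], n=10)` share one implementation without that implementation knowing what an order is.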

leafboi · on Sept 17, 2020

I think it's kind of bad that we have this trend to use hardware to enforce modularity. If it's a performance issue, sure, break it up into more hardware. If it's just code modularity, then by shifting to microservices you are adding the additional complexity of maintaining multiple services on top of modularizing the system. In short, it's overkill. This whole thing about using hardware to enforce "developer behavior" is stupid. You can use software to enforce developer behavior. Your operating system and your programming language are already "enforcing" developer behavior.

Additionally, your microservices are hard lines of modularization. It is very hard to change a module once it's been materialized because it's hardware.

If you think about it, almost all lack of modularity comes from shared mutable variables. Segregate mutability away from the core logic of your system and the smallest function in your architecture will become as modular as a microservice.

Really, any function that is stateless can be moved anywhere at any time and used anywhere without fear of it creating a permanent foothold in the architectural complexity of the system. So if the code is getting structured to the point where you become afraid of moving things... do this rather than go to microservices.

>We can more easily onboard new developers to just the parts immediately relevant to them, instead of the whole monolith.

Correct me if I'm wrong but don't folders and files and repos do this? Does this make sense to you that it has to be broken down into hardware?

>Instead of running the test suite on the whole application, we can run it on the smaller subset of components affected by a change, making the test suite faster and more stable.

Right because software could never do this in the first place. In order to test a quarter of my program in an isolated environment I have to move that quarter of my program onto a whole new computer. Makes sense.

>Instead of worrying about the impact on parts of the system we know less well, we can change a component freely as long as we’re keeping its existing contracts intact, cutting down on feature implementation time.

Makes sense because software contracts only exist as http json/graphql/grpc apis. The below code isn't a software contract it's only how old people do things:

   def add(x: int, y: int) -> int

Remember, as long as that add function doesn't mutate shared state, you know it has zero impact on any part of the system other than its output... you can replace it, copy it, or use it anywhere... this is really all you need to do to improve the modularity of your system.

Editing it on the other hand could have some issues. There are other ways to deal with this and simply copying the function, renaming and editing it is still a good solution. But for some reason people think the only way to deal with these problems is to put an entire computer around it as a wall. So whenever I need some utility function that's located on another system I have to basically copy it over (along with a million other dependencies) onto my system and rename it... wait a minute can't I do that anyway (without copying dependencies) if it was located in the same system?

>Again and again we pondered: How should components call each other?

I think this is what's tripping most people up. They think DI, IoC, and OOP patterns are how you improve modularity. They're not. Pure functions over immutable data are what improve the modularity of your program. The more of them you have and the smaller they are, the more modular your program will be. Segregate IO and mutations into tiny auxiliary functions away from your core logic, which is composed of pure functions. That's really the only pattern you need to follow, and some languages can enforce this pattern without the need for "hardware."

>Circular dependencies are situations where for example component A depends on component B but component B also depends on component A.

I've never seen circular dependencies happen with pure functions. It's rare in practice. I think it occurs with objects because when you want one method of an object you have to instantiate that object, which has a bunch of other methods and in turn dependencies that could be circular to the current object you're trying to call it from. In essence this kind of thing tends to happen exclusively with objects. Don't group one function with the instantiation of other functions and you'll be fine.

Still, I've seen this issue occur with namespacing when you import files. Hardware isn't going to prevent this from happening. You need to structure your dependencies as a tree.
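One way to check that dependencies really have no cycles is a topological sort. The sketch below uses Python's standard `graphlib` module (3.9+). The component names and the `DEPENDENCIES` table are invented for illustration and have nothing to do with Shopify's actual components.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional

# Hypothetical component graph: each key depends on the components in its set.
DEPENDENCIES = {
    "checkout": {"orders", "payments"},
    "orders": {"inventory"},
    "payments": set(),
    "inventory": set(),
}


def load_order(deps: dict[str, set[str]]) -> list[str]:
    """Return components ordered so that dependencies come first.

    Raises graphlib.CycleError if components depend on each other in a loop.
    """
    return list(TopologicalSorter(deps).static_order())


def find_cycle(deps: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    try:
        load_order(deps)
    except CycleError as err:
        # graphlib puts the offending cycle in the second exception argument.
        return err.args[1]
    return None
```

Running `find_cycle` in CI would flag the "A depends on B, B depends on A" case from the quote as soon as it is introduced, whether the nodes are files, namespaces, or components.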

look_lookatme · on Sept 17, 2020

>>We can more easily onboard new developers to just the parts immediately relevant to them, instead of the whole monolith.

>Correct me if I'm wrong but don't folders and files and repos do this? Does this make sense to you that it has to be broken down into hardware?

This entire post is literally about using folders (directories) and files to enforce boundaries...

leafboi · on Sept 17, 2020

They use the terms "breaking down a monolith" and "architecture", so from that you can infer that it's literally about using an entire VM or computer to enforce boundaries.

Folders and files are used in "monoliths" anyway. Nothing new to talk about that here. Are you implying that their monolith is just one big file and they're beginning the process of breaking that thing down into multiple files and different folders?

I don't know about you but that doesn't make any sense to me.

exterm · on Sept 17, 2020

Hey Leafboi - I recommend reading the first post in the series for some background https://engineering.shopify.com/blogs/engineering/deconstruc...

We don't use "hardware" or "VMs" to facilitate modularity.

leafboi · on Sept 17, 2020

All right. I'm wrong. Didn't know this. Thanks for linking. Still can't exactly fault me on that. It's not easy to find the contextual blog post if this post doesn't easily say it's part of a series.

Still though, my exposé is still relevant: those are some hard lines that could easily be gotten rid of if your functions were pure and not part of a class.

Any internal private function is safe to use anywhere in the system as long as it's not attached to a class and it doesn't modify shared state. If your systems were modelled this way there would be no need to really think about modularization as your subroutines are already modular.

For example:

   class A:
       def __init__(self):
           ...  # does a bunch of random shit

       def someMethodThatMutatesSomething(self) -> "Output":
           ...

   class B:
       def someOtherFunctionThatNeedsClassA(self):
           # Cannot call someMethodThatMutatesSomething without doing
           # "a bunch of random shit", or even possibly modifying or breaking
           # something else. Modularity is harder to achieve with this pattern.
           ...

versus:

   def somePureFunctionWithNoSideEffects(input) -> output

somePureFunctionWithNoSideEffects above does not need any hard lines of protection. There is zero need to use the antics of "deconstructing a monolith" if you structured things this way. Functions like this can be exposed publicly for use by anyone with literally zero issues.

Shared mutable state and side effects are really the key things that break modularity. Everyone misses it and comes up with strange ways to improve modularity by using "walls" everywhere. It's like cutting my car in half from left to right with a wall and calling it "modularization." When you find out that the engine in front actually needs the gas tank in back, then you'll realize that the wall only produces more problems.

richardlblair · on Sept 17, 2020

I think what's really unfortunate here is you started pretty pointed in what you were saying, and you've stayed pointed. It reads as confrontational.

It's unfortunate because you make a good point. Pure functions do not get the attention they deserve. However, no one will read that because you just sound like you're attacking for no real reason.

I'm only saying this because if you're this way here there is a solid chance you're like that in other areas of your life. What you have to say is important, but if you approach your conversations this way people won't listen.

Why did I take the time to write this? Because sometimes those closest to us won't give us the feedback we need.

leafboi · on Sept 17, 2020

Thanks. But this is the internet. I use a bit of aggression experimentally at times. Overall though, it sounds confrontational but I'm actually pretty factual and I never attacked anyone personally, it's all about the topic and idea. I actually admit when I'm wrong (see above, and who does that in life and on the internet?).

What's going on is I'm spending zero energy in attempting to massage the explanation with fake attempts to be nice. I'm just telling it like it is. Very few opportunities to do this in real life except on the internet.

In the company I work for, do I spend time telling my coworkers that pure functions are the key to modularity when classes and design patterns are ingrained in the culture? Do I tell them that their entire effort to move to microservices is motivated by hype and is really a horizontal objective with no actual benefit? No. I don't. People tend to dismiss things they don't agree with unless it's aggressively shoved in their face. They especially don't agree with ideas that go against the philosophies and practices they've been following for years and years.

Thus if I'm nice about it, I'm ignored; if I'm vocal and aggressive about it, I'm heard but it will also hurt my reputation. It's HN: feel free to experiment, just don't try it at work.

Yeah, my attitude isn't the best, but honestly, if I was nice about it, fewer people would read this or think about it. By doing this on the internet I can raise a point while not ruining my rep. (And I'm not actually aggressive, as there are no personal attacks unless someone said something personal about me.)

Tell me, in your opinion, how would you get such a point across in a culture where the opposite is pretty ingrained? I'm down to try this, I can repost my original post with the errors corrected and a nicer tone to see the response.

richardlblair · on Sept 17, 2020

I appreciate the point you're trying to make, but the truth is that you can make factual arguments without being so aggressive. Whether the aggression is targeted at a person doesn't really matter. It's unnecessary, disrespectful, and just feeds into the general toxicity that plagues our culture.

> Thus if I'm nice about it, I'm ignored, if I'm vocal and aggressive about it, I'm heard but it will also hurt my reputation.

I think the fact we are talking about your tone and not your points about functional programming speaks to this by itself. You weren't heard. You were felt, though.

> I'm not actually aggressive as there are no personal attacks

Aggression without a target is still aggression. If I aggressively take the recycling out, that aggression is still experienced by people around me. Probably my partner, who will inevitably have a little talk with me about it, lol.

> Tell me, in your opinion, how would you get such a point across in a culture where the opposite is pretty ingrained?

Engage in an intellectual conversation based on mutual respect. You will never change someone's mind on the spot; intellectual people will often mull things over for a while. In the process you may learn a few things yourself. I've worked in places that excelled at this, where respectful discourse was promoted. Conversations revolved around facts, but respect was maintained.

Sidebar: Shopify doesn't really have microservices. They have a few services, but they are entire services which serve an entire business unit. They are the exception. When I worked there I worked on one such service. What I'd tell people is if you couldn't start a whole new company with the service you were building, don't build it as a service.

leafboi · on Sept 17, 2020

I think you missed my point. I'm saying when you aren't aggressive people tend not to want to intellectually engage with you. People are emotional creatures and what doesn't excite them emotionally they don't engage. I'm saying I used the aggression on purpose for my own ends, but I caveated by saying that no actual attack occurred.

I think you need to think deeper than the traditional "mutual respect" attitude and generally being nice. Not all great leaders acted this way either. It's very nuanced and complicated how to get people to change or listen. The internet is an opportunity to try things out rather than take the safe, uncomplicated "nice" way that we usually try in the workplace.

>Engage in an intellectual conversation based on mutual respect. You will never change someone's mind on the spot; intellectual people will often mull things over for a while. In the process you may learn a few things yourself. I've worked in places that excelled at this, where respectful discourse was promoted. Conversations revolved around facts, but respect was maintained.

Right, except this is exceedingly rare. Most people do not act this way. Respect was maintained but the point is instantly forgotten and dismissed. Likely the respect covers up actual misunderstanding or disagreement. I find actual intense arguments open people up to say what they mean rather than cover up everything in gift wrapping.

Think about it this way. The reason why Trump won the election is not because he was nice. The complexities of human relationships go deeper than just "mutual respect." There are other ways to make things move. The internet is often an opportunity for you to try the alternative methods without much risk.

>I think the fact we are talking about your tone and not your points about functional programming speaks to this by itself. You weren't heard. You were felt, though.

The world moves through feelings. Not for all cases but oftentimes to get heard you need to get "felt" first.

webmaven · on Sept 18, 2020

> >I think the fact we are talking about your tone and not your points about functional programming speaks to this by itself. You weren't heard. You were felt, though.

> The world moves through feelings. Not for all cases but oftentimes to get heard you need to get "felt" first.

This is true, but you have options in terms of what feeling you're aiming for.

There is a world of difference in the response you're likely to get from "When Z you should do X because Y" vs. "We had a Z problem, it turns out that Y was the issue, so we did X."

The former will probably get you an "uh-oh" and the latter an "a-ha" or "hmm". Big difference.

modal-soul · on Sept 17, 2020

Just because a function is pure doesn't mean there is zero-risk in exposing it publicly. You're conflating complexity in managing state with complexity in managing domain boundaries.

A tangled web of function calls can be very confusing to work with, regardless of purity.

leafboi · on Sept 17, 2020

From a purely structural standpoint there is no risk. But you are talking about something different. You use the word "confusion."

Confusion is an organizational issue that can be handled with social solutions like names, namespaces and things like that. You can compose functions to form higher-level functions with proper naming to make sense of things. So for example, if you have 30 primitive functions you can compose them into 10 bigger functions in a higher layer and expose that as an API. This is more of a semantic thing, as you can still use the lower-level primitives as a library and chain those lower-level functions to achieve the same goal as using the higher-level API; the higher-level functions just make it easier to reason about the complexity.

Confusion, semantics and organization are, in a sense, social issues that are solved by social solutions like proper naming, grouping and composing. I'm not dismissing these issues (they are important) but I'm saying they are in a different category.

Overall though the problem I am addressing is structural. There are real structural issues that occur if your functions are not pure. When 4 methods operate on shared state in a class all four methods become glued together. You cannot decompose or recompose these functions ever. They cannot be reused without instantiating all the baggage that comes with the class.
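A tiny sketch of that layering, with made-up string helpers standing in for the "primitives" (all names here are hypothetical):

```python
from functools import reduce
from typing import Callable


def compose(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Chain single-argument functions left to right into one function."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Low-level primitives: each is pure and usable on its own.
def strip_whitespace(text: str) -> str:
    return text.strip()


def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


def lowercase(text: str) -> str:
    return text.lower()


# Higher-level API built purely by composition of the primitives.
normalize_title = compose(strip_whitespace, collapse_spaces, lowercase)
```

Calling `normalize_title(s)` gives the same result as chaining the three primitives by hand, so the higher layer adds a name and nothing else. Methods that share instance state cannot be pulled apart and recombined this way.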

mperham · on Sept 17, 2020

I don't think you need to mansplain architecture to the blog post author.

leafboi · on Sept 17, 2020

You can't talk about modularity without touching on shared mutable state. Shared mutable state is the fundamental primitive that eliminates modularity. You get rid of this, your entire program is now modular.

None of the writing really gets deep into this so I assume the author doesn't know.

It's not "mansplaining" you social justice warrior. I don't even know the sex of the author and I don't care. Don't turn this into some sex based conflict. It's called explaining, and that's all it is.

I'm assuming you don't know about it either so I suggest you read my "explanation" as well.

bori · on Sept 17, 2020

I like that they completely dodged the term "microservice" in the whole post.

exterm · on Sept 17, 2020

you should read the first post in the series if you want to read about microservices. https://engineering.shopify.com/blogs/engineering/deconstruc...