
Services, or even microservices, are more of a strategy to allow teams to scale than services or products to scale. I think that's one of the biggest misconceptions among engineers. On the other end you have the monorepo crew, who are doing it for the same reasons.

On your note about resiliency and scale - it's always a waste of money until shit hits the fan. Then you really pay for it.



> Services, or even microservices, are more of a strategy to allow teams to scale than services or products to scale.

I've never really understood why you couldn't just break up your monolith into modules. So if there's a "payments" section, why isn't that API stabilized? I think all the potential pitfalls (coupling, no commitment to compatibility) are there for both monoliths and microservices; the difference is in the processes.

For example, microservices export some kind of API over REST/GraphQL/gRPC, which they can ship SDKs for, version, and so on. Why can't you just define interfaces to modules within your monolith? You can generate API docs, you can version interfaces, you can make completely new versions, etc.
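
To make that concrete, here's the sort of thing I mean - a rough TypeScript sketch of a versioned in-process module interface (all names made up, purely to illustrate):

    // payments/api/v1.ts -- the only surface other modules are allowed to import
    export interface ChargeRequest {
      customerId: string;
      amountCents: number;
    }

    export interface ChargeResult {
      chargeId: string;
      status: "succeeded" | "declined";
    }

    export interface PaymentsV1 {
      charge(req: ChargeRequest): Promise<ChargeResult>;
    }

    // payments/index.ts -- the implementation stays internal to the module
    export function createPayments(): PaymentsV1 {
      return {
        async charge(req) {
          // real logic lives here; callers compile against PaymentsV1,
          // not against the internals
          return { chargeId: "ch_" + req.customerId, status: "succeeded" };
        },
      };
    }

Adding a v2 is just a new api/v2.ts next to v1, kept around until callers migrate - the same versioning story as a REST endpoint, minus the serialization and the network hop.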

I just feel like this would be a huge improvement:

- You avoid so much of the engineering work of building service handler scaffolding (validation, serialization/deserialization, defining errors)

- You avoid the runtime overhead of serialization/deserialization and network latency

- You don't need to build SDKs/generate protobufs/generate clients/etc.

- You never have the problem of "is anyone using this service?" because you can use code coverage tools

- Deployment is much, much simpler

- You never have the problem of "we have to support this old--sometimes broken--functionality because this old service we can't modify depends on it". This is a really undersold point: maybe it's true that microservice architectures let engineers build things without regard for other teams, but they can't remove things without regard for other teams, and this dynamic is like a no limit credit card for tech debt. Do you keep that service around as it slowly accretes more and more code it can't delete? Do you fork a new service w/o the legacy code and watch your fleet of microservices grow ever larger?

- You never have the problem of "how do we update the version of Node on 50 microservices?"


> I've never really understood why you couldn't just break up your monolith into modules

You can! We used to do this! Some of us still do this!

It is, however, much more difficult. Not difficult technically, but difficult because it requires discipline. The organisations I’ve worked at that have achieved this always had some form of dictator who could enforce the separation.

Look at the work done by John Lakos (and his various books) to see how well this can work. Bloomberg did it, and so can you!

Splitting things across a network boundary makes your system a distributed system. There are times you need this, but the tradeoff is at least an order of magnitude increase in complexity. These days we have a lot of tooling to help manage this complexity, but it's still there. The number of possible combinations of failure states grows exponentially.

Having said all this, the microservice architecture does have the advantage of being an easy way to enforce modularity and does not require the strict discipline required in a monolith. For some companies, this might be the better tradeoff.


> easy way to enforce modularity and does not require the strict discipline required in a monolith

In my experience, microservices require more discipline than monoliths. If you do a microservice architecture without discipline you end up with the "distributed monolith" pattern and now you have the worst of both worlds.


Yes, I completely agree. If your team doesn't have the skills to use a proper architecture within a monolith, letting them loose on a distributed system will make things a lot worse. I've seen that happen multiple times.


> does not require the strict discipline required in a monolith

How so? If your microservices are in a monorepo, one dev can spread joy and disaster across the whole ecosystem. On the other hand, if your monolith is broken into libraries, each one in its own repo, a developer can only influence their part of the larger solution. Arguably, system modularity has little to do with the architecture, and much to do with access controls on the repositories and pipelines.


> Arguably, system modularity has little to do with the architecture, and much to do with access controls on the repositories and pipelines.

Monoliths tend to be in large monolithic repos. Microservices tend to get their own repo. Microservices force an API layer (a defined module interface) because they impose a network boundary. Library boundaries don't force one, and can generally be subverted.

I agree that modularity has nothing to do with the architecture intrinsically; it's simply that people are pushed towards modularity when using microservices.
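
To illustrate (hypothetical paths, and just one of several ways to do it): in a Node/TypeScript monolith nothing stops a deep import past a module's public entry point, so the boundary only holds if tooling enforces it - e.g. something like ESLint's no-restricted-imports rule:

    // eslint.config.js -- make deep imports into module internals a lint failure.
    // Without a rule like this, `import { x } from "payments/internal/ledger"`
    // compiles fine and quietly subverts the module boundary.
    export default [
      {
        rules: {
          "no-restricted-imports": ["error", {
            patterns: [{
              group: ["*/internal/*"],
              message: "Import the module's public api/ entry point instead.",
            }],
          }],
        },
      },
    ];

A network boundary gets you that enforcement for free, which I suspect is the real appeal.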


People make this argument as though it's super easy to access stuff marked "private" in a code base--maybe this is kind of true in Python, but it really isn't in JVM languages or Go--and as though it's impossible to write tightly coupled microservices. The problem generally isn't reaching into internal workings or coupling; the problem is that fixing it requires you to consider the dozens of microservices that depend on your old interface--services you have no authority to update, or facility to even discover. In a monolith you run a coverage tool. In microservices you hope you're doing your trace IDs/logging right, that all services using your service used it in the window you were checking, and then you start having a bunch of meetings with the teams that control those services to coordinate the update. That's not what I think of when I think of modularity, and in practice what happens is your team forks a new version and hopes the old one eventually dies, or that no one cares how many legacy microservices are running.


> It is, however, much more difficult. Not difficult technically, but difficult because it requires discipline.

Before that, people need to know it's even an option.

Years ago, when I showed a dev who had just switched teams how to do this with a feature they were partway through implementing (their original version had it threading through the rest of the codebase), it was like one of those "mind blown" images. He had never even considered this as a possibility before.


I agree that it's possible. From what I've seen, it's probably harder, though, than just doing services. You are fighting against human nature, organizational incentives, etc. As soon as the discipline of the developers or the vigilance of the dictator lapses, it degenerates.


It is really hard for me to read this. How can anyone think that it is harder to write a proper monolith than to implement a distributed architecture?

If you just follow the SOLID principles, you're already 90% there. If your team doesn't have the knowledge to write structured code (and it's not just "discipline": every proper developer should know that they make things harder for everyone, including themselves, if they don't follow a proper architecture), letting them loose on a distributed system will make things much, much worse.


It's not really a technical problem. As others have mentioned on various threads, it's a people-coordination problem. It's hard to socially/organizationally coordinate the efforts of hundreds of engineers on a single thing. It just is. If they're split into smaller chunks and put behind relatively stable interfaces, those people can work independently on their own thing, roughly however they want. That was a major reason behind the original Bezos service-mandate email. You can argue that results in a harder overall technical solution (distributed is harder than monolith), but to me it is inarguably much easier organizationally.

You can sort of get there if you have a strong central team working on monolith tooling that enforces module separation, lints illegal coupling, manages sophisticated multi-deployments per use, allows team-based resource allocation and tracking, has per-module performance regression prevention, etc. They end up having many of the organizational problems of a central DBA team, but it's possible. Even then, I am not aware of many (any?) monoliths in this situation that have scaled beyond 500+ engineers where people are actually happy with the situation they've ended up in.


> some form of dictator who could enforce the separation.

Like a lead developer or architect? Gasp!

I wonder if the microservices fad is so that there can be many captains on a ship. Of course, then you need some form of dictator to oversee the higher level architecture and inter-service whatnots... like an admiral.


> You never have the problem of "how do we update the version of Node on 50 microservices?"

And instead you have the problem of "how do we update the version of Node on our 10 million LOC codebase?" Which is, in my experience, an order of magnitude harder.

Ease of upgrading the underlying platform versions of Node, Python, Java, etc. is one of the biggest benefits of smaller, independent services.


> And instead you have the problem of "how do we update the version of Node on our 10 million LOC codebase?"

I think if you get to that scale, everything is pretty hard. You'll have a hard time convincing me that it's any easier or harder to upgrade Node on 80 microservices of 125K LOC each than on a single 10M LOC monolith. Both of those things feel like a big bag of barf.


Upgrading the platform also happens at least 10x less frequently, so that math doesn't necessarily work out in your favour though.


It's much easier to make smaller-scope changes at higher frequency than it is to make large changes at lower frequency. This is the entire reason the software industry adopted CI/CD.


I'm not sure that's measuring what you think. The CI pipeline is an incentive for a good test suite, and with a good test suite the frequency and scope of changes matter a lot less.

CI/CD is also an incentive to keep domain-level scope changes small (scope creep tends to be a problem in software development) in order to minimize disruptions to the pipeline.

These are all somewhat different problems than upgrading the platform you're running, which the test suite itself should cover.


CI/CD is usually a component of DevOps, and any decent DevOps team will have DORA metrics. Time-to-fix and frequency of deploys are both core metrics, and they mirror the frequency and scope of changes. You want changes to be frequent and small.

Yes, change failure rate is also measured, and that's why good test suites matter, but if you think frequency and scope of change don't matter for successful projects, you haven't looked at the data.

That means frequently updating your dependencies against a small code base is much more useful (and far less painful) than occasional boil-the-ocean updates.

(As always, excepting small-ish teams, because direct communication paths to everybody on the team can mitigate a lot of problems that are painful at scale)


> I've never really understood why you couldn't just break up your monolith into modules.

I think part of it is that many just don't know how.

Web developers deal with HTTP and APIs all the time, so they understand this. But I suspect that a lot of people don't really understand (or want to understand) build systems, compilers, etc. deeply. "I just want to press the green button so that it runs".


Counterpoint: most monoliths are built like that. I wonder if they think that pressing a green button is too easy - like it HAS to be more complicated, we HAVE to be missing something.


How do you square that with the fact that shit usually hits the fan precisely because of this complexity, not in spite of it? That's my observation & experience, anyway.

Added bits of "resiliency" often add brand new, unexplored failure points that are just ticking time bombs waiting to bring the entire system down.


Not adding that resiliency isn't the answer though - it just means known failures will get you. Is that better than the unknown failures introduced by your mitigation? I cannot answer that.

I can tell you 100% that eventually a disk will fail. I can tell you 100% that eventually the power will go out. I can tell you 100% that even if you have a computer with redundant power supplies each connected to separate grids, eventually both power supplies will fail at the same time - it will just happen a lot less often than with a regular computer not on any redundant/backup power. I can tell you that network cables do break from time to time. I can tell you that buildings are vulnerable to earthquakes, fires, floods, tornadoes, and other such disasters. I can tell you that software is not perfect and eventually crashes. I can tell you that upgrades are hard if any protocol changed. I can tell you there is a long list of other known disasters that I didn't list, but a little research will uncover them.

I could look up the odds of all of the above. In turn, this allows calculating the cost of each mitigation against the likely cost of not mitigating - but this is only statistical: you may decide something statistically cannot happen, and it happens anyway.

What I cannot tell you is how much you should mitigate. There is a cost to each mitigation that needs to be compared to the value.
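
As a rough sketch of that comparison (all numbers invented for illustration):

    // Back-of-the-envelope: expected annual loss vs. annual cost of a mitigation.
    const annualFailureProbability = 0.05; // say, odds of losing the data centre this year
    const costPerIncident = 500_000;       // downtime + recovery, in dollars
    const mitigationCostPerYear = 40_000;  // say, running a warm standby

    const expectedAnnualLoss = annualFailureProbability * costPerIncident; // 25,000

    // On pure expected value this mitigation loses; tail risk, reputation, and
    // "the statistics were wrong" are why it's still a judgement call.
    console.log({
      expectedAnnualLoss,
      mitigationCostPerYear,
      worthItOnExpectedValue: expectedAnnualLoss > mitigationCostPerYear,
    });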


Absolutely, yeah - these things are hard enough to test in a controlled environment with a single app (e.g. FoundationDB), but practically impossible to test fully in a microservices architecture. It's so nice to have this complexity managed for you in the storage layer.


Microservices almost always increase the number of partial failures, but if used properly they can reduce the number of critical failures.

You can certainly misapply the architecture, but you can also apply it well. It's unsurprising that most people make bad choices in a difficult domain.


Fault tolerance doesn't necessarily require microservices (as in separate code bases), though - see Erlang, or even something like Unison.

But for some reason it seems that few people are working on making our programming languages and frameworks fault tolerant.


Because path dependence is real and we're mostly building on top of a tower of shit. And as computers got faster, it became more reasonable to have huge amounts of overhead. Same reason that Docker exists at all.


> How do you square that with the fact that shit usually hits the fan precisely because of this complexity

The theoretical benefit may not be what most teams are going to experience. Usually, the fact that microservices are seen as a solution to a problem that could more easily be solved in other, much simpler ways is a pretty good indication that any theoretical benefits are going to be lost through other poor decision-making.


Microservices are more about organisation than they are about technology.

And that is why developers have so much trouble getting them right. They can't without having the organisational fundamentals in place. It is simply not possible.

The architectural constraints of microservices will expose organisational weaknesses at a much higher rate, because of the pressure they put on the organisation to be very strict about ownership, communication, and autonomy.

It takes a higher level of maturity as an organisation to realise the benefits of microservices, which is also why most organisations shouldn't even try.

Stop all the technical nonsense, because it won't solve the root cause of the matter. It's the organisation, not the technology.


Except that most people build microservices in a way that ignores the reality of cloud providers and the fact that they are building (more) distributed systems, and often end up with lower resiliency.



