Let's consign CAP to the cabinet of curiosities (brooker.co.za)
135 points by nalgeon 12 months ago | 151 comments


So you’re setting up a multi-region RDS. If region A goes down, do you continue to accept writes to region B?

A bank: No! If region A goes down, do not process updates in B until A is back up! We’d rather be down than wrong!

A web forum: Yes! We can reconcile later when A comes back up. Until then keep serving traffic!

CAP theorem doesn’t let you treat the cloud as a magic infinite availability box. You still have to design your system to pick the appropriate behavior when something breaks. No one without deep insight into your business needs can decide for you, either. You’re on the hook for choosing.


Actually, banks use eventually consistent systems in many cases. They'd rather let a few customers overdraw a bit than stop a customer from being able to buy something on their card when they have the funds for it.


It helps that most bank accounts come with overdraft agreements, and the bank can come after you for repayment with the full force of credit law behind them.


Yeah, banks try to encourage overdrafts because they make a bunch of fees off them. I remember reading about some of them getting in trouble for reordering same-day deposits in a way that maximized the overdraft fees.

https://cw33.com/news/heres-how-banks-rearrange-transactions...


Good point. But I'd argue that's an artifact of history. Traditional banks started out when eventual consistency was the only choice: first with humans manually reconciling paper ledgers based on messages from couriers and telegraphs, later with the introduction of computers in batch processes that ran overnight. The ability to do this live is new to them, and thus their processes are designed to not require it.

Financial institutions set up in this century (like PayPal or fintechs) usually do want consistency at all costs. Your risk exposure is a lot lower if you know people don't spend or withdraw more than they have.


> Good point. But I'd argue that's an artifact of history.

I think there's also less liability for the banks if they accept and build around the risk of both eventual consistency and fraud. Even now, wiring the money out is necessary to pull off a digital bank theft, and even then you need to be extremely fast to get away with the money, extremely fast to not go to prison, and you have to liquidate it extremely quickly (likely into bitcoin). Even small delays in the transactions would make this basically impossible with modern banking, let alone a full clearing house day.


Eventual consistency gives you more availability, at the cost of consistency. It's fine for financial institutions to prefer this trade-off, because any over/under error is likely a fraction of the cost of unavailability.

Also, in the real world, we can solve problems at a different level - like legal or product.


Also the financial institutions are making enough profit that they can afford to lose a little bit in the process of reconciliation. Consider how the 3.5% fees credit cards charge mean they can afford to eat a bit of fraud and actually can tolerate more fraud than a system which had lower fees could.


Requiring availability sounds like a good recipe for global payment system outages. It also makes it a lot harder to buy things on a plane.


Fintechs have some in-house tech but typically involve all kinds of third-party systems. Even if you are interacting with these third parties via APIs (instead of CSV or whatever), you will always need an eventually consistent recon mechanism to make sure both sides agree on what was done. It doesn't have to take days though, I agree.

The only way around this would be to replace stateful apis with some kind of function that takes a reference to a verifiable ledger, that can mutate that and anyone can inspect themselves to see if the operation has been completed. Oh wait :)


In many cases. There are few absolutes at the edges. They have to be decided on a case-by-case basis. Also, failure to decide is deciding.


Very eventually consistent. Like, humans involved sometimes.


Ignoring the CAP theorem doesn't just mean eventually consistent - it means you will lose transactions when replicas that aren't fully replicated get corrupted. Probably not something you want to be caught doing if you're a bank.


Real world financial systems do have transactions that aren’t fully replicated and hence can be lost as a result. But banks just decide that both the probability and sums involved are low enough that they’ll wear the risk.

Example: some ATMs are configured that if the bank computer goes down (or the connection to it), they’ll still let customers withdraw limited sums of money in “offline” mode, keeping a local record of the transaction, that will then be uploaded back to the central bank computers when the connection comes back up.

But, consider this hypothetical scenario: bank computer has a major outage. ATM goes into offline mode. Customer goes to ATM to withdraw $100. ATM grants withdrawal and records card number, sum and timestamp internally (potentially even in two different forms: on disk and a paper-based backup). Customer leaves building. 30 minutes later, bank computer is still down, when a terrorist attack destroys the building and the ATM inside. The customer got $100 from the bank, but all record of it has been destroyed. Yet, this is such an unlikely scenario, and the sums involved are small enough (in the big picture), that banks don’t do anything to mitigate it - the mitigation costs more than the expected value of the risk.
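(For the curious, here's a minimal sketch in Python of that offline journaling idea; the limit, file format, and function names are all invented, not any real ATM protocol.)

  import json, time

  OFFLINE_LIMIT = 100  # assumed per-card cap while the host is unreachable

  def withdraw_offline(card_number, amount, journal_path):
      """Record the withdrawal locally; it gets replayed when the host is back."""
      if amount > OFFLINE_LIMIT:
          return False  # refuse large sums while the balance can't be checked
      entry = {"card": card_number, "amount": amount, "ts": time.time()}
      with open(journal_path, "a") as f:  # append-only; a paper roll is the second copy
          f.write(json.dumps(entry) + "\n")
      return True

  def replay_journal(journal_path, post_to_host):
      """Upload the stored transactions once connectivity returns."""
      with open(journal_path) as f:
          for line in f:
              post_to_host(json.loads(line))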


I'd be surprised if they didn't have insurance that covered the sum of money in the machine


As patio11 says, "The optimum level of fraud is not zero."


Having worked at a couple of banks, this is absolutely true.


That and they love to charge overdraft fees to accounts that don't otherwise make them a ton of money for the privilege.


Actually, CAP theorem is only relevant to a database transaction, not a system. The system has to be available to even LET the card process. This is why no bank would ever use anything BUT an ACID-based system, as data integrity is far more important than speed (and availability, for that matter).


Yes. I see CAP theorem as a (perfectly right) answer to managers demanding literally impossible requirements


You are mischaracterizing what TFA says. It says that partitions either keep a client from reaching any servers (in which case we don't care) or they keep some servers from talking to others (or some clients from talking to all servers), and that in the latter case whenever you can have a quorum (really, a majority of the servers) reachable then you can continue.

You've completely elided the quorum thing to make it sound like TFA is nuts.


Quorum doesn't let you ignore partitions. It can't. You still have to decide what happens to the nodes on the wrong side of the partition, or how to handle multiple partitions (you have 5 nodes. You have 2 netsplits. Now you have clusters of 2 nodes, 2 nodes, and 1 node. Quorum say what?).
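To make that concrete, here's a toy majority-quorum check (node names invented) for the 5-node, double-netsplit case:

  CLUSTER_SIZE = 5
  QUORUM = CLUSTER_SIZE // 2 + 1  # majority = 3

  def has_quorum(reachable_nodes):
      """A group may only make progress if it can see a majority of the cluster."""
      return len(reachable_nodes) >= QUORUM

  # two netsplits -> three disjoint groups of 2, 2 and 1 nodes
  groups = [{"n1", "n2"}, {"n3", "n4"}, {"n5"}]
  print([has_quorum(g) for g in groups])  # [False, False, False]: nobody can write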

I don't think TFA is nuts. I do think the premise that engineers using cloud can ignore CAP theorem is wrong, though. It's a decision you must consider when you're designing a system. As long as everything is running well, it doesn't matter. You've got to make sure that the behavior when something breaks is what you want it to be. And you can't outsource that decision. It can't be worked around.


It's poor timing for this article.

The CrowdStrike incident has demonstrated that no matter what you do, it is possible for any distributed system to be partitioned in such a way that you have to choose between consistency and availability, and if the system is important then that choice is important.


You're ignoring that a system is temporarily partitioned for the duration of the time it takes to retrieve information from all nodes.

Partition intolerance means you can only proceed when all nodes have been reached. This means partition intolerance not only deals with availability in the yes or no sense, but also in terms of latency.


That's true, but I think the catch is that most customers likely didn't view it as an investment that could decrease availability.


> You still have to decide what happens to the nodes on the wrong side of the partition

You don't, because those nodes' clients won't also be isolated from the other servers. That's TFA's point, that in a cloud environment network partitions that affect clients generally deny them access to all servers (something you can't do anything about), and network partitions between servers only isolate some servers and none of the clients.


That’s fine if you value availability over consistency. Which is fine if that’s the appropriate choice for your use case! Even then you have to decide whether to accept updates in that situation, knowing that they may never be reconciled if the partition is permanent.


No, you can have availability and consistency as long as you have enough replicas. Network partitions that leave you w/o a quorum are like the whole cloud being down. Network partitions don't leave some clients only able to reach non-quorums -- this is the big point of TFA.


> No, you can have availability and consistency as long as you have enough replicas.

You've misunderstood the problem.

You cannot be consistent if you cannot resolve conflicts; taking writes after a network partition means you are operating on stale data.

Honestly the closest we've ever gotten to solving the CAP theorem is Spanner[0], which trades off availability by being hugely less performant for writes. I'm aware Spanner is used a lot in Google Cloud (under the hood), but you won't solve CAP by having hundreds of PGSQL replicas, because those aren't using Spanner.

In fact, you can test it out, ElasticSearch orchestrates a huge number of lucene databases, a small install will have a dozen or so replicas and partitions, but you're welcome to crank it to as many nodes as you want then split off a zone, and see what happens.

I'm becoming annoyed at the level of ignorance on display in this thread, so I'm sorry for the curt tone, you can't abstract your way to solving it, it's a fundamental limitation of data.

[0]: https://static.googleusercontent.com/media/research.google.c...


Did you read TFA? Please, the point of TFA is NOT that you don't need distributed consensus algorithms, but that once you have them you don't need to worry about partitions, therefore you can have consistency, and you get "availability" in that the cloud "is always available", and if that sounds like cheating you have to realize that partitions in the cloud don't isolate clients outside the cloud, only servers inside the cloud, and if there's no quorum anywhere then you just treat that as "the cloud is down".


I suspect you never needed CAP in the first place, because even outside of the cloud, you don't seem to pay a lot of attention to whether your service is up or not or how to increase that uptime.

In the same spirit, some businesses are fine with a single db, making backups every night and losing some data when an issue happens. It doesn't mean that people maintaining these systems get to tell others that distributed systems are essentially a non-problem.


> you get "availability" in that the cloud "is always available",

The cloud isn't what's being described as available. The service is available or not. The service is hosted on multiple computers that talk to each other. If they stop talking to each other, the service needs to either become unavailable (choosing consistency) or risk having not up to date data (choosing availability).


I did, and it doesn't provide insight that didn't exist 15 years ago.

Cloud is always available? I have a bridge to sell you.


That presumes you can only have 1 partition at a time. That is not the case. You can optimize for good behavior when you end up with 2 disconnected graphs. It’s absolutely the case that a graph with N nodes can end up with N separate groups. Conceptually, imagine a single switch that connects all of them together and that switch dies.

You can engineer cloud services to be resilient. AWS has done a great job of that in my personal experience. But worst cases can still happen.

If you have an algorithm that runs in n*log(n) most of the time but 2^n sometimes, it can still be super useful. You have to prepare for the bad case though. If you say it’s been running great so far and we don’t worry about it, fine, but that doesn’t mean it can’t still blow up when it’s most inconvenient. It only means it hasn’t yet.


Again, a partition that leaves you with no quorum anywhere is like the cloud being down.


> Again, a partition that leaves you with no quorum anywhere is like the cloud being down.

What does this mean? What is the cloud being down? The cloud isn't a single thing; it's lots of computers in lots of data centres.


It means there will be an article on the front page declaring the outage


A partition that leaves you with no quorum anywhere is like the cloud being down if you value consistency over availability.


Yes, but again, TFA says that you will trust the cloud will be up long enough that you would indeed prefer consistency.

My goodness. So many commenters here today are completely ignoring TFA and giving lessons on CAP -- lessons that are correct, but which completely ignore that TFA is really arguing that clouds move the needle hard in one direction when considering the CAP trade-offs, to the point that "the CAP theorem is irrelevant for cloud systems" (which was this post's original title, though not TFA's).


The issue people are pointing out is that the author conflates "Cloud is good enough for my needs" with "CAP is no longer relevant". The point is especially annoying considering that the author works at a cloud vendor and not a cloud user.


Huh? You can create partitions in ways other than issues with the underlying network. You can mess up firewall rules for example and find yourself with no quorum.


Those aren't the only failure modes - you can have two sets of servers partitioned from one another (in two different data centers), both of which are reachable by clients. Do you allow those partitions to remain available, or do you stop accepting client connections until the partition heals? The "right" choice depends entirely on the application.


In TFA's world-view clients will not talk to isolated servers.


So that would be choosing consistency over availability.


Yes. That's the point, that cloud changes the trade-off such that consistency becomes worthwhile.


Indeed and this is a very subtle point that I found very hard to explain when talking about availability in the event of network partitions.

You have to consider the system as a whole including external clients since the definition of the availability of the entire system ultimately depends on the point of view of the end users of the system.


Quorum just means you are a CP system rather than an AP system.


If you have a partition, then you have to wait until the partition resolves if you want consistency. Otherwise the other partition may receive an update that you are unaware of, that invalidates your consistency constraints.

The author of the article doesn't actually understand the CAP theorem.


How do you set up a multi-region RDS with multi-region writes?


I think that's a thing? It's been a while since I've looked. Even if not, multi-AZ replication with failover is a standard setup and it has the same issues. Suppose you have some frontend servers and an RDS primary instance in us-west-2a and other frontend servers and an RDS replica in us-west-2b. The link between AZs goes down. For simplicity, pretend a failover doesn't happen. (It wouldn't matter if it did. All that would change here is the names of the AZs.)

Do you accept writes in us-west-2a knowing the ones in 2b can't see them? Do you serve reads in 2b knowing they might be showing old information? Or do you shut down 2b altogether and limp along at half capacity in only 2a? What if the problem is that 2a becomes inaccessible to the Internet so that you can no longer use a load balancer to route requests to it? What if the writes in 2a hadn't fully replicated to 2b before the partition?

You can probably answer those for any given business scenario, but the point is that you have to decide them and you can't outsource it to RDS. Some use cases prioritize availability above all else, using eventual consistency to work out the details once connectivity's restored. Others demand consistency above all else, and would rather be down than risk giving out wrong answers. No cloud host can decide what's right for you. The CAP theorem is extremely freaking relevant to anyone using more than one AZ or one region.

(I know you weren't claiming otherwise, just taking a chance to say why cross-AZ still has the same issues as cross-region, as I have heard well-meaning people say they were somehow different. AWS does a great job of keeping things running well to the point that it's news when they don't. Things still happen though. Same for Azure, GCP, and any other cloud offering. However flawless their design and execution, if a volcano erupts in the wrong place, there's gonna be a partition.)


You wish.

> DNS, multi-cast, or some other mechanism directs them towards a healthy load balancer on the healthy side of the partition

Incidentally that's where CAP makes its appearance and bites your ass.

No amount of VRRP or UCARP wishful thinking can guarantee a conclusion on which partition is "correct" in the presence of a network partition between load balancer nodes.

Also, who determines where to point the DNS? A single point of failure VPS? Or perhaps a group of distributed machines voting? Yeah.

You still need to perform the analysis. It's just that some cloud providers offer the distributed voting clusters as a service and take care of the DNS and load balancer switchover for you.

And that's still not enough, because you might not want to allow stragglers to write to orphan databases before the whole network fencing kicks in.


The idea that someone in a partition can get around it by clever routing tricks indicates the author has some “new” definition of partition.

Partition isn’t one ethernet cable going bad. It’s all the ethernet cables going bad. Redundant network providers for your data center to handle the idiot with the backhoe isn’t surviving a partition; it’s preventing one in the first place.


You really do need to consider a full partition, and a partial partition. Both happen.

It's most common that you have a full partition when nobody can talk to node A, because it's actually offline. And sometimes you've got a dead uplink for a switch with a couple of nodes behind it that can talk to each other.

But partial partitions are really common too. If Node A can talk (bidirectionally) to B and C, but B and C can each only talk to A, you could do something with clever routing tricks. If you have a large enough network, you see this kind of thing all the time, and it's tempting to consider routing tricks. IMHO, it's a lot more realistic to just have fast feedback between monitoring the network and the people who can go out and clean the ends on the fiber on the switches and routers and whatnot. The internet as it exists is the product of smart people doing routing tricks on a global basis; it's not universally optimal --- you could do better in specific cases all the time, and it's really easy to identify those cases as a human when something is broken --- but actually doing probing for host routing is a big exercise for typically marginal benefits. Magic host routing won't help when a backhoe (or directional drill or boat anchor) has determined your redundant fiber paths are all physically present in the same hole, though.


I've seen partitions in Postgres clusters where a node gets overloaded to the point it's not responding to health checks and the cluster management software performs a failover. Since the previous master can't be reached but is still running, it accepts writes even after a new replica is promoted.

I've seen similar things happen in other software under load as well (which can easily cause cascading failures, too)
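A rough sketch of why that happens: a timeout-based health check can't tell "dead" from "overloaded", so promoting without fencing first leaves the old primary still taking writes. (Hostnames and the fence/promote steps here are hypothetical, not any particular cluster manager.)

  import socket

  def looks_dead(host, port, timeout_s=1.0):
      """Naive health check: 'no answer within the timeout' is all we can observe,
      so an overloaded-but-alive primary looks identical to a dead one."""
      try:
          with socket.create_connection((host, port), timeout=timeout_s):
              return False
      except OSError:
          return True

  def fence(node):
      """Hypothetical: cut the old primary off (firewall rule, STONITH, ...)
      so it cannot keep accepting writes once a replica is promoted."""

  def promote(node):
      """Hypothetical: make the chosen replica the new primary."""

  # Promoting on a failed check *without* fencing first is the split-brain
  # window described above: the old primary is still up and still taking writes.
  if looks_dead("pg-primary.internal", 5432):
      fence("pg-primary.internal")
      promote("pg-replica-1.internal")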


A somewhat plausible scenario that's easy to illustrate would be if you have a DC in the US and one in New Zealand, and all 7 subsea cables connecting New Zealand to the internet go dark.

Something like that happened just this year to 13 African nations [1]

1: https://www.techopedia.com/news/when-africa-lost-internet-ex...


This is not really correct, and assumes the state of a cloud service (let's say load balancing) is binary.

In my experience it's not. The cloud will glitch. The load balancing algo will break subtly for your workload. Your traffic will get blackholed for no apparent reason. I once spent a week trying to convince a cloud provider they fucked up (that's the time it took for them to give us someone who could run the appropriate tcpdump). There was no global outage.

It's on you to determine if this is important for you or not in your case, but you will need to mitigate it above a certain SLA threshold requirement. That's far from consigning CAP to history or a curiosity, which is like saying you will never have network issues if you use "serverless" stuff.

Not talking about anyone in particular, but sometimes I feel people building and using the cloud reach hubris levels of surety in their systems - I've worked both sides of the fence, and I know the long tails of fuckups that impact customers, though...


I remember an article--I think from Gitlab--about random latency and out of order packet issues they were seeing on GCP. Turns out it only happened under a certain amount of load since GCP's routing algorithm uses a different network if traffic is under a certain amount and switches to a more distributed mode when it crosses a threshold.

In addition, sometimes you land on a bad piece of hardware but it's not bad enough to trigger the provider's monitoring. A few months ago we had a bad EC2 instance whose network would drop every 2 hours, causing a bunch of random errors before recovering for a little while.


Could a partition just be the machine hosting one of your DBs going down? When it comes back up, it still has its data, but it's missed out on what happened in the meantime.


Your question was answered by someone else but you've triggered a different form of dread.

We lost a preprod database that held up dev and QA for half a day. To bring the server room up to code they moved the outlets, and when they went to move the power cords, someone thought they could get away with relying on the UPS power during the jiggery. Turned out the UPS that was reporting all green only actually had about 8 seconds of standby power in it, which was a little less than the IT person needed to untangle the cables.

So in the end we were just fucked on a different day than we would have been if we had waited for the next hiccup in the city power.

I mention this because if you've split your cluster between three racks and someone manages to power down an entire rack, you're dangerously close to not having quorum and definitely close to not having sufficient throughput for both consumer traffic and rebuilding/resyncing/resilvering.

It's a slow motion rock slide that is a cluster that was not sized correctly for user traffic plus recovery traffic, as the entire thing limps toward a full-on outage when just one more machine needs to be rebooted, because, for instance, your team decided that restarting a machine every n/48 hours to deal with a slow leak is just fine, since you're using a consensus protocol that will hide the reboots. Maybe rock slide is the wrong word. It's a train wreck. Once it starts you can't stop it, you can only watch and fret.


Not really - if a database stops you fundamentally don't have availability. CAP is specific to network partitions because it wants to grapple with what happens when clients can talk to nodes in your service, but the nodes can't talk to each other.


The thing is, each ethernet cable is a temporal partition. By adding more ethernet cables you're not making partitions less likely, you're making more of them. A distributed system is considered partitioned while the packets are in flight and have not yet reached their destination and come back with a response. This is true even if the system on the other side has not yet failed or the network equipment is functioning properly.


Yep. CAP comes into play when all the mitigations fail. However clever, diligent, and well-executed a network setup, there's some scenario where 2 nodes can't communicate. It's perfectly fine to say we've made it so vastly unlikely that we accept the risk to just take the whole thing offline when that happens. The problem cannot be made to go away. It's not possible.


For a cloud customer, the networking side might be as simple as "run a few replicas in different regions, and let the load balancer deal with it." Or only different AZs. The database problem is much harder to sign away as someone else's job.


TFA doesn't say you don't need Paxos or other such protocols. It says that partitions that leave less than a quorum of servers isolated do not reduce availability, and that partitions that leave you w/o a quorum are not really a thing on the cloud. That last part might not be true, but to the extent that it is true then you can have availability and strong consistency.


It’s not true, but if it were true then it’s true?


No, more like: when the cloud is all down, yeah, you don't have availability, but so what, that's like all servers being down -- that's always a potential problem, but it's less likely on a cloud.


Yep. You can't magic cloud away Consistency by ignoring it and YOLO hoping it will Do The Right Thing™.


I don’t know. People spend a lot of time thinking about network partitions. What’s the difference between a network partition and a network that is very slow? You could communicate via carrier pigeon. Practically, you can easily communicate via people in different regions, who already “know” which partition is correct. Networks are never really partitioned, and the choice of denominator when calculating their speed means they’re never really transferring at 0 bits per second either. Eventually data will start transferring again.

So a pretty simple application design can deal with all of this, and that’s the world we live in, and people deal with delays and move on with life, and they might bellyache about delays for the purpose of getting discounts from their vendors but they don’t really want to switch vendors. If you’re a bank in the highly competitive business of taking 2.95% fees (that should be 0.1%) on transactions, maybe this stuff matters. But like many things in life, like drug prices, that opportunity only exists in the US, it isn’t really relevant anywhere else in the world, and it’s certainly not a math problem or intrinsic as you’re making it sound. That’s just the mythology the Stripe and banks people have kind of labeled on top of their shit, which was Mongo at the end of the day.


"A good driver sometimes misses a turn, a bad driver never misses a turn"

A good engineer knows that all real-time systems are turn-based, and the network is _always_ partitioned


This is exactly the right way to think about it. One should be happy when the network works, and not expect any packet to make it somewhere on any particular schedule. Ideally. In practice it's too expensive to plan this way for most systems, so we compensate by pretending the failure modes don't exist.


I don't know what the formalisms here are, but as someone intermittently (thankfully, at this point, rarely) responsible for some fairly large-scale distributed systems: the "network that is really slow" case is much scarier to me than the "total network partition". Routine partial failure is a big part of what makes distributed systems hard to reason about.


Even more fun are asymmetric network degradations or partitions.


You've got to at least consider what is going to happen.

The whole point of the CAP theorem is that you can't have a one size fits all design. Network partitions exist, and it's not always a full partition where you have two (or more) distinct groups of interconnected nodes. Sometimes everybody can talk to everybody, except node A can't reach node B. It can even be unidirectional, where node A can't send to B, but node B can send to A. That's just how it is --- stuff breaks.

There's designs that highlight consistency in the face of partitions; if you must have the most recent write, you need a majority read or a source of truth --- and if you can't contact that source of truth or a majority of copies in a reasonable time (or at all), you can't serve the read. And you can't confirm a write unless you can contact the source of truth or do a majority write. As a consequence, you're more likely to hit situations where you reach a healthy frontend, but can't do work because it can't reach a healthy backend; otoh, that's something you should plan for anyway.
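The arithmetic behind "majority read or source of truth" is just quorum intersection: with N copies, W write acks and R read replies are guaranteed to overlap whenever R + W > N. A tiny sketch:

  def overlaps(n, w, r):
      """Quorum intersection: any read set of size r must share at least one
      replica with any write set of size w whenever r + w > n."""
      return r + w > n

  print(overlaps(n=5, w=3, r=3))  # True: majority writes + majority reads always meet
  print(overlaps(n=5, w=1, r=1))  # False: a read can miss the single acknowledged copy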

There's designs that highlight availability in the face of partitions; if you must have a read and you can reach a node with the data, go for it. This can be extended to writes too: if you trust a node to persist data locally, you can use a journal entry system rather than a state update system, and reconcile when the partition ends. You'll lose the data if the node's storage fails before the partition ends, of course. And you may end up reconciling into an undesirable state; in banking, it's common to pick availability --- you want customers to be able to use their cards even when the bank systems are offline for periodic maintenance / close of day / close of month processing, or unexpected network issues --- but when the systems come back online, some customers will have total transaction amounts that would have been denied if the system was online. Or you can do last state update wins too --- sometimes that's appropriate.
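And a toy version of that journal-and-reconcile approach (field names invented): record intent while disconnected, replay in order afterwards, and accept that the merged state may be one the online path would have refused, like an overdraft.

  from dataclasses import dataclass

  @dataclass
  class Entry:
      account: str
      delta: int    # +deposit / -withdrawal
      ts: float     # local timestamp at the disconnected node

  def reconcile(balances, offline_entries):
      """Replay journal entries in timestamp order; some accounts may end up
      negative (a state the online path would have rejected)."""
      for e in sorted(offline_entries, key=lambda e: e.ts):
          balances[e.account] = balances.get(e.account, 0) + e.delta
      return balances

  print(reconcile({"alice": 40}, [Entry("alice", -30, 1.0), Entry("alice", -30, 2.0)]))
  # {'alice': -20}: available during the partition, overdrawn after reconciliation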

Of course, there's the underlying horror of all distributed systems. Information takes time to be delivered, so there's no way for a node to know the current status of anything; all information from remote nodes is delayed. This means a node never knows if it can contact another node, it only knows if it was recently able to or not. This also means unless you do some form of locking read, it's not unreasonable for the value you've read to be out of date before you receive it.

Then there's even further horrors in that even a single node is actually a distributed system. Although there's significantly less likelihood of a network partition between CPU cores on a single chip.


There is also one more level of horror people often forget. Sometimes the history unwinds itself. Some outages will require restoration from a backup, and some amount of transactions may be lost. It does not happen often, but it may happen. Such situations may lead to, e.g., sequentially allocated IDs being reused, or originally valid pointers to records now pointing nowhere, and so on.


Everything you're saying is true. It's also true that you can ignore these things, have some problems with some customers, and if you're still good, they'll still shop with you or whatever, and life will go on.

I am really saying that I don't buy into the mythology that someone at AWS or Stripe or whatever knows more about the theoretical stuff than anyone else. It's a cloud of farts. Nobody needs these really detailed explanations. People do them because they're intellectually edifying and self-aggrandizing, not because they're useful.


Suppose we all live under socialism and capitalist rules no longer apply. Only physics.

You run a vacation home exchange/booking site, which you run in 3 instances in America, Europe and Asia because you found that people are happier when the site is snappy without those pesky 50ms delays.

Now suppose it's the middle of the night in one of those 3 regions, so no big deal, but the rollout of a new version brings down the database that's holding the bookings for two hours before it's fixed. Yeah, it was just the simplest solution to have just one database with the bookings, because you wouldn't have to worry about all that pesky CAP stuff.

But people then start to ask if it would be possible to book from regions that are not experiencing the outage while double-bookings would still not happen. So you give it some consideration and then you figure out a brilliant, easy solution. You just


Incidentally, I'm highly suspicious of the claim that those pesky 50ms delays matter much. Checking DOMContentLoaded timings on a couple sites:

Reddit: 1.31 s

Amazon: 1.51 s

Google: 577 ms

CNN: 671 ms

Facebook: 857 ms

Logging into Facebook: 8.37 s

As long as you have TLS termination close to the end user, and your proxy maintains connections to the backend so that extra round-trips aren't needed, the amount of time large popular sites take to load suggests that people wildly overstate how much anyone cares about a fraction of a blink of an eye. A lot of these sites have a ~20 ms time to connect for me. If it were so important, I'd expect to see page loads on popular sites take <100 ms (a lot of that being TCP+TLS handshake), not 800 ms or 8 seconds.


It’s hard to say. It’s common sense that it doesn’t matter that much for rational outcomes. Maybe it matters a lot if you are selling something psychological - like in a sense, before Amazon, people with shopping addiction were poorly served, and maybe 50ms matters to them.

You can produce a document that says these load times matter - Google famously observed that SERP load times have a huge impact on a thing they were measuring. You will never convince those analytics people that the pesky 50ms doesn’t matter, because they operate at a scale where they could probably observe a way that it does.

The database people are usually sincere. But are the retail people? If you work for AWS, you’re working for a retailer. Like so what if saving 50ms makes more money off shopping addicts? The AWS people will never litigate this. But hey, they don’t want their kids using Juul right? They don’t want their kids buying Robux. Part of the mythology I hate about these AWS and Stripe posts is, “The only valid real world application of my abstract math is the real world application that pays my salary.” “The only application that we should have a strictly technical conversation about is my application.”

Nobody would care about CAP at Amazon if it didn’t extract more money from shopping addicts! AWS doesn’t exist without shopping addicts!


What I mean is that Amazon seems to think it doesn't matter that much if they're taking hundreds of ms (or over 1 s) to show me a basic page. The TCP+TLS handshake takes like 60 ms, plus another RTT for the request is 80 ms total. If it takes 10 ms to generate the page (in fact it seems to take over 100), that should still be under 100 ms total on a fresh connection. But instead they send me off to fetch scripts from multiple other subdomains (more TCP+TLS handshakes). That alone completely ruins that 50 ms savings.

So the observation is that apparently Amazon does not think 50 ms is very important. If they did, their page could be loading about 5-10x faster. Likewise with e.g. reddit; I don't know if that site has ever managed to load a page in under 1 s. New reddit is even worse. At one point new reddit was so slow that I could tap the URL bar on my phone, scroll to the left, and change www to old in less time than it took for the page to load. In that context, I find people talking about globally distributed systems/"data at the edge" to save 50 ms to be rather comical.


and travel sites will be a lot slower for other reasons


I once lost an entire Christmas vacation to fixing up the damage caused when an Elasticsearch cluster running in AWS responded poorly to a network partition event and started producing results that ruined our users' day (and business records) in a "costing millions of dollars" kind of way.

It was a very old version of ES, and the specific behavior that led to the problem has been fixed for a long time now. But still, the fact that something like this can happen in a cloud deployment demonstrates that this article's advice rests on an egregiously simplistic perspective on the possible failure modes of distributed systems.

In particular, the major premise that intermittent connectivity is only a problem on internetworks is just plain wrong. Hubs and switches flake out. Loose wires get jiggled. Subnetworks get congested.

And if you're on the cloud, nobody even tries to pretend that they'll tell you when server and equipment maintenance is going to happen.


Twice I’ve had to take an Ethernet cable from an IT lead and throw it away. “It’s still good!” No it fucking isn’t. One of them compromised by pulling it out of the trash and cutting the ends off.

When there’s maintenance going on in the server room and a machine they promised not to touch starts having intermittent networking problems, it’s probably a shitty cable getting jostled. There is an entire generation of devs now that have had physical hardware abstracted away and aren’t learning the lessons.

Though my favorite stories are about the cubicle neighbor who taps their foot, leading to packet loss.


I watched a colleague diagnose a flaky cable. Then he immediately pulled out a pocket knife and sliced the cable in 2. He said he didn't ever even want to be tempted to trust it again.

That's been my model since then. I've never worked with a cable so expensive or irreplaceable that I'd want to take a chance on it a second time. I'm sure they exist. I haven't been around one.


Lightning cable. Gotta mess with it until the iPhone charges. I'm not replacing it until it's tied into a blackwall hitch and still isn't charging.


It's been a while since I've been in a data center, but fiber cables were usually the ones that seemed to have more staying power even when they started to flake. Maybe because some of them were long runs and they weren't cheap. You'd see some absolutely wrenched cable going into a rack at a horrifying angle from the tray on a 30ft run with greater regularity than I'd care to admit.


When I design systems I just think about tiny traitor generals and their sneaky traitor messengers racing in the war, their clocks are broken, and some of them are deaf, blind or both.

CAP or no CAP, chaos will reign.

I think FLP (https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf) is better way to think about systems.

I think CAP is not as relevant in the cloud because the complexity is so high that nobody even knows what is going on, so just the C part, regardless of the other letters, is ridiculously difficult even on a single computer. A book can be written just to explain write(2)'s surprise attacks.

So you think you have the guarantees the designers said you have, AP or CP, and yet... the impossible will happen twice a day (and 3 times at night when it's your on-call).


Nice! A paper by Michael Fischer of Multiflow fame. I have one of their computers in my garage. Thanks!


I can think of a lot of cloud systems that have C. Businesses will rely on a single machine running a big Oracle DB. Microservice architectures will be eventually consistent as a whole, but individual services will again have a single DB on a single machine.

The single machine is a beastly distributed system in and of itself, with multiple cores, CPUs, NUMA nodes, tiered caches, RAM, and disks. But when it comes down to two writers fighting over a row, it's going to consult some particular physical place in hardware for the lock.


The military lives in this world and will likely encourage people to continue thinking about it. Think about wearables on a submarine, as an example. Does the captain want to know his crew is fatigued, about to get sick, getting less exercise than they did on their last deployment? Yes. Can you talk to a cloud? No. Does the Admiral in Hawaii want to know those same answers about that boat, and every boat in the Group, eventually? Yes. For this situation, datacenter-aware databases are great. There are other solutions for other problems.


Yep.

For those unaware, military networks deal with disruptions, disconnections, intermittent connectivity, and low-bandwidth (DDIL).

https://www.usni.org/magazines/proceedings/sponsored/ddil-en...


> The CAP Theorem is Irrelevant

Just sprinkle the magic "cloud" powder on your system and ignore all the theory.

https://ferd.ca/beating-the-cap-theorem-checklist.html

Let's see, let's pick some checkboxes.

(x) you pushed the actual problem to another layer of the system

(x) you're actually building an AP system


Specifically they pushed the problem to the load-balancer so they get to check this one too.

(x) your solution requires a central authority that cannot be unavailable

I'm amazed the author doesn't consider that the load-balancers in this situation are effectively acting as non-voting witnesses and that the whole system hinges on the LBs being able to correctly determine the healthy nodes. In a single-master setup, the network partition that cuts the line between the LBs and the current master (but nothing else) is a fun one. Everything would be fine if you could promote a new master, but who's gonna tell 'em?


That's a good point. It describes the situation better than my checklist.


You're building a ??? system. Either CP or AP depending on what random values got put into some config.


There's a better rebuttal(*) of CAP in Kleppmann's DDIA, under the title, "The unhelpful CAP theorem".

I won't plagiarize his text, instead the chapter references his blogpost, "Please stop calling databases CP or AP": https://martin.kleppmann.com/2015/05/11/please-stop-calling-...

(*): rebuttal I think is the wrong word, but I couldn't think of better.


Really good read, in fact all of these discussions are really great.

I liked the linearizable explanation, like when Alice knows the tournament outcome but Bob doesn't. A super extension to this would underscore how important this is, and the danger of Alice knowing the outcome but Bob not knowing: just extend the website a bit and make it a sports gambling website. A system under such a partition would allow Alice to make a "sure thing" bet against Bob, so the constraint should be that when Bob's view is stale, it does not take bets. But how does Bob's view know it's stale? It has to query Alice's view! Lots of mind games to play!
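One illustrative way to make Bob's view refuse stale bets (purely a sketch, not how any real site does it): version the state and make the freshness check an explicit round trip to the other view, which is exactly the consistency/availability choice all over again.

  class BetView:
      def __init__(self, applied_version):
          self.applied_version = applied_version

      def accept_bet(self, fetch_leader_version):
          """Only take bets if this view can prove it isn't stale. The version
          fetch is a callable so the 'ask the other view' round trip is explicit;
          if it fails during a partition, we refuse the bet (choosing C over A)."""
          try:
              latest = fetch_leader_version()
          except ConnectionError:
              return False
          return self.applied_version >= latest

  bob = BetView(applied_version=41)
  print(bob.accept_bet(lambda: 42))  # False: Bob's view is behind, so no bets taken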


Plot twist: in the article's drawings, replica one and two are themselves split by a network link, and it could fail.

The author seems to not understand the meaning of the P in CAP.


I missed your comment and just made the same comment with different words. He literally has a picture of a partition.


"Non failing node" does not mean "Non partitioned node", simple as that.

If you treat a partitioned node as "failed", then CAP does not apply. You've simply left it out cold with no read / write capability because you've designated a "quorum" as the in-group.


Someone else took ownership of the problem for you and sells you their solution: "The theoretical issue is irrelevant to me".

Sure. Also, there's a long list of other things that are probably irrelevant to you. That is, until your provider fails and you need to understand the situation in order to provide a workaround.

And slapping "load-balancers" everywhere on your schema is not really a solution, because load-balancers themselves are a distributed system with a state and are subject to CAP, as presented in the schema.

> DNS, multi-cast, or some other mechanism directs them towards a healthy load balancer on the healthy side of the partition.

"Somehow, something somewhere will fix my shit hopefully". Also, as a sidenote, a few friends would angrily shake their "it's always DNS" cup reading this.

edit: reading the rest of the blog and author's bio, I'm unsure whether the author is genuinely mistaken, or whether they're advertising their employer's product.


> None of the clients need to be aware that a network partition exists (except a small number who may see their connection to the bad side drop, and be replaced by a connection to the good side).

What a convenient world where the client is not affected by the network partition.


As someone who's worked extensively on distributed systems, including at a cloud provider, after reading this I think the author doesn't actually understand the CAP theorem or the two generals problem. Their conclusions are essentially utterly incorrect.


Many things can be solved by the SEP Field[0]

[0]: https://en.wikipedia.org/wiki/Somebody_else's_problem#Dougla...


The CAP theorem is the quantum mechanics of software, with C*A = O(1) in theory, similar to the uncertainty principle, but in many use cases this value is so small that "classical" expectations of both C and A are fine.


> In practice, the redundant nature of connectivity and ability to use routing mechanisms to send clients to the healthy side of partitions

Iow: You can have CAP as long as you can communicate across "partitions".


More precisely: partitions in the graph theory sense rarely (if ever) happen. In practice it looks more like a complete graph losing some edges.

Let's say you have servers in DCs A and B, and clients at ISPs C and D. Normally servers at A and B can communicate, and clients at both C and D can each reach servers at A and B.

If connectivity between DCs A and B goes down there is technically no partition, because connectivity between the clients and each DC is still working. The servers at one of the DCs can just say "go away, I lost my quorum", and the clients will connect to the other DC. This is the most likely scenario, and in practice the only one you're really interested in solving.

If connectivity between ISP C and DC A goes down, there is no partition because they can just connect to DC B.

If all connectivity for ISP C goes down, that is Not Your Problem.

If all connectivity for DC A goes down, it doesn't matter because it no longer has clients writing to it, and they've all reconnected to DC B.

To have a partition, you'd need to lose the link between DCs A and B, and lose the link from ISP C to DC B, and lose the link from ISP D to DC A. This leaves the clients from ISP C connected only to DC A, and the clients from ISP D connected only to DC B - while at the same time being unable to reconcile between DCs A and B. Is this theoretically possible? Yes, of course. Does this happen in practice - let alone in a close-enough split that you could reasonably be expected to serve both sides? No, not really.
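You can model those cases as plain graph reachability over the surviving links (names arbitrary); it takes all three cuts at once before any client gets stuck on one side:

  from collections import defaultdict, deque

  def reachable(links, start):
      """Plain BFS over the surviving (undirected) links."""
      graph = defaultdict(set)
      for a, b in links:
          graph[a].add(b)
          graph[b].add(a)
      seen, queue = {start}, deque([start])
      while queue:
          for nxt in graph[queue.popleft()] - seen:
              seen.add(nxt)
              queue.append(nxt)
      return seen

  full = {("A", "B"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "B")}
  # losing only the inter-DC link: clients at C (and D) still reach both DCs
  print(reachable(full - {("A", "B")}, "C") & {"A", "B"})  # {'A', 'B'}
  # the real split needs three simultaneous cuts: A-B, C-B and D-A
  split = full - {("A", "B"), ("C", "B"), ("D", "A")}
  print(reachable(split, "C") & {"A", "B"})  # {'A'}
  print(reachable(split, "D") & {"A", "B"})  # {'B'}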


Partition tends to be a problem between peers, where observers can see they aren’t talking to each other.

In the early days that might have been two machines we walked up to seeing different results; now we talk to everything over the internet. We generally mean partition when one VLAN is having trouble, or one ethernet card is dying or has a bad cable. If the entire thing becomes unreachable we just call it an outage.


  DC A: North America
  DC B: South America
  ISP C: North American ISP
  ISP D: South American ISP
A major fiber link between North and South America is cut, disrupting all connectivity between the two areas.

  "lose the link between DCs A and B": check.
  "lose the link from ISP D to DC A.": check.
Replace North and South America with any two disparate locations.


So glad to see that the CAP "theorem" is being recognized as a harmful selfish meme like Fielding's REST paper with a deadly seductive power against the overly pedantic.


I think every couple of months there's yet another article saying the CAP theorem is irrelevant. The problem with these is that they ignore the fact that CAP theorem isn't a guide, a framework or anything else.

It's simply the formalization of a fact, and whether or not that fact is *important* (although still a fact) depends on the actual use case. Hell, it applies even to services within the same memory space, although obviously the probability of losing any of the three is orders of magnitude less than on a network.

Can we please move on?


> Can we please move on?

Doubtful. If there's one thing that new developers have always insisted upon doing it's telling the networking and data store engineers that their foundational limitations are wrong. It seems that everyone has to learn that computers aren't magic through misunderstanding and experience.


Way way back, I had a coworker who aspired to be a hacker get really mad at me because he was just sure that spinning rust was “digital” while I was asserting to him that there’s a DAC in there and that’s how data recovery services work. They crack open the drive and read the magnetic domains to determine not just what the current bytes are but potentially what the previous ones were too.

They’re digital! It’s ones and zeroes! Dude, it’s a fucking magnet. A bunch of them in fact.


Yes, any real digital device is relying on a hopefully well-separated bimodal distribution of something analog.


Hmm this article seems misleading. I suppose it's trying to make the point that application designers usually don't need to think too hard about it, because it's already being addressed by a quorum consensus protocol implemented by someone else. This is a bit of a tautology though; the author seems to be saying 'assume you have a solution to CAP theorem -- now isn't it silly to worry about CAP theorem?'.

One of the fundamental assumptions of CAP theorem is that you can't tell whether or not you have a partition. If you have an oracle that can instantaneously tell you the state of every subsystem, then yeah, CAP is pointless.

But if one of your DBs is connected, reporting itself as alive, and throwing all its writes into /dev/null, you won't be able to route traffic to a quorum of healthy instances because it's not possible to be certain that they're all healthy.

This is what CAP theorem is about: managing data in a distributed system where the status of any given system is fundamentally unknowable because of the Two Generals' Problem (https://en.wikipedia.org/wiki/Two_Generals'_Problem)

In many cases in Cloud though, we can skip that technical stuff and design systems as if we really _did_ have an oracle that could instantaneously and perfectly tell us the state of the system, and things will typically work fine.


The point being made is that with nimble infrastructure the A in CAP can be designed around to such a degree that you may as well be a CP system, unless you have a really good reason to go after that last 0.005% of availability. Not being CP means sacrificing the wonderful benefits that being consistent (linearizability, sequential consistency, strict serializability) makes possible. It's hard to disagree with that sentiment, and it's likely why the Local First ideology is centered on data ownership rather than that extra 0.0005 ounces of availability. Once availability is no longer the center of attention, the design space can be focused on durability or latency: how many copies to read/write before acking.

Unfortunately the point is lost because of the usage of the word "cloud", a somewhat contrived example of solving problems by reconfiguring load balancers (in the real world certain outages might not let you reconfigure!), and missing empathy: you can't tell people not to care about the semantics that thinking about, or not thinking about, availability imposes on the correctness of their applications.

As for the usage of the word cloud: I don't know when a set of machines becomes a cloud. Is it the APIs for management? Or when you have two or more implementations of consensus running on the set of machines?


He's saying that you don't need Partition Tolerance because the network is never actually Partitioned. This is exactly why the Internet and the US Interstate Highway system were invented in the first place.

Or he's saying you don't need Consistency because your system isn't actually distributed; it's just a centralized system with hot backups.

It's unclear what he's trying to say.

No idea why he wrote the blog post. It doesn't increase my confidence in the engineering quality of his employer, AWS.


> If the partition extended to the whole big internet that clients are on, this wouldn’t work. But they typically don’t.

This is the key, that network partitions either keep some clients from accessing any servers, or they keep some servers from talking to each other. The former case is uninteresting because nothing can be done server-side about it. The latter is interesting and we can fix it with load balancers.

This conflicts with the picture painted earlier in TFA where the unhappy client is somehow stuck with the unhappy server, but let's consider that just didactic.

We can also not use load balancers but have the clients talk to all the servers they can reach, when we trust the clients to behave correctly. Some architectures do this, like Lustre, which is why I mention it.

I see several comments here that seem to take TFA as saying that distributed consensus algorithms/protocols are not needed, but TFA does not say that. TFA says you can have consistency, availability, and partition tolerance because network partitions between servers typically don't extend to clients, and you can have enough servers to maintain quorum for all clients (if a quorum is not available it's as if the whole cloud is down, then it's not available to any clients). That is a very reasonable assertion, IMO.


> TFA says you can have consistency, availability, and partition tolerance because network partitions between servers typically don't extend to clients, and you can have enough servers to maintain quorum for all clients

It's not "partition tolerance" if you declare that partitions typically don't happen.


So basically this is saying that the CAP theorem is irrelevant because a partition is not really a partition (since the load balancer can still reach everybody). Hmm...

I agree that in modern data centers the CAP theorem is essentially irrelevant for intra-DC services, due to the uptime and redundancy of networking H/W (making a partition less likely than other systemic failures).

Across DCs I'll claim it is still absolutely relevant.


The only concrete solution the article proposes that I can think of: Spanner uses quorum to maintain availability and consistency. Your "master" is TrueTime, which is considered reliable enough. You have replicated app backends. If this isn't too generous, let's also say the cloud handles load balancing well enough. CAP isn't violated, but you might say the user no longer worries about it.

Most databases don't work like Spanner, and Spanner has its downsides, two of them being cost and performance. So most of the time, you're using a traditional DB with maybe a RW replica, which will sacrifice significant consistency or availability depending on whether you choose sync or async mode. And you're back to worrying about CAP.


Ok, you could also have multiple synchronous Postgres standbys in a quorum setup so that it's tolerable to lose one. Supposedly Amazon Aurora does that. Comes with costs and downsides too.


Weird article. Different users have different priorities and that’s what the CAP theorem expresses. The article also pretends that there’s a magic “load balancer” in the cloud that always works and also knows which segment of a partitioned network is the “correct” one (one of the points of CAP is that there’s not necessarily a “correct” side), and that no users will ever be on the “wrong” side of the partition. And not only that but all replicas see the exact same network partition. None of this is reality.

But the gist, I guess, is that for most applications it’s not actually that important, and that’s probably true. But when it is important, “the cloud” is not going to save you.


CAP was never designed as an end-all template you blindly apply to large scale systems. Think of it more as a mental starting point: systems have these trade-offs you need to consider. Each system you integrate has complex and nuanced requirements that don't neatly fall into clean buckets.

As always Kleppmann has a great and deep answer for this.

https://martin.kleppmann.com/2015/05/11/please-stop-calling-...


I suspect the CAP theorem factored into the design of these cloud architectures, in such a way that it now seems irrelevant. But it probably was relevant in preventing a lot of other more complex designs.


For example, having 3 AZs allows Paxos/Raft voting.


Paxos being CP.


I kinda see what the author is getting at, but I don't buy the argument.

However, in the example with the network partition, it relies on proper monitoring to work out whether the DB it's attached to is currently in a partition.

Managing reads is a piece of piss, mostly. It's when you need to propagate writes to the rest of the DB system that stuff gets hairy.

Now, most places can run from a single DB, especially as disks are fucking fast now, so CAP is never really that much of a problem. However, when you go multi-region, that's when it gets interesting.


This only addresses one kind of partition.

What if your servers can't talk to each other, but clients can?

What if clients can't connect to any of your servers?

What if there are multiple partitions, and none of them have a quorum?

Also, changing the routing isn't instantaneous, so you will have some period of unavailability between when the partition happens, and when the client is redirected to the partition with the quorum.


> if a quorum of replicas is available to the client, they can still get both strong consistency, and uncompromised availability.

Then it’s not a partition.


This is just saying because the cloud system hides the implications of the theorem from you, it's not relevant.

I suppose it's kinda true in the sense that how to operate a power plant is not relevant when I turn on my lights.


We pay a lot of money to Amazon to not think about the 8 Fallacies as well.

You still get consequences for ignoring them, but they show up as burn rate.


And if you actually use cloud systems long enough, you'll see violations of the "C" in CAP in the form of results that don't make any sense. So even in practice… it's good to understand, because it can and does matter.


This article assumes P(artitions) don't happen, and then concludes you can have both C and A. Congrats, that's the CAP theorem.


> The formalized CAP theorem would call this system unavailable, based on their definition of availability:

Umm, no? That’s a picture of a partition. The partition is not able to make progress because the system is not partition tolerant. If it did it wouldn’t be consistent. It’s still available.


See also:

> In database theory, the PACELC theorem is an extension to the CAP theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and loss of consistency (C).

* https://en.wikipedia.org/wiki/PACELC_theorem


This seems like what I've noticed on MPP systems (a little before cloud): data replicas give a lot more availability than the number of partition events would suggest.

I likely need to read the paper linked, but it's common to have an MPP database lose a node but maintain data availability. CAP applies at various levels, but the notion of availability differs:

1. all nodes available
2. all data available

Redundancy can make #2 a lot more common than #1.


The CAP theorem is irrelevant if your acceptable response time is greater than the time it takes your partitions to sync.

At that point you get all 3: consistency, availability, partitioning.

In my opinion it should be the CAPR theorem.


Partition encompasses response time. If you have defined SLAs on response time and your partition times exceed that, you're not available.


One option to increase availability though is to reduce partition times.


> The CAP theorem is also irrelevant if your acceptable response time is greater than the time it takes your partitions to sync.

This is really an oversimplification. The more important metric here is the delay between write and read of the same data. Even in that case, if the system write load is unpredictable it will definitely lead to high variance in replication lag. The number of times I had to deal with a race condition for not considering the replication lag factor is more than I would like to admit.


What's the difference between choosing an acceptable response time that is greater than the time it takes for your partitions to sync and giving up on availability?

I don't think it makes sense to say that CAP doesn't apply if you don't need consistency, availability, or tolerance to partitions. CAP is entirely about the need to relax at least one of those three to shore up the others.


You mean CLAP? Latency and availability are basically the same thing. CAP is simple.


No I'm talking about response time 'requirements'


and considering most cloud compute has a better backbone or internal routing, userspace should see this less.

That being said, if this is truly a problem for you, CRDB is basically built with all of this in mind.


Wow have heard about CRDB but just now reading what it does.

That's an incredible piece of engineering.


I mean you just KNOW that while that addition may make sense, some 12-year-old-minded person like me would just start referring to it as the CRAP theorem. And I don’t even dislike the theorem.


If you don't care about costs...


...or physics.


All models are wrong, some are useful. CAP is probably at least as useful as Newtonian mechanics WHEN you are explaining why you just did a bunch of… extra stuff.

I would like to violate CAP, please. I would like to be nearish to c, please.

Here is my passport. I have done the work.


> The point of this post isn’t merely to be the ten billionth blog post on the CAP theorem. It’s to issue a challenge. A request. Please, if you’re an experienced distributed systems person who’s teaching some new folks about trade-offs in your space, don’t start with CAP.

Yeah… no. Just because the cloud offers primitives that allow you to skip many of the challenges that the CAP theorem outlines doesn't mean it's not a critical step in learning about and building novel distributed systems.

I think the author is confusing systems practitioners with distributed systems researchers.

I agree in some part, the former rarely needs to think about CAP for the majority of B2B cloud SaaS. For the latter, it seems entirely incorrect to skip CAP theorem fundamentals in one’s education.

tl;dr — just because Kubernetes (et al.) make building distributed systems easier, it doesn’t mean you should avoid the CAP theorem in teaching or disregard it altogether.


Every time someone tries to deprecate the nice and simple CAP theorem, it grows stronger. It's an unstoppable freight train at this point, like the concept of relational DBs after the NoSQL fad.



