Sorry, I think either my fundamental understanding is off or I wasn't clear enou...

evanweaver · on Oct 22, 2018

> If you talk to a majority of nodes and ask "this is what we all have, right?"

You cannot know this, because the transaction replication is racy and not atomic--it may have applied to only one node while you are doing your read. Whether you see it or don't is luck. So you can have the following scenario (and in practice you will):

- TX commit begins

- TX replicated to node A

- Read from coordinator A' begins

- Read sees replica A (has tx) and replica B (does not)

- Read assumes A wins because of some kind of vector clock in the data value (choosing the older value doesn't make things better, just in case you are wondering)

- Read from coordinator B' begins

- Read sees B (no TX) and C (no TX)

- Read completes with stale value--serializability violation has occurred

- TX finishes replicating to B and C

This leaves aside write skew, torn transactions due to partitions, and all kinds of other problems.

hardwaresofton · on Oct 23, 2018

I'm super confused -- what you're describing isn't a quorum read/write scenario -- what do you mean by "a wins"? Also where is the prepare phase for 2 phrase commit/any consensus algo? Replica A shouldn't be reporting an unacknowledged transaction to coordinators -- the write hasn't been quorum acknowledged. TX is not considered committed until it reaches a majority of nodes. You are right that if you have a network partition you're in trouble, but that ends in lack of progress, not loss of serializability.

We must be talking about different things because I can't find any literature that has reached the conclusion that serializability is impossible in distributed transactions? Can you point me to that?

Also, do you have any thoughts on the literature[0] that contradicts what you're saying? I'm not an expert but unless I'm not misreading english serializability is possible with quorum reads and writes.

> In a new algorithm for maintaining replicated data, every copy of a replicated file is assigned some number of votes. Every transaction collects a read quorum of rvotes to read a file, and a write quorum of wvotes to write a file, such that r+w is greater than the total number of votes assigned to the file. This ensures that there is a non-null intersection between every read quorum and every write quorum. Version numbers make it possible to determine which copies are current. The reliability and performance characteristics of a replicated file can be controlled by appropriately choosing r, w, and the file's voting configuration. The algorithm guarantees serial consistency, admits temporary copies in a natural way by the introduction of copies with no votes, and has been implemented in the context of an application system called Violet.

Has this paper been refuted? In addition to this there's literally the whole section on distributed serializability[1].

[0]: https://dl.acm.org/citation.cfm?doid=800215.806583

[1]: https://en.wikipedia.org/wiki/Serializability#Distributed_se...

evanweaver · on Oct 29, 2018

> what you're describing isn't a quorum read/write scenario -- what do you mean by "a wins"? Also where is the prepare phase for 2 phrase commit/any consensus algo?

There is no consensus; that requires a leader system. The paper you link appears to require a multi-phase lock; the quorum itself does not guarantee serializability. Explicit preparation via quorum can guarantee serializability (but not strict serializability, I don't think), but cleanup of locks is a big performance problem in practice.

> Replica A shouldn't be reporting an unacknowledged transaction to coordinators -- the write hasn't been quorum acknowledged.

Acknowledged by who? The replicas can't block on the other replicas; they just tell the coordinator when they applied the write.

This is worth a blog post.