Demystifying Databases: Correctness Anomalies Under Serializable Isolation (fauna.com)
71 points by evanweaver on June 29, 2019 | 16 comments


Shameless plug (a summary of the current NoSQL/NewSQL market and the prominent features of each) - https://medium.com/open-factory/nosql-newsql-a-smorgasboard-...


This seems like a veiled salvo towards the folks at CockroachDB


For those wondering, since CockroachDB is not mentioned by name here, the post [1] from the same author on his own site does explicitly mention CockroachDB.

"However, (unlike Google Spanner) CockroachDB does not wait for the maximum clock skew bound to pass before committing a transaction. Therefore, it is possible in CockroachDB for a transaction to commit, and a later transaction to come along (that writes data to a different partition), that was caused by the earlier one (that started after the earlier one finished), and still receive an earlier timestamp than the earlier transaction."

[1] https://dbmsmusings.blogspot.com/2019/06/correctness-anomali...
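
To make that anomaly concrete, here is a hypothetical sequence with made-up timestamps and partitions (the second node's clock lags slightly, within the allowed maximum offset):

    t1: T1 writes to partition P1 and commits with timestamp 105
    t2: the client observes T1's result, then issues T2
    t3: T2 writes to partition P2 and commits with timestamp 100
        (P2's node clock is behind, and nothing forces it past 105)
    t4: a snapshot read at timestamp 102 sees T2 but not T1,
        i.e. the effect without its cause (the "causal reverse")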


This is a bit puzzling. It seems Abadi is unaware that CockroachDB uses HLCs and does capture causality across partitions.


Cockroach Labs themselves freely admit CRDB allows for causal reverse anomalies [1], and Kyle reproduced it in his Jepsen analysis [2].

HLCs don't prevent causal reverse, because they don't guarantee that message exchanges happen synchronously across nodes between transactions, and those exchanges are what most HLC implementations piggyback on to properly advance the HLC on each node.
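
For reference, a rough Go sketch of the update rules from the HLC paper (the names are mine, not any particular implementation). The logical component only gets pulled forward past the local physical clock by the receive rule, so a causal dependency that flows through an external client never advances it:

    package hlcsketch

    import "time"

    // Clock holds the two HLC components.
    type Clock struct {
        l int64 // logical component: max physical time observed
        c int64 // counter to order events sharing the same l
    }

    func pt() int64 { return time.Now().UnixNano() }

    // Local applies the send/local-event rule.
    func (h *Clock) Local() (int64, int64) {
        prev := h.l
        h.l = max(h.l, pt())
        if h.l == prev {
            h.c++
        } else {
            h.c = 0
        }
        return h.l, h.c
    }

    // Receive applies the receive rule for a message stamped (lm, cm).
    // This is the only path by which another node's clock advances ours.
    func (h *Clock) Receive(lm, cm int64) {
        prev := h.l
        h.l = max(h.l, max(lm, pt()))
        switch {
        case h.l == prev && h.l == lm:
            h.c = max(h.c, cm) + 1
        case h.l == prev:
            h.c++
        case h.l == lm:
            h.c = cm + 1
        default:
            h.c = 0
        }
    }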

1: https://www.cockroachlabs.com/blog/consistency-model/

2: https://jepsen.io/analyses/cockroachdb-beta-20160829


Ah, so they're not using the waits. It wouldn't be that surprising if in practice they're just too slow to be practical.

Do you have anything I can read on the rest of what you're saying? It contradicts the HLC paper.


I don’t know of an article describing it offhand. The issue arises when each transaction occurs on a different node or set of nodes and the causal link between them is external to the DBMS. In that case, it’s possible for each transaction to commit independently without the message exchanges needed to ensure their HLC-based timestamps correspond to their causal order. In other words, since the client doesn’t participate in the HLC scheme, it breaks the causal chain and can allow an HLC-based system to assign out-of-order timestamps.

This is hinted at in the HLC paper, but it’s not very clear: Section 2 limits the definition of “happens before” to apply only to events that occur on the same node or after a message is received from another node.


Isn’t that just describing reintroducing waiting out the uncertainty window in order to gain back external consistency? I’m not sure that ends up being substantially different from TrueTime.


Yeah, that's why I was asking. I've seen various comments about the details of this but nothing definitive. I'm hoping when I have the mental bandwidth to sit down and go through the TLA+ proofs of HLC I'll fully grok it.

One immediate difference between TrueTime and HLC springs to mind though: TT knows the clock ambiguity when the timestamp was generated, HLC does not. So in practice I think TT may see much shorter wait intervals than HLC, which would have to conservatively wait out the maximum allowed clock drift.
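
Roughly the difference, as a sketch with a made-up TrueTime stub (the interval shape follows the Spanner paper; none of this is a real API):

    package waits

    import "time"

    // ttInterval mimics the [earliest, latest] bound that Spanner's TrueTime
    // returns; this stub just makes the shape clear.
    type ttInterval struct{ Earliest, Latest time.Time }

    func trueTimeNow() ttInterval {
        eps := 2 * time.Millisecond // hypothetical measured uncertainty
        now := time.Now()
        return ttInterval{now.Add(-eps), now.Add(eps)}
    }

    // Spanner-style commit wait: block until TrueTime guarantees commitTS is
    // in the past everywhere, i.e. only as long as the measured uncertainty.
    func commitWaitTrueTime(commitTS time.Time) {
        for !trueTimeNow().Earliest.After(commitTS) {
            time.Sleep(time.Millisecond)
        }
    }

    // Without a per-stamp uncertainty bound, the conservative option is to
    // wait out the configured maximum clock offset (hundreds of ms).
    func commitWaitMaxOffset(commitTS time.Time, maxOffset time.Duration) {
        time.Sleep(time.Until(commitTS.Add(maxOffset)))
    }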

A skim through the CRDB source is consistent with this. It looks like when they hit a read/write conflict where the write has a non-zero logical component in its HLC stamp, they wait out any remaining portion of the ambiguity window. Since the max offset is around 400ms, that seems like it could hurt pretty badly under contended writes.

I think HLC would get more interesting if the stamps were extended with the ambiguity interval at stamp time.
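
Speculatively, that could look something like this (field names are made up):

    package hlcsketch

    import "time"

    // BoundedStamp is an HLC timestamp extended with the clock error bound
    // measured when the stamp was generated, so a reader or committer could
    // wait out only Eps rather than the configured maximum offset.
    type BoundedStamp struct {
        L   int64         // HLC logical component
        C   int64         // HLC counter
        Eps time.Duration // measured clock uncertainty at stamp time
    }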


Section 6.3.


To be fair, this test was from a long time ago, and I believe they fixed the things the Jepsen folks found.

The latest version is 19.1.2


Yes, I was not trying to imply the bugs Jepsen found haven’t been fixed.

However, the possibility of causal reverse is a characteristic of CRDB’s architecture, so you can still observe it in the latest version.


I think they solve this by throwing an error and having the caller issue a retry; subsequent retries are issued with a higher priority so that the retried query is more likely to succeed. Whether this is good, bad, or ideal, I don't know.
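
Something like the following sketch, using Go's database/sql. The SET TRANSACTION PRIORITY statement and the 40001 retryable-error code are from CockroachDB's docs, while the table, statement, and retry policy here are made up:

    package retryexample

    import (
        "database/sql"
        "strings"
    )

    // transferWithRetry retries on a serialization failure (SQLSTATE 40001),
    // escalating the transaction priority so the retry is less likely to be
    // pushed again. A real driver would expose the SQLSTATE directly instead
    // of string matching.
    func transferWithRetry(db *sql.DB) error {
        priorities := []string{"NORMAL", "HIGH"}
        for attempt := 0; ; attempt++ {
            pri := priorities[min(attempt, len(priorities)-1)]
            tx, err := db.Begin()
            if err != nil {
                return err
            }
            if _, err = tx.Exec("SET TRANSACTION PRIORITY " + pri); err == nil {
                _, err = tx.Exec("UPDATE accounts SET balance = balance - 10 WHERE id = $1", 1)
            }
            if err == nil {
                if err = tx.Commit(); err == nil {
                    return nil
                }
            }
            tx.Rollback()
            if attempt < 3 && strings.Contains(err.Error(), "40001") {
                continue // retryable conflict: escalate priority and try again
            }
            return err
        }
    }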


This is Dan Abadi we're talking about. More importantly, the points he makes are important, clear and true. If that causes heartburn for Vendor A and pride for Vendor B, that's secondary. The primary goal is to help users of distributed database systems understand the kind of trouble they can encounter with anything less than strict serializability.

NB: there are important applications where correctness in specific situations is not paramount. DoubleClick and ad-serving companies in general are more concerned about speed and throughput and are generally willing to accept approximate correctness. Strict serializability isn't a universal value, but it's good to know about when you DO care about the kinds of anomalies Dan illustrates.


Fauna is a CP system, correct?




