
The log segments committed by FaunaDB contain batches of transactions, which means our throughput is constrained not by our consensus protocol, but rather by conflicts as transactions are resolved. The benchmarked 3300 transactions/second mentioned is for complex transactions with dozens of reads and writes. Additionally, read-only transactions are not run through the transaction pipeline, since they can be served consistently with low latency from replicas using FaunaDB's snapshot storage.
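To make the shape of this concrete, here is a rough Python sketch of the two paths described above; the names and structure are illustrative assumptions, not FaunaDB's actual implementation. Read-only transactions are answered from a replica's snapshot and skip the pipeline, while read-write transactions are buffered and committed to the log as a single batch, so one consensus round is amortised over many transactions and throughput is bounded by conflicts among them.

    # Rough sketch of the two paths described above; names and structure are
    # illustrative assumptions, not FaunaDB's actual implementation.

    class Replica:
        def __init__(self):
            self.snapshot = {}   # key -> value at this replica's snapshot
            self.pending = []    # read-write txns waiting for the next batch
            self.log = []        # durably committed log segments

        def read_only(self, keys):
            # Read-only transactions never enter the transaction pipeline;
            # they are answered consistently from local snapshot storage.
            return {k: self.snapshot.get(k) for k in keys}

        def submit(self, txn):
            # Read-write transactions are buffered into the current batch.
            self.pending.append(txn)

        def flush_batch(self):
            # One (elided) consensus round commits the whole batch as a log
            # segment, so consensus cost is amortised over many transactions;
            # throughput is then bounded by conflicts among them.
            segment, self.pending = self.pending, []
            self.log.append(segment)
            return segment

    r = Replica()
    r.submit({"writes": {"x": 1}})
    r.submit({"writes": {"y": 2}})
    print(len(r.flush_batch()))     # 2 -- one log segment, two transactions
    print(r.read_only(["x"]))       # served from snapshot, no consensus round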

More important for most applications than theoretical benchmarks is taking a sound approach and using a database you can depend on. Asking your development team to manage isolation is going to introduce more costs than benefits, except in the most specialized of circumstances.

FaunaDB's model is always ACID, so you never have to guess what's in the database, even in the presence of region failures or other anomalies.




We built HopsFS on NDB, not CalvinDB, because we needed performance for cross-partition transactions. Some workloads need it. In the filesystem workload, when FS path components are normalized and stored on different partitions, practically all transactions cross partitions. So if you serialize them, then writing a file in /home/jchan will block writing a file in /home/jim. This is what most distributed filesystems already do - have a global write lock on the filesystem metadata. I like having the freedom to build my own concurrency model on top of READ_COMMITTED, as I can squeeze out 16X performance improvements by doing so (ref: https://www.usenix.org/conference/fast17/technical-sessions/... )
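A toy illustration of why that happens (the hash partitioning below is an assumption for illustration, not HopsFS's actual schema): once path components are normalized into their own rows and spread across partitions, creating a file has to touch the partitions of all its ancestors, so even unrelated writes under /home become cross-partition transactions.

    # Toy illustration of the partitioning described above; the hashing scheme
    # is an assumption for illustration, not HopsFS's actual schema.
    import zlib

    NUM_PARTITIONS = 4

    def partition_of(component):
        # Each normalized path component (inode row) lives on the partition
        # its name hashes to.
        return zlib.crc32(component.encode()) % NUM_PARTITIONS

    def partitions_touched(path):
        # Creating a file reads/locks every ancestor component plus the new
        # inode, so the transaction spans all of these partitions.
        return {partition_of(c) for c in path.split("/") if c}

    print(partitions_touched("/home/jchan/report.txt"))
    print(partitions_touched("/home/jim/photo.jpg"))
    # Both creates almost certainly span multiple partitions; if cross-partition
    # transactions are serialized globally, these two unrelated writes cannot
    # proceed in parallel.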


Sure, if you don't care about serializability, you don't need to pay consensus costs to enforce it, but you will be subject to read and write skew.
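For anyone unfamiliar with the anomaly, here is a minimal write-skew sketch under a READ_COMMITTED-style scheme, where each transaction validates an invariant against the state it read and the writes are then applied independently; the on-call scenario is just a stock illustration.

    # Minimal illustration of write skew: each transaction checks the
    # invariant ("at least one doctor on call") against the state it read,
    # and the writes are then applied independently.

    oncall = {"alice": True, "bob": True}

    def request_leave(snapshot, doctor):
        # Allowed only if some other doctor is still on call in the snapshot.
        others_on_call = sum(1 for d, v in snapshot.items() if d != doctor and v)
        return {doctor: False} if others_on_call >= 1 else {}

    snap = dict(oncall)                  # both txns read before either commits
    w1 = request_leave(snap, "alice")    # sees bob on call -> allowed
    w2 = request_leave(snap, "bob")      # sees alice on call -> allowed

    oncall.update(w1)
    oncall.update(w2)
    print(oncall)   # {'alice': False, 'bob': False} -- invariant violated
    # A serializable system would order the two and abort or re-check one.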


> This is what most distributed filesystems already do - have a global write lock on the filesystem metadata

Absolutely untrue. Stop lying about other systems.


I should add that when you don't serialize cross-partition transactions, you have the hard problem of system recovery - transaction coordinators need to reach consensus on a consistent set of transactions to recover. NDB's solution to this problem was only published this year, in Mikael Ronstrom's book on MySQL Cluster: https://drive.google.com/file/d/1gAYQPrWCTEhgxP8dQ8XLwMrwZPc...


I should add that it is not correct to say 'our throughput is constrained not by our consensus protocol'. A trivial counterexample is a workload where each transaction has at least two non-conflicting writes on different partitions. FaunaDB will serialize those transactions, and you will bottleneck on the consensus protocol, compared to NDB.
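Here is a sketch of that workload (illustrative only): each transaction writes two private keys on two different partitions, so no pair of transactions conflicts, yet a design that totally orders all read-write transactions through one replicated log still pays that ordering cost for every one of them.

    # Sketch of the workload described above (illustrative only): every
    # transaction writes two private keys on two different partitions, so no
    # pair of transactions conflicts.

    def make_workload(n):
        # txn i writes its own keys on partition 0 and partition 1.
        return [{f"p0:key{i}", f"p1:key{i}"} for i in range(n)]

    write_sets = make_workload(1000)
    conflicts = sum(1 for i, a in enumerate(write_sets)
                    for b in write_sets[i + 1:] if a & b)
    print(conflicts)   # 0 -- yet all 1000 transactions are still totally ordered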


What he's saying is that the throughput number they mention is constrained by transaction conflicts. The limit for non-conflicting transactions allowed by the consensus protocol is no doubt much higher.


Can you partition your batch transactions so that everything up to the conflict succeeds?


My understanding is yes: we commit to the log in batches, but abort conflicts at the transaction level. So only the conflicts have to suffer the retry loop; everything else is durably committed.
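A minimal sketch of that behaviour, assuming an optimistic validation scheme (illustrative, not FaunaDB's code): the segment is already durably in the log, each transaction is checked against the versions it read, and only the ones that lost a conflict go back around the retry loop.

    # Minimal sketch, assuming an optimistic validation scheme (illustrative,
    # not FaunaDB's code): the segment is already durably committed, each
    # transaction is validated against the versions it read, and only the
    # losers go back around the retry loop.

    versions = {}   # key -> last committed version

    def apply_segment(segment):
        retry = []
        for txn in segment:
            stale = any(versions.get(k, 0) != v for k, v in txn["reads"].items())
            if stale:
                retry.append(txn)          # only conflicting txns retry
                continue
            for k in txn["writes"]:
                versions[k] = versions.get(k, 0) + 1
        return retry

    segment = [
        {"reads": {"a": 0}, "writes": ["a"]},
        {"reads": {"a": 0}, "writes": ["a"]},   # conflicts with the first txn
        {"reads": {"b": 0}, "writes": ["b"]},   # independent: applies fine
    ]
    print(len(apply_segment(segment)))          # 1 -- only the conflict retries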



