Replicas are master/slave, and continues snapshotting is asynchronous. On top of...

sigil · on July 12, 2011

What happens when master fails? jganetsk mentioned Paxos, which you could use to reliably elect a new master. If I were evaluating your tech for usage at scale, I'd be pretty interested in failure modes, and pretty skeptical of anything that didn't employ distributed consensus. There may be other distributed consensus algorithms out there (one was apparently discovered in fruit fly cells [1]), but Paxos is the only one I've seen so far with a proof of eventual consistency.

An opensource implementation of Paxos: [2]

[1] http://www.kurzweilai.net/fruit-fly-nervous-system-provides-...

[2] https://github.com/ha/doozer

nikita · on July 12, 2011

Thanks, sigil. We are certainly aware of various options. It's either Paxos, or a simpler failover mechanism used by traditional dbs, such as SQL Server or Oracle on the per node basic based on replication.

gfaremil · on July 12, 2011

Even if you put write ahead logs on different I/O subsystems the lock needed to ensure consistency will probably be a bottleneck (and crappy Linux implementation asynch IO on multi processor system does not help). How do you guys organize data so it is consistent and lock is not a bootleneck?

ZoFreX · on July 12, 2011

You don't necessarily need to write log entries in the order they are committed, nor do you need to store all of the entries in a single log, you could shard the log writing too - as long as you maintained, somewhere, the order information. So yes, you would still need to lock and sort out the order somewhere, but you already have to if you're ACID any way.

nikita · on July 12, 2011

Right on the money. We every log record contains LSN (log sequence number) that allows us to reconstitute the order. Then as at the recovery time you need to merge sort by LSN from the shared log servers

ZoFreX · on July 13, 2011

When you say you merge sort from the log servers, do you mean the logs on a given server are written & stored in-order?