
Replicas are master/slave, and continuous snapshotting is asynchronous. On top of that, we can write a write-ahead transaction log. If you put snapshotting and the write-ahead log on different I/O subsystems you only produce sequential writes, which guarantees that you get the most out of your I/O. Of course, that's on a single box.
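The single-box pattern described above can be sketched in a few lines of Python. This is illustrative only (the class name and length-prefix framing are my own, not anything from the system being discussed): an append-only file on its own device means every write lands at the end, so the disk only ever does sequential I/O.

```python
import os

class WriteAheadLog:
    """Append-only log: every write is sequential, the disk head never seeks."""

    def __init__(self, path):
        # O_APPEND guarantees each write goes to the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def append(self, record: bytes):
        # Length-prefix framing so records can be parsed back out at recovery.
        header = len(record).to_bytes(4, "big")
        os.write(self.fd, header + record)
        # Force the record to stable storage before acknowledging the commit.
        os.fsync(self.fd)
```

Putting this log and the snapshot files on separate devices is what keeps both streams purely sequential.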

For our distributed cluster we have the option of writing the write-ahead log to network "log servers" so you can do it even faster. In that case, once K out of N log servers receive a log record we acknowledge the transaction. The log servers then eventually flush it to disk. This way you can write very quickly (you can spin up as many log servers as you like), you store your data in memory in one copy, and you're durable.
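The K-of-N acknowledgment above is essentially a quorum write. A minimal sketch, assuming each log server exposes a hypothetical `send` RPC that returns True once the record is received (names are mine, not the actual API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def replicate(record, log_servers, k):
    """Send `record` to all N log servers; report success as soon as K ack.

    `server.send` is a stand-in for whatever RPC the log servers speak.
    """
    acks = 0
    with ThreadPoolExecutor(max_workers=len(log_servers)) as pool:
        futures = [pool.submit(server.send, record) for server in log_servers]
        for fut in as_completed(futures):
            if fut.result():
                acks += 1
                if acks >= k:
                    # The transaction is durable on K servers; no need to
                    # wait for the remaining N - K acknowledgments.
                    return True
    return False
```

The point of acking at K rather than N is that the slowest servers never sit on the commit path, which is where the "even faster" claim comes from.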



What happens when master fails? jganetsk mentioned Paxos, which you could use to reliably elect a new master. If I were evaluating your tech for usage at scale, I'd be pretty interested in failure modes, and pretty skeptical of anything that didn't employ distributed consensus. There may be other distributed consensus algorithms out there (one was apparently discovered in fruit fly cells [1]), but Paxos is the only one I've seen so far with a proof of eventual consistency.

An opensource implementation of Paxos: [2]

[1] http://www.kurzweilai.net/fruit-fly-nervous-system-provides-...

[2] https://github.com/ha/doozer


Thanks, sigil. We are certainly aware of the various options. It's either Paxos, or a simpler failover mechanism like those used by traditional DBs such as SQL Server or Oracle, applied on a per-node basis on top of replication.


Even if you put the write-ahead log on a different I/O subsystem, the lock needed to ensure consistency will probably be a bottleneck (and the crappy Linux implementation of async I/O on multiprocessor systems doesn't help). How do you guys organize data so that it's consistent and the lock isn't a bottleneck?


You don't necessarily need to write log entries in the order they are committed, nor do you need to store all of the entries in a single log; you could shard the log writing too, as long as you maintained the order information somewhere. So yes, you would still need to lock and sort out the order somewhere, but you already have to if you're ACID anyway.


Right on the money. Every log record contains an LSN (log sequence number) that allows us to reconstitute the order. Then at recovery time you merge-sort by LSN across the sharded log servers.
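The recovery step can be sketched with Python's `heapq.merge`, assuming each shard's log is already sorted by LSN locally (which holds if each server appends in arrival order per shard); records here are illustrative `(lsn, payload)` tuples:

```python
import heapq

def recover(shard_logs):
    """Merge per-shard logs, each already LSN-ordered, into one globally
    ordered stream of (lsn, payload) records ready for replay."""
    # heapq.merge does an N-way merge sort lazily, so recovery can start
    # replaying records before all logs are fully read.
    for lsn, payload in heapq.merge(*shard_logs):
        yield lsn, payload
```

Because the merge is lazy and streaming, recovery never needs to hold all shards' logs in memory at once.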


When you say you merge sort from the log servers, do you mean the logs on a given server are written & stored in-order?



