It's pretty easy to demonstrate data loss with MongoDB if you're doing replication, but it's the "normal" behaviour because MongoDB uses asynchronous replication and W=1 writes by default.
Set up an n=3 cluster and a client that writes data continuously, like:
i = 0; while (true) { write(_id:i, data:i); getLastError(); /* to make sure the server acknowledged it */ printf(i); i++; }
Now kill the master. Another node will become the new master. Issue some more writes to make sure the logs of the new and the old master diverge. Now bring back the old master, which will become a secondary, and it will say something along the lines of "finding common oplog point", and it will discard the writes that it had that were not copied to the other nodes before it was killed.
You can verify all this by looking at the i's that were acknowledged by the old master and printed by the client. The last couple of them will be gone for good.
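The failover logic above can be sketched as a toy simulation. The oplogs are plain Python lists and `find_common_point` is a made-up helper, not MongoDB internals; it only illustrates how writes past the common oplog point get discarded.

```python
# Toy model of the divergence scenario: oplogs as plain lists.
# Not real MongoDB behaviour, just the failover/rollback logic.

def find_common_point(old_log, new_log):
    """Return the length of the longest common prefix of two oplogs."""
    n = 0
    while n < len(old_log) and n < len(new_log) and old_log[n] == new_log[n]:
        n += 1
    return n

# Writes 0..9 replicated everywhere; 10..11 were acked by the old
# master only, right before it was killed.
old_master = list(range(12))          # has writes 0..11
new_master = list(range(10))          # had replicated only 0..9
new_master += [100, 101]              # new writes after taking over

common = find_common_point(old_master, new_master)
rolled_back = old_master[common:]     # writes the old master must discard
old_master = new_master[:]            # old master resyncs as a secondary

print(rolled_back)                    # acknowledged-but-lost writes: [10, 11]
```

The client saw an acknowledgement for every element of `rolled_back`, which is exactly the data loss described above.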
If this is unacceptable to you, then you can run MongoDB with W=majority mode, but with MongoDB W>=2 modes (so-called consistent replication modes) are very slow.
As a single master system, MongoDB doesn't allow the data on different nodes to become inconsistent or go into conflict. The idea is that avoiding this prevents developers from having to worry about (and clean up) conflicting data from different nodes.
The description above left out an important part of the process that occurs in this situation, but it is documented: http://www.mongodb.org/display/DOCS/Replica+Sets+-+Rollbacks . Note that the data that has been rolled back is saved to a file so that it can be applied again if so desired.
As mentioned here, if you don't like that behavior, you can use write concerns, and require W=2 (or more) and wait for the writes to be replicated. Of course, there's a performance cost to doing that, but you can choose.
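The invariant that W >= 2 buys you can be shown with a small sketch. This is not MongoDB code; `write` and `replicated_to` are hypothetical names standing in for "how many nodes held the write when the primary died":

```python
# Toy model: a write is acknowledged only once at least w nodes hold it,
# so a rollback can never discard an acknowledged write.

def write(nodes, doc, w, replicated_to):
    """Copy doc to the first `replicated_to` nodes; ack iff w were reached."""
    for node in nodes[:replicated_to]:
        node.append(doc)
    return replicated_to >= w         # acknowledgement sent to the client

nodes = [[], [], []]                  # primary + two secondaries

# Primary crashes before replicating: only 1 copy of each write exists.
acked_w1 = write(nodes, {"_id": 1}, w=1, replicated_to=1)  # acked, then lost
acked_w2 = write(nodes, {"_id": 2}, w=2, replicated_to=1)  # never acked

print(acked_w1, acked_w2)
```

With w=1 the client is told the write succeeded even though it will be rolled back; with w=2 the write can still be lost, but the client was never told otherwise, which is the performance-for-safety trade mentioned above.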
Yes, but from an application perspective your database will be left in an inconsistent state. The next morning the ops guys will have to call the dev guys to "repair" the database by hand.
Truth is ScalienDB runs in W=3 mode at about the speed of MongoDB in W=1 mode, so this is not a trade-off that customers have to make.
Not yet. ScalienDB uses synchronous replication, which only works inside the datacenter. Replication across datacenters is a completely different use-case coming in 2012.
"If this is unacceptable to you, then you can run MongoDB with W=majority mode, but with MongoDB W>=2 modes (so-called consistent replication modes) are very slow."
Serious question: are the consensus modes slower than for any other system (e.g. HBase, Cassandra), or is this just a re-statement of the fact that writing to N > 1 machines is inherently slower than writing to a single machine?
You have to be careful here, as MongoDB and Cassandra use a different model of replication.
Cassandra does not perform replication/synchronization on a per command basis between the nodes. Roughly: the client writes to multiple nodes, which are mostly independent, so assuming client bandwidth is not the bottleneck, writing to W=2 nodes will not be much slower than W=1. In practice, since Cassandra's disk storage subsystem is also fast for writes, it's overall very fast at raw writes. (As in, fastest in my benchmarks.) The trade-off is that its replication model is eventual consistency, and reads are somewhat slowish. On the other hand, their model works well in a multi-datacenter environment (along with Riak).
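The latency claim above follows from the client writing to the replicas in parallel: the W=2 write costs roughly the max of the two node latencies, not their sum. A minimal sketch (the per-node latencies below are made up for illustration):

```python
# Rough latency model of parallel client-side replication.
from concurrent.futures import ThreadPoolExecutor
import time

def write_to_node(latency):
    """Stand-in for a write RPC to one replica."""
    time.sleep(latency)
    return True

node_latencies = [0.05, 0.06, 0.055]   # hypothetical per-node write times

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    acks = list(pool.map(write_to_node, node_latencies[:2]))  # W=2, in parallel
elapsed = time.monotonic() - start

# Parallel W=2 finishes in ~max(0.05, 0.06), well under the 0.11 a
# sequential (master-forwards-to-slave) scheme would need.
print(all(acks), elapsed < sum(node_latencies[:2]))
```

Contrast this with MongoDB's chained model below, where the master must wait for a slave to copy the write before acknowledging, so the latencies add up.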
MongoDB uses an asynchronous replication model. What seems to happen if you specify W=2 is that the master doesn't ACK the write to the client until one of the slaves has copied it off the master. In my measurements W=2 ran at a fixed ~30 writes/sec on EC2, which means this mode may as well not be there. (This W=2 performance problem was also verified by customers looking at MongoDB.)
If you look at my company's product, ScalienDB, it uses a highly optimized synchronous replication model (Paxos) and a storage engine designed for that. It's actually faster running in W=3 mode than certain other NoSQLs in W=1 mode. My bet is that this is what enterprises are going to want if they're going to use a NoSQL as a primary-copy database.
(Test for yourself; all products are open-source, and it'll cost you less than $20 on AWS.)
3) Your explanation doesn't make sense to me. No matter the value of W, MongoDB should make best efforts to get it to all the replicas, no? So lower W should affect availability and perhaps latency but throughput should be unaffected, given a benchmark with sufficient client threads.
My Cassandra benchmarks were performed a couple of months ago.
Turns out MongoDB doesn't scale well with the number of connections due to software issues (e.g. one thread per connection instead of async I/O). At about 500 connections Mongo starts to break on the platform we tested.