It's pretty easy to demonstrate data loss with MongoDB if you're doing replication, but it's the "normal" behaviour because MongoDB uses asynchronous replication and W=1 writes by default.
Set up an n=3 cluster and a client that writes data continuously, like:
i = 0; while (true) { write(_id:i, data:i); getLastError(); /* to make sure the server acknowledged it */ printf(i); i++; }
Now kill the master. Another node will become the new master. Issue some more writes to make sure the logs of the new and the old master diverge. Now bring back the old master, which will become a secondary, and it will say something along the lines of "finding common oplog point", and it will discard the writes that it had that were not copied to the other nodes before it was killed.
You can verify all this by looking at the i's that were acknowledged by the old master and printed by the client. The last couple of them will be gone for good.
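The failover logic above can be sketched as a toy simulation. The oplogs are plain Python lists and `find_common_point` is a made-up helper, not MongoDB internals; it only illustrates how writes past the common oplog point get discarded.

```python
# Toy model of the divergence scenario: oplogs as plain lists.
# Not real MongoDB behaviour, just the failover/rollback logic.

def find_common_point(old_log, new_log):
    """Return the length of the longest common prefix of two oplogs."""
    n = 0
    while n < len(old_log) and n < len(new_log) and old_log[n] == new_log[n]:
        n += 1
    return n

# Writes 0..9 replicated everywhere; 10..11 were acked by the old
# master only, right before it was killed.
old_master = list(range(12))          # has writes 0..11
new_master = list(range(10))          # had replicated only 0..9
new_master += [100, 101]              # new writes after taking over

common = find_common_point(old_master, new_master)
rolled_back = old_master[common:]     # writes the old master must discard
old_master = new_master[:]            # old master resyncs as a secondary

print(rolled_back)                    # acknowledged-but-lost writes: [10, 11]
```

The client saw an acknowledgement for every element of `rolled_back`, which is exactly the data loss described above.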
If this is unacceptable to you, then you can run MongoDB with W=majority mode, but with MongoDB W>=2 modes (so-called consistent replication modes) are very slow.
As a single master system, MongoDB doesn't allow the data on different nodes to become inconsistent or go into conflict. The idea is that avoiding this prevents developers from having to worry about (and clean up) conflicting data from different nodes.
The description above left out an important part of the process that occurs in this situation, but it is documented: http://www.mongodb.org/display/DOCS/Replica+Sets+-+Rollbacks . Note that the data that has been rolled back is saved to a file so that it can be applied again if so desired.
As mentioned here, if you don't like that behavior, you can use write concerns, and require W=2 (or more) and wait for the writes to be replicated. Of course, there's a performance cost to doing that, but you can choose.
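The invariant that W >= 2 buys you can be shown with a small sketch. This is not MongoDB code; `write` and `replicated_to` are hypothetical names standing in for "how many nodes held the write when the primary died":

```python
# Toy model: a write is acknowledged only once at least w nodes hold it,
# so a rollback can never discard an acknowledged write.

def write(nodes, doc, w, replicated_to):
    """Copy doc to the first `replicated_to` nodes; ack iff w were reached."""
    for node in nodes[:replicated_to]:
        node.append(doc)
    return replicated_to >= w         # acknowledgement sent to the client

nodes = [[], [], []]                  # primary + two secondaries

# Primary crashes before replicating: only 1 copy of each write exists.
acked_w1 = write(nodes, {"_id": 1}, w=1, replicated_to=1)  # acked, then lost
acked_w2 = write(nodes, {"_id": 2}, w=2, replicated_to=1)  # never acked

print(acked_w1, acked_w2)
```

With w=1 the client is told the write succeeded even though it will be rolled back; with w=2 the write can still be lost, but the client was never told otherwise, which is the performance-for-safety trade mentioned above.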
Yes, but from an application perspective your database will be left in an inconsistent state. The next morning the ops guys will have to call the dev guys to "repair" the database by hand.
Truth is ScalienDB runs in W=3 mode at about the speed of MongoDB in W=1 mode, so this is not a trade-off that customers have to make.
Not yet. ScalienDB uses synchronous replication, which only works inside the datacenter. Replication across datacenters is a completely different use-case coming in 2012.
"If this is unacceptable to you, then you can run MongoDB with W=majority mode, but with MongoDB W>=2 modes (so-called consistent replication modes) are very slow."
Serious question: are the consensus modes slower than for any other system (e.g. HBase, Cassandra), or is this just a re-statement of the fact that writing to N > 1 machines is inherently slower than writing to a single machine?
You have to be careful here, as MongoDB and Cassandra use a different model of replication.
Cassandra does not perform replication/synchronization on a per command basis between the nodes. Roughly: the client writes to multiple nodes, which are mostly independent, so assuming client bandwidth is not the bottleneck, writing to W=2 nodes will not be much slower than W=1. In practice, since Cassandra's disk storage subsystem is also fast for writes, it's overall very fast at raw writes. (As in, fastest in my benchmarks.) The trade-off is that its replication model is eventual consistency, and reads are somewhat slowish. On the other hand, their model works well in a multi-datacenter environment (along with Riak).
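The latency claim above follows from the client writing to the replicas in parallel: the W=2 write costs roughly the max of the two node latencies, not their sum. A minimal sketch (the per-node latencies below are made up for illustration):

```python
# Rough latency model of parallel client-side replication.
from concurrent.futures import ThreadPoolExecutor
import time

def write_to_node(latency):
    """Stand-in for a write RPC to one replica."""
    time.sleep(latency)
    return True

node_latencies = [0.05, 0.06, 0.055]   # hypothetical per-node write times

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    acks = list(pool.map(write_to_node, node_latencies[:2]))  # W=2, in parallel
elapsed = time.monotonic() - start

# Parallel W=2 finishes in ~max(0.05, 0.06), well under the 0.11 a
# sequential (master-forwards-to-slave) scheme would need.
print(all(acks), elapsed < sum(node_latencies[:2]))
```

Contrast this with MongoDB's chained model below, where the master must wait for a slave to copy the write before acknowledging, so the latencies add up.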
MongoDB uses an asynchronous replication model. What seems to happen if you specify W=2 is that the master doesn't ACK the write to the client until one of the slaves has copied it off the master. In my measurements W=2 ran at a fixed ~30 writes/sec on EC2, which means this mode may as well not be there. (This W=2 performance problem was also verified by customers looking at MongoDB.)
If you look at my company's product, ScalienDB, it uses a highly optimized synchronous replication model (Paxos) and a storage engine designed for that. It's actually faster running in W=3 mode than certain other NoSQLs in W=1 mode. My bet is that this is what enterprises are going to want if they're going to use a NoSQL as a primary-copy database.
(Test for yourself; all products are open-source, and it'll cost you less than $20 on AWS.)
3) Your explanation doesn't make sense to me. No matter the value of W, MongoDB should make best efforts to get it to all the replicas, no? So lower W should affect availability and perhaps latency but throughput should be unaffected, given a benchmark with sufficient client threads.
My Cassandra benchmarks were performed a couple of months ago.
Turns out MongoDB doesn't scale well with the number of connections due to software issues (e.g. one thread per connection instead of async I/O). At about 500 connections Mongo starts to break on the platform we tested.