Hacker News new | past | comments | ask | show | jobs | submit login
MongoDB Is Raising Another $100M (techcrunch.com)
93 points by aquark on Jan 9, 2015 | hide | past | favorite | 120 comments



I find it unfortunate that CouchDB is pretty much a dead project even though its fundamentals are, for most projects, better than MongoDB's. Master-master databases are the future for most scaled architectures.


Cloudant developed BigCouch which is an enhanced CouchDB for clusters. Last I heard they intended to merge their fork back in for the next major rev. IBM bought Cloudant and in my experience, IBM does a pretty good job with open source (e.g., Eclipse).


While the project is, if not dead, then certainly a bit quiet, I think the protocol is still very much alive. Some very interesting stuff happening with PouchDB these days. And it's the protocol that enables the sexy master-master replication, so...not all bad news.


CouchDB just isn't very nice to use. Compared to MongoDB or RethinkDB, it really isn't any fun. You can't do ad-hoc queries and it doesn't give you the tools to properly write/test your map/reduce/rereduce functions. I still don't know how testing the rereduce step is supposed to work.

To make matters worse, the JS engine it uses is fairly old and the exceedingly wordy documentation is kinda hard to follow.

I'm a lot happier with RethinkDB.


Agreed about ad-hoc queries. It would have been nice to have them but couchdb-lucene[1] or even an integration with elasticsearch helps a lot if you need them.

[1] https://github.com/rnewson/couchdb-lucene


Although it's been a quiet few years, the project is on the up again. 2014 was a big year for CouchDB - CouchDB 2.0 developer preview was released (more on this in a sec) [1], IBM acquired Cloudant (and has made it available as on-premise offering), Cloudant Query (MongoDB-style querying for CouchDB)[2] was contributed to CouchDB. PouchDB, Couchbase Mobile and Cloudant Sync (all replication-compatible with CouchDB) all saw major new releases and increased uptake. Much more detail on all of this is on the CouchDB blog [3].

CouchDB 2.0 adds dynamo-style clustering support (similar to Riak and Cassandra) in addition to the replication(sync) protocol used by the CouchDB ecosystem [4]. It also includes all the fixes / performance improvements based on Cloudant's operational experience over the last 5 years.

That said, document databases are certainly a niche. CouchDB is a good choice if you need strong durability (writes are always fsync'd - to multiple copies when clustering) and consistent performance as your database scales (querying options may seem restrictive but are designed to scale very well). As others have pointed out, the RESTful interface to CouchDB makes it a good fit for web and mobile applications which can query the database directly without an app tier. PouchDB [5] and various mobile datastores which implement the sync protocol [6][7] allow you to also take your database to a browser or mobile client and work with it when disconnected which is pretty compelling.

[1] https://speakerdeck.com/wohali/putting-the-c-back-in-couchdb...

[2] https://cloudant.com/blog/introducing-cloudant-query

[3] http://blog.couchdb.org/2014/12/19/couchdb-weekly-news-decem...

[4] http://www.replication.io/

[5] http://pouchdb.com/

[6] https://cloudant.com/cloudant-sync-resources

[7] http://www.couchbase.com/nosql-databases/couchbase-mobile


I hear what you are saying about the value of master-master, but querying on CouchDB is rather arduous, while querying on MongoDB (if you are coming from SQL world) is quite easy.

That being said, CouchDB does have a nice built-in GUI.


My biggest complaints about CouchDB revolve around design documents. The documentation on which portions of a design document will trigger the indices to recalculate is awful. I had to figure it out by asking in IRC and then still doing trial and error.

Cloudant is doing some very interesting things relating to querying on a CouchDB system. They created a query syntax so that you have an option besides map/reduce [1]. I watched their webinar on it and it seems pretty slick, but I have not yet played with it. Also, it is not part of CouchDB yet, so there is no option to run tests against it locally to verify syntax, response errors, etc.

Cloudant is working to merge many of their changes from BigCouch back into the CouchDB project, so one day I expect CouchDB to have multiple query options [2].

[1] https://cloudant.com/blog/introducing-cloudant-query/#.VLCSc...

[2] https://cloudant.com/blog/update-from-nebraska-the-cloudant-...


Yeah, it was a bit confusing for me too initially, but if anyone else still doesn't know, the answer is:

1. Indexes are built and appended to at read time.

2. If you change the view code an index rebuilding will occur. (A hash of all the views in the design document is taken and compared with the previous hash. So even a small thing such as adding a space will in effect change the view code and trigger rebuilding of views.)

3. No change to any other part of the design document will have any effect on views.

I wrote in detail about it here (http://staticshin.com/programming/does-updating-a-design-doc...)
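The hash comparison in point 2 is easy to illustrate. This is a rough sketch in Python, not CouchDB's actual implementation, of hashing only the views section of a design document (the design doc contents here are made up):

```python
import hashlib
import json

def view_signature(design_doc):
    """Hash only the view definitions, roughly mirroring how CouchDB
    decides whether an index must be rebuilt."""
    views = design_doc.get("views", {})
    canonical = json.dumps(views, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

ddoc = {
    "_id": "_design/example",
    "views": {
        "by_type": {"map": "function(doc){ emit(doc.type, 1); }"}
    },
    "language": "javascript",
}

before = view_signature(ddoc)

# Adding even a single space to the map function changes the
# signature, which would trigger an index rebuild (point 2).
ddoc["views"]["by_type"]["map"] = "function(doc){ emit(doc.type,  1); }"
assert view_signature(ddoc) != before

# Restore the view, then change a field outside "views": the
# signature is unchanged, so no rebuild (point 3).
ddoc["views"]["by_type"]["map"] = "function(doc){ emit(doc.type, 1); }"
ddoc["comment"] = "tweaked outside the views section"
assert view_signature(ddoc) == before
```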


Great write up! Thanks for sharing


Glad you liked it :)


Genghisapp is a great drop-in GUI for Mongo btw.


What can you do in CouchDB that you can't do in ElasticSearch?


@vvoyer already mentioned replication. But there is another thing that CouchDB is good at that most people don't use or know about: it can interpret arbitrary Erlang code.

Which means that you can write an Erlang module, call it from CouchDB show functions, and get the benefit of a ready-made HTTP API.

A practical use of this would be to use Mnesia as an in-memory key-value store (for the kinds of things you would use Redis for) and then call the functions in the module from a show function. All the benefits you get with Erlang, you get with CouchDB.

One of the killer features (for me at least) is user account management. In three HTTP calls I have login/logout facility ready in my app. I can't tell you how impressed some of my clients are when I show them a V1 of their product in a couple of days.
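For the curious, the three calls look roughly like this (assuming a CouchDB on localhost:5984; the user name and password are made up):

```shell
# 1. Create the user document in the _users database:
curl -X PUT http://localhost:5984/_users/org.couchdb.user:jan \
     -H "Content-Type: application/json" \
     -d '{"name": "jan", "password": "apple", "roles": [], "type": "user"}'

# 2. Log in; CouchDB hands back an AuthSession cookie:
curl -X POST http://localhost:5984/_session \
     -d 'name=jan&password=apple' -c cookies.txt

# 3. Log out by deleting the session:
curl -X DELETE http://localhost:5984/_session -b cookies.txt
```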


If you configure your indexes properly, you get replication in ES for free: http://www.elasticsearch.org/guide/en/elasticsearch/guide/cu...
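For example, the number of copies ES keeps of each shard is just an index setting (the index name here is hypothetical):

```shell
# Tell Elasticsearch to keep two replica copies of every shard in "myindex":
curl -X PUT http://localhost:9200/myindex/_settings \
     -d '{"index": {"number_of_replicas": 2}}'
```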


This is different from CouchDB’s peer-to-peer or master-less replication where nodes can live all over the world, be offline for arbitrary amounts of time or even be phones or web-browsers.


ES isn't really meant to be used as an operational datastore (although I appreciate some do use it that way). CouchDB provides strong durability and master-master replication. It's also very easy to sync changes from CouchDB (https://github.com/elasticsearch/elasticsearch-river-couchdb) to ES so they make a good complementary pair. Use CouchDB for the raw data (where you want strong durability and write scalability) and use ES to search over it.


localDB.replicate.to(remoteDB); maybe http://pouchdb.com/guides/replication.html


MongoDB does master-slave replication with an election protocol to automagically promote a slave on failure. In practice this works pretty well.

It really depends on what you need... there are situations where I would recommend MongoDB, RethinkDB, ElasticSearch or Cassandra... The farther you get from traditional RDBMS, the more you have to consider your needs and any trade offs.


What I find interesting is that every single time I bring up MongoDB's lack of master-master architecture/replication on Hacker News the response I get is related to the fact that MongoDB has a master-slave architecture. Sure, it might work for some applications well, but master-master is a better choice for typical NoSQL uses.

The whole point of master-master is not dealing with failure situations or trying to scale out performance (even though master-master handles those situations naturally); the point is that it's a different philosophy. The idea is that there's no single official true state of the database. Reality happens to map to that idea very well. The data doesn't exist in a centralized official place; it exists in multiple places which may not be perfectly connected in real time all the time.

The data exists on multiple servers in multiple databases. It exists on your mobile device, sometimes disconnected from the server. It exists on thousands of browsers at the same time. There's no single master.

The best mapping of this reality is the master-master ideology. Yes, it requires application-level merges of data, but the benefits you get from implementing this can be tremendous. And that's why I'm excited about using CouchDB. Or rather I'm excited about master-master databases. CouchDB happens to be one of those, and I hope future databases will be built around that too.
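A sketch of what those application-level merges can look like, loosely modeled on CouchDB's deterministic winner-picking rule (the revision data is made up):

```python
def pick_winner(revisions):
    """Deterministic conflict resolution, roughly in the style of CouchDB:
    prefer the revision with the longer edit history, and break ties by
    comparing revision ids lexicographically. Because every replica applies
    the same rule, they all converge on the same winner without a master."""
    return max(revisions, key=lambda r: (r["edits"], r["rev"]))

# Two replicas accepted different writes while disconnected (hypothetical data).
conflict = [
    {"rev": "3-aaa", "edits": 3, "body": {"price": 10}},
    {"rev": "3-bbb", "edits": 3, "body": {"price": 12}},
]

winner = pick_winner(conflict)
# Every replica independently picks "3-bbb"; the losing revision stays
# available for an application-level merge if a blind winner-takes-all
# policy would lose data you care about.
```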


That's fine for scaling reads - writes, not so much. The leader election in MongoDB has been problematic in the past (https://aphyr.com/posts/284-call-me-maybe-mongodb) but I think they have made recent moves to address this.


Can you expand upon that? When would you recommend MongoDB, RethinkDB, ElasticSearch, or Cassandra?


If you need massive reads or writes... I'm talking hundreds of thousands of simultaneous users, then Cassandra is probably your best bet, the tooling around your use will be a bit more difficult, you may need to shard/replicate portions of your data anyway depending on use. You reach for Cassandra when your use is going to get complicated. I wouldn't look at Cassandra if I needed less than say a dozen nodes to start with.

ElasticSearch is fairly close, but I'm unsure if it will scale as well in practice. ElasticSearch handles the middle-to-high ground very well. I'm not sure I would use it for authoritative data... It works incredibly well for logging (with logstash) and the front-end utilities (kibana, etc.) are nice too. Its primary use is as a search engine; if you are very read-heavy for searches, or write-heavy for analytical data, it works well. You can tune your use to separate the storage and reads in interesting ways.

MongoDB works incredibly well when your core data is composed of mostly self-contained documents and in need of certain flexibility. A typical classifieds website is a great use case. It's also a very natural fit in a lot of programming languages, js (node) in particular is a natural fit. There's much less disconnect between the data and your application models.

RethinkDB is very similar to MongoDB, but has a more traditional mindset when it comes to its use. I think the programming interfaces are a little better thought out; consistency and data security are at the forefront here.

In general it comes down to... very large loads, use Cassandra... easiest use is Mongo... RethinkDB, I've been waiting on the replication story to get better, and the geosearch support is fairly recent... ElasticSearch for search or for logging (write heavy). In many scenarios where I would use ElasticSearch, it would be along with an RDBMS as an authoritative data source.


Hi nawitus, how do you define "pretty much a dead project"? I have been looking at the decline in Google Trends and, yes, it looks like it is dying, but the activity in the group is very much alive. And IBM Cloudant is now backing it with very nice integration with the tools of the corporate world.


The indexing interface sucked. I remember its storage grew too fast (Mongo's also). Not webscale enough like MongoDB.


Do.... I'm trying to phrase this the right way. These are serious questions.

Do people actually consider using MongoDB for new projects? Do they want to add it to existing infrastructure? Why, with only a cursory search on the limitations?!


Unless your workload is storing deeply-hierarchical-yet loosely-related-but-otherwise-independent documents, you have a relational model. Not using a relational engine doesn't change that.


It's really hard to get this through to people nowadays. The relational model is powerful, and the storage systems, logic engines, languages, and ancillary toolsets that work under it--on the market, ready for production, today--are very advanced.

When I work with people who claim that their models are not relational, I usually have to contend that they are. The argument goes like this: you may be able to model your problem as documents or hierarchies, but can you model all of the questions you want to ask about that data in the same way?

The major vendors of relational systems have first-class support for hierarchical data structures, recursive data structures, graphs, KVs, and documents, and these can be used in conjunction with the basic relational features. Modern SQL is more than just SELECT...FROM...WHERE...GROUP BY; it has powerful, fast analytical functions, domain modeling, and reporting features. The top engines can partition your data and parallelize your access patterns to get the most value out of your commodity multi-core/SSD hardware.
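As one concrete illustration, hierarchical data needs nothing outside the relational engine: a recursive common table expression walks a tree in a single query. A small sketch using SQLite from Python (the employee/manager table and data are made up):

```python
import sqlite3

# A hierarchy stored relationally: each row points at its manager.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'ada', NULL),
        (2, 'bob', 1),
        (3, 'cyd', 1),
        (4, 'dee', 2);
""")

# The recursive CTE starts at the root (no manager) and repeatedly
# joins in direct reports, tracking depth as it descends.
rows = conn.execute("""
    WITH RECURSIVE reports(id, name, depth) AS (
        SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, r.depth + 1
        FROM employee e JOIN reports r ON e.manager_id = r.id
    )
    SELECT name, depth FROM reports ORDER BY depth, name
""").fetchall()
# rows is the whole tree, level by level: ada at depth 0, bob/cyd
# at depth 1, dee at depth 2.
```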

The support for such systems is ubiquitous in today's software libraries. These systems even have tailored hardware platforms to support them if your problems really lie far out on the curve.

The downside is that none of the free/OSS systems are quite as capable. The commercial systems often require the top-tier editions to support all of the above.

The good news is that it's really a good financial deal if you actually need it. A $40,000 license for Oracle or MS-SQL is 1/4 of the annual cost of an engineer who can coerce similar functionality out of a lesser product. There are plenty of consultants that can help you get there on a one-and-done basis.

PostgreSQL is getting there, too. Query parallelism is, for me, the biggest gap. There are some neat aftermarket solutions, but it's not quite there yet.


While I don't disagree with you, I think you might be missing a higher-level difference: many of today's teams are pulling functionality OUT of the storage engine entirely, and 'rolling their own' in the application code.

I'm not passing judgement on whether that's a good thing or not, but many teams I see today are looking at the storage engine as nothing but that: a temporary place to put things that can be swapped out if there's something that does the same job faster, where 'job' is glorified K:V and possibly sorting.

Ceding advanced functionality to the database is what is being avoided: my own app code is usually easier to troubleshoot than an obscure Oracle error.


I think another part of the problem is that so many people think that you need an ORM to use a relational database. That often prevents them from using advanced functionality.


We use it for nearly everything. We use it because it's easily portable, easily configurable and extendable, and it has a very pleasant API. It's also pretty fast.


We use MongoDB for https://Clara.io. It is working for us fairly well.


Are you doing BI/analytics on it or do you unload to another stack for that?


We are running a Google Docs for 3D style application on it -- you can find examples of what people are doing here: http://clara.io/library

We use other hosted SAAS services for analytics (MMS, Google Analytics, a few others) because well they are cheap/free and that isn't our core strength.


Absolutely. If your data model is document oriented then it is an excellent choice.

And EVERY database has limitations. You just need to be pragmatic and determine if you will ever really hit them.


Which is a completely niche market. And even those people find they hit snags - like when they discover that their data model isn't actually document oriented, like they thought.

Given this, why don't they just use Postgres?


Are you replying to the right person? I have no idea what your point is.

If I am building an application with a document model (which actually isn't that niche for SPA sites) then MongoDB is a great choice. It is much easier to use and manage than PostgreSQL which is important if you are trying to get something off the ground. This is why I would use it. But that doesn't mean it's the right choice for everyone nor is PostgreSQL, Oracle, MySQL or any other database.


My point is that it's so limited that for most people (and that includes single-page apps) you're better off using a relational database.

I've been watching hierarchical data models for some time now - I honestly can think of only a few, limited applications for something like MongoDB.

I'm not the only one who thinks so - see http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...

Edit: Look, I know I'm being negative on MongoDB. I really wanted it to be awesome, but its inherent limitations are just so stark that I feel most people who use it are doing so because they are misinformed or misled. If you have found success with it, that's great. I just have very strong reservations about the entire data model for most people.

As someone has posted above: if you have data that is relational, then it's relational - you should use a RDBMS.


The relational data model is not perfect for every use case. I use Cassandra for high volume, time series data, MongoDB for documents, Titan/Neo4J for graph data, ElasticSearch when I need better indexing. I pick a different technology depending on what I think works best for the use case.

At no point have I thought so condescendingly as you do that MY choices are the best for everyone. They aren't. And I promise you that picking the one technology/approach for everything doesn't work anymore. It's a heterogeneous world out there.

And that link you posted is pathetic. MongoDB is not suited for social-network-style data, but neither are many other databases. That doesn't mean they are useless for every use case.



Ease up there buddy! I wasn't being condescending. I just think that for most people a relational model is probably what they are looking for. I don't think the article I posted is "pathetic", as that's a bit condescending itself... It's just one datapoint showing that people who think they need a hierarchical data model may in fact need a relational one.


are you really juggling with 3 dbs in the same project?


Same reason as MySQL: it's there, it's popular, it's the first thing that springs to mind. That it's shit doesn't factor in.


Arguably, MySQL/MariaDB has over the course of its life improved to the point where it is a reasonable product. I normally wouldn't pick it over PostgreSQL for a new project but it doesn't easily lose data any more, either, and it has good read performance with the default backend.


It's not an unmitigated disaster, true. But speaking as a sysadmin whose problem it is, it's still horrible to administer for something that turns out to be business-critical, and it pains me whenever a third-party useful thing pretty much requires it. (Typically PHP stuff where the paid developers hyperoptimise for MySQL, and there's hypothetical PG support which is actually half a volunteer.)


>Typically PHP stuff where the paid developers hyperoptimise for MySQL

That's quite true. My experience with MySQL comes from working with company intranet installations that didn't see that much traffic and some web apps, all with a high read-to-write request ratio. They all ran off a single DB server (some hardware, some VPS), so my administrative work was limited to automating fairly straightforward tasks with things like Ansible. I'm curious to hear an example of the kinds of problems you've run into with MySQL, since I assume you deal with more complex and larger-scale deployments. (And I'm looking for anecdotes to help persuade customers and developers alike to give Postgres a try.)


Nothing hugely complicated. Production Magento, Drupal and WordPress - we managed to outsource the WordPress, so it's now mainly Magento, and that's much more horrible in itself than MySQL - one in-house tool that used to use it, some MediaWiki, and one really badly-behaved in-house tool that picked MySQL without asking us first.

Mostly we were bitten by long-running MySQL sillinesses:

* its strange idea of UTF-8 (we call the MySQL version WTF-8 - see http://geoff.greer.fm/2012/08/12/character-encoding-bugs-are... )

* InnoDB's galloping disk consumption

* the Debian/Ubuntu package's default stupid behaviour of putting all the InnoDB databases into a single file ibdata1 (I hope this is a Debianism and not something that's default in upstream)

* ibdata1 never shrinking ever (bug #1341, open since 2003)

* binary replication issues (e.g. bug #68892, which is fixed but that doesn't help older or distro versions)

* several others; I've mercifully obliterated the braincells that were holding them. But all of these were long-known issues that will never be fixed for one reason or another (backward compatibility with past mistakes, or they just can't be bothered).

Here's a good crib: http://grimoire.ca/mysql/choose-something-else

tl;dr MySQL: the Comic Sans of databases. Except Comic Sans has use cases.


Thanks for the list. I'll bookmark it just in case.


I also made this a blog post: http://reddragdiva.dreamwidth.org/593924.html Some hopefully useful comments there, and on G+: https://plus.google.com/u/0/111502940353406919728/posts/gHwp...


So that brings the total to $331M? That's a metric ton of cash, what do they do with it?


Take some database courses, learn about transactions and isolation models, maybe do some distributed systems courses, learn about agreement protocols, failure models, recovery models....


Some may think the OP is joking.

https://www.youtube.com/watch?v=nzjIP6O4kEo


What's the issue with that video? (I can't watch a full 27 min presentation right now)


Why do they even have a Ruby driver? Isn't FFI a thing in ruby?


A pure Ruby driver makes it available on JRuby, which doesn't handle the FFI as well.


scoff all you want, that's a better explanation of threading and the dangers entailed than I got in a whole semester of my undergrad. there's an art in explaining something complex simply.


I'm sorry, but that talk is depressing. It starts with a grand statement that there's no such thing as threadsafe ruby code, and as a demonstration of that, it shows multithreaded code written by someone who had apparently never heard of synchronization primitives. It then goes on to explain how to use synchronization primitives to make your code threadsafe. It's like watching someone discover error handling for the first time.


> who had apparently never heard of synchronization primitives.

i think that's the nub of our difference of opinion. i saw that as a ruse, i don't genuinely believe that the presenter never heard of them.


Exactly. Thank you.


OK, I laughed.


It's going to take a lot of marketing people to convince people to pay for something they can now get for free via PostgreSQL.


If that were true, there would be an order of magnitude fewer commercial database systems on the market.


It's true, and soon there won't be.


Well, I might be downvoted to hell like the other poster, but from where I sit that's a possibility far out into the future.

I think you are thinking there will be an objective reckoning of features/performance/reliability, and that since OSS wins all of those they will choose it. As far as I am concerned, that's not actually why they choose their database in the first place.

A million B2B apps are developed on SQL Server because Visual Studio/Microsoft makes that easy for their .NET stack, and Oracle sells to executives or other manager types and gets shoehorned into projects or set as a requirement before smart people get involved ALL THE TIME.

A lot of it still comes down to enterprise pricing, support, integration, and name recognition.


This is true. Also, vendorware that's written to Oracle.

However, I am thankful my own bosses can count, and went "WHAT" at the last Oracle bill.

So we're actively seeking to move our own stuff from 'Orrible to PG, and to get rid of the vendorware depending on Oracle.

We just got AppDynamics in (ridiculously versatile and useful monitoring). Speaking to the AD sales engineer, he said a lot of their Oracle-using customers are eyeing up PG similarly.


an opinion that has been touted since the 90's. still just as wrong.


PostgreSQL does not have master-slave replication with automatic failover and promotion in the box. This is probably one of the biggest reasons why I would choose MongoDB over PostgreSQL; sharding is another.

I like PostgreSQL, and plv8 looks incredibly cool... when replication and promotion are in the box (not needing enterprise or other complex addons), it'd be my first choice for most situations.

(edited comment to make it less snarky)


You don't pick your storage layer based on perceived ease of devops, you pick it based on how well it accomplishes the needs of your workload. You pick it because it fits well in your architecture. If you're worried about scale before you need to scale, you're wasting thought cycles and probably capital. Worry first about getting people to give a shit about your product, and then move on to scale when the time comes.


You pick the database based on the characteristics that are important to your business. If you're cash/resource strapped then ease of operation will be important. If you're ingesting data at high rates from day one then scalability will be important. There are plenty of cases where MongoDB is the right choice from day one and plenty of cases where it isn't.

Who are you to lecture others on what is/is not important for their needs?


If you are cash and resource strapped, then I would NOT recommend using MongoDB. Seriously, in terms of cash, Postgres can cost you nothing; if you are resource strapped, use Postgres, since there are a heck of a lot more people who know it than MongoDB. If you can't set up replication, then IMO you have some seriously larger underlying problems with your startup.

Don't forget that a lot of people have realized that MongoDB is NOT a good fit for what they want to do. Then they have a resourcing problem - a big one.


MongoDB costs you nothing as well. And there are plenty of people who know MongoDB as well as 10gen who can provide official support if needed. It is quite a popular database you know.

And there are plenty of people who are switching away from PostgreSQL and other SQL databases to MongoDB. Again it is quite a popular database (hence the huge amounts of cash they seem to be easily raising).


Postgres has read-only streaming slave replication with a hot standby failover, in the box.


Sure, but the pg_basebackup process is painful enough to make it non-trivial. It's not like you can just bring up a new Postgres instance, point it at an existing primary and say "catch up and start replicating".

Replication has certainly gotten much better in the past few years, though.
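For reference, the basic recipe circa PostgreSQL 9.x looks roughly like this (host names, the replication user and paths are made up):

```shell
# On the replica: clone the primary's data directory, streaming WAL as we go.
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/9.4/main --xlog-method=stream --progress

# Before starting the replica, drop a recovery.conf into the data directory
# so it comes up as a streaming standby (hot_standby = on must also be set
# in postgresql.conf if you want to serve reads):
cat > /var/lib/postgresql/9.4/main/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
EOF
```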


Yeah.. which takes 25 people and 45 days to set up /s

Mongodb literally has a 1-line command to add a replica set member and it works. This is still a huge advantage.


That's BS. With WAL-E it took a few hours tops to set up a basic master-slave system from scratch even if you didn't know what you were doing. I'd have to imagine it's easier now. With proper automation, adding a replica should be as simple as booting a machine and running a script or two.

I don't care either way I just use RDS and don't worry about it. (You can add replicas in one click with RDS.)


I can't tell from the documentation whether wal-e supports non-cloud based synchronisation (https://github.com/wal-e/wal-e). There's no example listed there.

And if you think that is as simple as MongoDB's replica sets then frankly you are crazy.


WAL-E is a backup solution, and it has something that Mongo doesn't: it is capable of backing up every transaction and restoring the database state to any point in time. Because of its nature it can be used for replication.

Wal-E was written for use in cloud which has its own challenges, but if you want to run in your data center the closest thing is Swift in OpenStack.

As for Mongo, my company is currently using it, and I wouldn't say it's any easier, especially when you try to use it in a public cloud, where you no longer have guarantees that the instance you set up won't be terminated and recreated in a different AZ. There are plenty of challenges.

I'm currently working on convincing other teams to drop Mongo and instead make queries directly to Postgres, which is our authoritative source of data. The idea was to simplify our infrastructure, and I was not expecting much difference in performance, but oh boy. In all of the POCs I've done so far, PG beats Mongo so badly it makes you feel sorry for it. Both in performance (you need to understand what you're doing and use the right types, indices and queries) and data size (after moving, the data is much smaller, so it no longer requires being distributed, and the instances can be much smaller too).


Mongo is perfectly capable of point in time restores... in fact Mongo corp has a SaaS that does this... I know of several who have done the same on their own. There's been plenty of work in side-channel events with Mongo's log shipping system.

Perhaps you can point to a relatively simple walkthrough to setup PostgreSQL with plv8 for replication (with an easy promotion of a slave to master), that doesn't take a commercial support license...

Everything I've seen seems incredibly convoluted and more difficult than say MS-SQL, MongoDB, RethinkDB, ElasticSearch or several other databases at data replication and spinning up new nodes, or handling a primary failure.

Also, most of my work with Mongo has performed very well, if your data is a good fit, which I will admit not all data is. Honestly, I'd rather use pgsql with plv8 over Mongo, or ElasticSearch, or MS/Azure SQL... The support costs and my time are important to me, and better spent working on architecture or development. If I can make an operations-level decision that works well enough, or is easier to scale, then the development time is almost a wash.


It's really not, the thing that makes a piece of software successful is not how many lines of bash it takes to "scale". It's how hard it is to need to scale. You're worried about the wrong thing.


Replication isn't about scaling, it's about HA. And MongoDB's replication is dead simple to setup. I am waiting for PGSQL to catch up in that regard, they are working on it and making good progress.

If you need scaling in MongoDB, you're looking at replica sets combined with sharding. Not all workloads, as you rightly point out, need to scale from the get-go, but there are an awful lot that need HA.


Thank you... This was kind of my point... I will note that if you are in a situation with very heavy read usage, then replication does give you scaling in that regard. Again, my use case was a classifieds site, where roughly half the page views are search results.


Who said anything about scaling? I run simple low utilization services on a replica set because of high availability. Replication only provides secondary reads in terms of scaling and that's not the point. Actual scaling comes from sharding (partitioning of data).


Hire, Hire and Hire. I suspect their infrastructure costs are very high too.

One challenge that companies find as they try to scale is they have to add a lot of Sales and Marketing costs well before they receive any revenue, so there's a cost bump despite low growth in engineering. This is compounded if there's a professional services component.


They should instead fire some of their technical decision makers and use the $100M as compensation for pain and suffering for early adopters. Choosing MongoDB for a project was one of the biggest failures in my career.


What do you mean by infrastructure costs?


I'm assuming just their test farm dwarfs most regular deployments.


They have a huge test matrix. Sharding tests, performance tests, and large data tests probably take a long time.

You can see their test matrix here: https://mci.10gen.com/

and the driver tests here: https://jenkins.mongodb.com/

Also they run https://mms.mongodb.com/ which does backup and monitoring


Even $1M buys a lot of test equipment.


They have MongoDB MMS, which runs in two datacenters and provides backup and monitoring services for Mongo clients


They'll probably spend it on marketing MongoDB as a replacement to relational databases. Brainwashing isn't cheap.


I don't know why you're being downvoted, that seems to be their MO.


Some part of it certainly goes into MongoDB development. It still sucks, but every new version has a ton of major improvements. I may give it another try once 2.8 is released.


Been using MongoDB for searches on ClassicCars.com for about 3 years with minimal issues.

Although the next generation is moving to ElasticSearch, we haven't had issues with our use of MongoDB (which is a very good use case for it).


Sales & Marketing, or as Zed Shaw likes to put it, "steaks and strippers" [0]

[0] - http://vimeo.com/2723800


So if Mongo is such shit and Couch is a pain in the ass to configure with not so great documentation, what can I use as a NoSQL database? From time to time, I find myself wanting a database that is flexible with document definition but every tool I've tried has kind of sucked.


This sounds like heresy, but you can actually use Postgresql. It has a JSON column type that supports indexing!

http://www.postgresql.org/docs/9.4/static/datatype-json.html
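A minimal sketch of what that looks like in 9.4, with invented table and field names: the new jsonb type supports GIN indexing, so containment queries stay fast.

```sql
CREATE TABLE events (
    id   serial PRIMARY KEY,
    body jsonb NOT NULL
);
CREATE INDEX events_body_idx ON events USING GIN (body);

INSERT INTO events (body) VALUES ('{"type": "click", "user": {"id": 42}}');

-- the @> (containment) operator can use the GIN index
SELECT * FROM events WHERE body @> '{"type": "click"}';
```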


Postgres also got hstore (key-value) and arrays. So, if you only need a little bit of flexibility on top, you can have that, too.


`hstore` is good if you know for sure you'll never have nesting (because the values have to be primitive types). Otherwise, JSON is probably the safer, more reliable bet, though it's slightly slower.


I'm a pretty happy camper using Postgres with hstore: http://www.postgresql.org/docs/9.1/static/hstore.html

You have the flexibility to store arbitrary JSON blobs when you need to, the stability, maturity, and performance of Postgres, and the ability to migrate data to a more rigid schema once your project matures to the point where data validation is more important than raw prototyping speed.

The main drawback so far is that the query interface is a little clumsy.
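The clumsiness is mostly the operator soup. A sketch of typical hstore queries (table and key names invented for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE profiles (
    id   serial PRIMARY KEY,
    data hstore
);

INSERT INTO profiles (data) VALUES ('color => "blue", size => "L"');

SELECT data -> 'color' FROM profiles;        -- fetch one value (as text)
SELECT * FROM profiles WHERE data ? 'size';  -- rows where the key exists
SELECT * FROM profiles WHERE data @> 'color => "blue"';  -- containment
```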


Have you tried RethinkDB?

Disclaimer: I used to work there.


Take a look at Riak, http://basho.com/.

Disclosure, I have recently joined up with them as a Solutions Architect.


I know that NoSQL has a bunch of advantages with super large datasets, but MySQL really is a solid technology and is underrated in how scalable and robust it is.

You'd be surprised how much MySQL can do and how great the documentation is.


I like Riak and Cassandra.


Have you tried elasticsearch?


What is the go to use case for a document database?

Things like ElasticSearch (search database), Neo4j (graph database), and Redis (key/value store) seem to be used alongside a traditional RDBMS, and have specific use cases that make them superior to shoe-horning the functionality into a traditional RDBMS.


Sacrifice performance for less development overhead.

Add arbitrarily complex (json) structures to a table, without a database migration.
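A toy illustration of the "no migration" point, using Python's built-in SQLite instead of Mongo purely to keep it self-contained: documents with completely different shapes live in one opaque column, and new fields cost nothing.

```python
import json
import sqlite3

# One text column holds the documents; the relational schema never changes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Two "rows" with completely different shapes -- no ALTER TABLE needed.
docs = [
    {"type": "user", "name": "alice"},
    {"type": "order", "items": [{"sku": "a1", "qty": 2}], "total": 19.99},
]
for doc in docs:
    db.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))

# Reading them back restores the arbitrary structure.
rows = [json.loads(body) for (body,) in db.execute("SELECT body FROM docs ORDER BY id")]
print(rows[1]["items"][0]["qty"])  # -> 2
```

The trade-off is exactly the one described above: the database can no longer index or validate what's inside the blob, which is where jsonb/hstore (or a document store) earn their keep.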


Storage of JSON documents. We are using MySQL as our primary database, and MongoDB for storing JSON documents.


What do you do with the documents? Join them? Search them? Filter them? Summarize/aggregate them? All of the above? Do you have schemas for them? (Just curious!)


These documents are simply sent to our client apps. All the processing happens on MySQL. And since we are sending JSON anyway, instead of creating it at runtime we store it as a document, retrieve it, and send it to the client.

We retrieve documents using the unique id generated in mysql table.


With the new CRO appointment, what changes to their business model are now expected in order for them to try to realise a return on these investments? I guess they cannot do much about s/w licensing remaining at zero cost, so will they be targeting just support revenues?


Maybe I'm being stupid, but a non-ACID database sounds scary to me.


Not stupid, but there are different perspectives that reduce the scariness.

With a relational model, your working set needs to be joined together from pieces, so you want ACID to ensure that everyone sees a consistent set of pieces.

But with a document model, you can 'pre-join' your working set into a single object that has everything you need. And that object doesn't have a fixed schema, so it can grow and evolve over time.

While Mongo doesn't provide ACID across documents, it does guarantee atomicity, consistency, isolation and eventual durability for SINGLE documents.

IF your application can live with a universe that consists of a single, arbitrarily complex object, then Mongo is within epsilon of being as safe as a regular ACID transaction system.
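Concretely, the single-document guarantee means an update touching several embedded fields is all-or-nothing. A mongo shell sketch (collection and field names invented):

```javascript
// Both the counter bump and the array push land together, or not at all;
// no reader ever sees the document halfway between the two states.
db.orders.update(
  { _id: 123 },
  {
    $inc:  { "summary.itemCount": 1 },
    $push: { items: { sku: "a1", qty: 2 } }
  }
);
```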


That kind of depends on your needs. Since the documents are hierarchical, and can be self-contained (depending on design), you can get a write receipt from MongoDB. I think the necessity of ACID applies more to relational databases that are updating multiple related records. If you really need an ACID-like transaction in MongoDB, you have to do it in your software and check the write receipts. Of course this can fail if the database server suddenly goes down, but with MongoDB you should be using replica sets anyway.
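The "write receipt" here is what the drivers call a write concern; for example, in a 2.6-era shell you can ask for acknowledgement from a majority of the replica set before the call returns (collection and document are invented):

```javascript
// Blocks until a majority of replica set members have the write,
// or errors after 5 seconds.
db.orders.insert(
  { _id: 124, status: "pending" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);
```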


How many people write receipt only databases?


For the people who don't need it, it's perfectly reasonable. That said, I suppose I haven't been fortunate enough to work on anything that doesn't need it.


How much are they currently valued at (and how fast do they spend $100M)?


I guess we can all surely expect to get a couple more MongoDB mugs this year.


So I guess we can look forward to a lot more marketing selling MongoDB as some incredible database? It would be nice if they spent that money making it less of a pain to use and less brittle.


> So I guess we can look forward to a lot more marketing selling MongoDB as some incredible database?

I have 2 MongoDB mugs. I could use 4 more for a nice set.


Heh, that's true. Actually (ironically) I have a MongoDB mug too, which I found orphaned on a desk in a back office at my old job.


I don't know much about the internal architecture of databases, but I really love how MongoDB cares about people, providing such a great course on Udacity.



