
You can insert and retrieve by ID just as fast with PostgreSQL.

Map/reduce? Isn't that slow in general? Can't PostgreSQL JOIN and aggregate functions do most of what map/reduce is for but faster?
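
For the by-ID part, here's a minimal sketch of what I mean (assuming Python with psycopg2 and a reasonably recent Postgres; the docs table and its columns are made up):

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # Hypothetical document table: surrogate integer key plus a jsonb body.
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")

    # Insert a document and get its ID back, then fetch it by that ID,
    # i.e. the same pattern you'd use against Mongo.
    cur.execute("INSERT INTO docs (body) VALUES (%s) RETURNING id",
                [Json({"name": "widget", "qty": 3})])
    doc_id = cur.fetchone()[0]

    cur.execute("SELECT body FROM docs WHERE id = %s", [doc_id])
    print(cur.fetchone()[0])   # -> {'name': 'widget', 'qty': 3}

    conn.commit()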



> You can insert and retrieve by ID just as fast with PostgreSQL.

How can Postgres possibly be faster on insert? Unlike Mongo, Postgres actually needs to write bytes to a disk (with a WAL!).


Even modest db work usually involves concurrency.

Mongo locks. Postgres rocks. [1]

1: http://www.postgresql.org/docs/current/static/mvcc-intro.htm...


> Unlike Mongo, Postgres actually needs to write bytes to a disk (with a WAL!).

Not with unlogged tables it does not!
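
For reference, it's just a keyword on CREATE TABLE. A quick sketch (psycopg2 assumed, names made up):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # UNLOGGED skips WAL for this table's writes: much cheaper inserts,
    # but the table is truncated after a crash and is not replicated.
    cur.execute("""
        CREATE UNLOGGED TABLE IF NOT EXISTS scratch_events (
            id      bigserial PRIMARY KEY,
            payload text
        )
    """)
    conn.commit()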


Or just turn off synchronous_commit for a single transaction!

> Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a "success" indication to the client.

> This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits.

http://www.postgresql.org/docs/9.3/static/runtime-config-wal...
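
In other words, you can trade durability for speed one transaction at a time. A rough sketch (psycopg2 assumed; the events table is hypothetical and assumed to exist):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # SET LOCAL only affects the current transaction. If the server crashes
    # right after commit this transaction may be lost, but it can never be
    # half-applied or corrupt other data.
    cur.execute("SET LOCAL synchronous_commit = off")
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ["click"])
    conn.commit()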


And you can set fsync off for extra yolo!


Postgres also has a stupid-mode option (unlogged tables).


The feature works per table and is off by default. Unlogged tables have another nice property: if the server crashes, the table comes back completely empty rather than potentially corrupted. This makes it clear to users that unlogged tables are for caching or other data that you have no problem losing.

It also means that you can be sure that the data you read from an unlogged table is the data that you put there. I much prefer this over the possibility of maybe reading corrupt data without any possibility of detection.

In my case, the use-case of an unlogged table is a table that practically works like a materialized view (until postgres learns to refresh them without an exclusive lock and maybe even automatically as the underlying data changes). If the table is empty, I can easily recreate the data, so I don't care about any kind of WAL logging or replication. If the table's there, perfect. I'll use it instead of a ton of joins and an equally big amount of application logic.

If the data isn't there, well. Then I run that application logic which will cause the joins to happen. So that's way, way slower, but it'll work the same.

The advantage of using an unlogged table here is that it's cheaper to update it and it doesn't use replication traffic (unlogged tables are not replicated).
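
Roughly the pattern I mean, sketched in Python with psycopg2; the table names and the aggregation query are made up stand-ins:

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # Hypothetical unlogged cache table standing in for a materialized view.
    cur.execute("""
        CREATE UNLOGGED TABLE IF NOT EXISTS order_totals_cache (
            customer_id bigint  PRIMARY KEY,
            total       numeric NOT NULL
        )
    """)

    cur.execute("SELECT count(*) FROM order_totals_cache")
    if cur.fetchone()[0] == 0:
        # Cache is empty (fresh, or truncated after a crash): rebuild it from
        # the underlying tables with the expensive joins/aggregation.
        # 'orders' is assumed to be an existing, normally logged table.
        cur.execute("""
            INSERT INTO order_totals_cache (customer_id, total)
            SELECT o.customer_id, sum(o.amount)
            FROM orders o
            GROUP BY o.customer_id
        """)

    cur.execute("SELECT total FROM order_totals_cache WHERE customer_id = %s", [42])
    print(cur.fetchone())
    conn.commit()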


I think it's been referred to as 'running with scissors' mode actually.


You can also tune the fsync option to get mongo-like performance and reliability.


The difference being that it isn't turned on by default, the way it effectively was in MongoDB initially.


It is not stupid it actually makes sense for ETL workflows where you load raw data into unlogged tables and transform them in the database into the final (logged) representation.
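
E.g. the classic staging-table pattern, as a sketch (psycopg2 assumed; file and table names are made up, and events_final is assumed to already exist as a normal logged table):

    import psycopg2

    conn = psycopg2.connect("dbname=etl")
    cur = conn.cursor()

    # Raw rows land in an unlogged staging table: fast, and losing them on a
    # crash is fine because the source file can simply be re-loaded.
    cur.execute("CREATE UNLOGGED TABLE IF NOT EXISTS staging_raw (line text)")
    cur.execute("TRUNCATE staging_raw")
    with open("dump.csv") as f:
        cur.copy_expert("COPY staging_raw FROM STDIN WITH (FORMAT csv)", f)

    # The cleaned-up result goes into the normal, WAL-logged table.
    cur.execute("""
        INSERT INTO events_final (line)
        SELECT trim(line) FROM staging_raw WHERE line <> ''
    """)
    conn.commit()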


>Unlike Mongo, Postgres actually needs to write bytes to a disk

I'm immune to MongoDB bashing by now, but jesus, keep up with updates. This hasn't been true for maybe two years.


> I'm immune to MongoDB bashing by now, but jesus, keep up with updates.

We do want Jesus to keep up with the updates, for sure, but this isn't just about a random slip-up; it goes a bit deeper than that. This wasn't an unfortunate bug ("oh, we meant to do all we could to keep your data safe"); it was a deliberate design decision.

Dishonesty and shadiness were the problem. When they finally started shipping with better defaults, did we see an apology? Was anyone responsible terminated? Nope. Not that I remember.

Now, it could all be a statistical fluke and people just randomly decided to bash MongoDB more than, say, Basho, Cassandra, PostgreSQL, HBase and other database products? Odd, isn't it. Or isn't it more rational that it isn't quite random, and there is a reason for it?

When developers sell to developers, a certain level of honesty is expected. If I claim I am building a credit card processing API but deliberately add a feature that randomly double-charges to gain extra performance, you should be very upset about it. Yes, it is in the documentation, on page 156, in fine print: "oh yeah, enable safe transactions, you should have read that". And as a result you should probably never buy or use my product, and stop trusting me with credit card processing from then on.

The bottom line is, they are probably good guys to have beers with but I wouldn't trust them with storing data.


Your analogy rests on the absurd assumption that mission critical components don't need to be fully vetted and understood.

Shady? Yes. Dishonest? No. To be sure, both of these attributes suck, but beyond that there is an absolute world of difference between the two. One passively leaves knowledge gaps that even novice consumer programmers would uncover (the source is wide open after all, and speaking from personal experience, any claims of hyper performance should arouse enough suspicion to inspire some basic research), while the other asserts no knowledge gaps and actively inhibits knowledge discovery.

That said, they fell very far from the ideal, and for that they should be faulted. They're now paying the price by living in a dense fog of distaste and disgust. I don't think anyone believes it's a "statistical fluke" but it's likely that a mob/tribal mentality plays some role (this is an industry notoriously susceptible to cargo cults) and that many individuals on the burn-MongoDB-at-the-stake bandwagon do not actually have deep--or moderate!--experience with it and competing products. Also, would you chalk it up to statistical anomaly that they were able to close funding and that many large scale systems use their product?

Isn't it possible that the company had shameful marketing practices that repel the programmer psyche, but that the product itself has a unique enough feature set (schemaless yet semi-structured querying, indexing, dead simple partitioning and replication, non-proprietary management language, simple backups, all at the cost of RAM/SSD) to make it compelling for some problem classes?


Well now you are assuming the documentation is telling the truth. The reality is, things still don't go to disk as often as you think.

http://aphyr.com/posts/284-call-me-maybe-mongodb


Unacknowledged writes are disturbing but this can happen in any system: data hits the disk, and just before the server can send an ack to the client, it crashes.

That same blog notes Postgres is susceptible to the same issue: http://aphyr.com/posts/282-call-me-maybe-postgres


When map/reduce comes into play, speed isn't usually the main issue. It's a way to do work in parallel - which may be done to increase speed, but more often than not it's done because there's too much work to be done any other way.

Put another way, I see map/reduce as a background task - building indexes, reporting, etc. I don't think speed is as important there as it is for a foreground task like GET, PUT and DELETE.

For these operations, especially by key, Postgres has always been right up there. What's happening now is that Postgres is getting really good at NoSQL features (documents, key-value pairs, deep querying, etc.), good enough that if you have Postgres you can do pretty much whatever you could have done with Mongo, while still making use of everything else it has to offer.
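
For example, what would be a small map/reduce job in Mongo is often a single jsonb aggregate query in Postgres. A rough sketch, assuming psycopg2, a recent Postgres, and a hypothetical docs(id, body jsonb) table that already has data in it:

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # One-time setup: a GIN index so jsonb containment lookups stay fast.
    cur.execute("CREATE INDEX IF NOT EXISTS docs_body_gin ON docs USING gin (body)")

    # Count and sum per tag over matching documents, i.e. the kind of thing
    # you might otherwise write a map/reduce job for.
    cur.execute("""
        SELECT body->>'tag' AS tag, count(*), sum((body->>'qty')::int)
        FROM docs
        WHERE body @> '{"status": "open"}'
        GROUP BY body->>'tag'
    """)
    for tag, n, qty in cur.fetchall():
        print(tag, n, qty)
    conn.commit()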


I don't think Mongo map-reduce is parallel for a given instance.


Mongo clusters distribute map reduce so it can be reasonably fast.



