
Rethink, Mongo et al. are scalable (at the expense of some ACID guarantees). PG is certainly a good DB (I use it in production) but it is at heart a traditional RDBMS, not designed for scalability. Apples vs. pears.


What do you mean PG isn't designed for scalability? Its SMP scaling is incredibly good, close to linear.


Postgres neither scales vertically nor horizontally.

v9.6 finally brought some parallel scans, but they are still limited in scope and far behind the commercial databases. Same with replication and failover, although there are decent third-party extensions to get those working.
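
For anyone wondering what that looks like in practice, a minimal sketch on 9.6 (the "measurements" table and the worker count are made up, and the table has to be big enough for the planner to bother):

    -- 9.6 GUC; 0 disables parallel query for this session
    SET max_parallel_workers_per_gather = 4;

    EXPLAIN ANALYZE
    SELECT count(*) FROM measurements WHERE reading > 100.0;

    -- the plan should contain something like:
    --   Finalize Aggregate
    --     ->  Gather  (Workers Planned: 4)
    --           ->  Partial Aggregate
    --                 ->  Parallel Seq Scan on measurements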


The vast majority of us probably won't ever need that scalability anyway. Most places I've seen that needed something more scalable needed to just stop being so careless with the tools they already had.


As for clustering in general... Scalability may not be important, but replication and failover should be. I believe every place that has existed long enough must've had its "ouch, our master DB host went down" moment.

I wasn't able to set it up properly some years ago and settled for a warm standby replica (streaming + WAL shipping) and manual failover. Automating it (and even automating the recovery once the old master is back online and healthy) was certainly possible - and I got the overall idea of how it should operate. But the effort required to set it up was just too big for me, so I decided it wasn't worth the hassle and settled for the "uh, if it dies, we'll get alerted and switch to the backup server manually" scenario.
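
For reference, the monitoring half of that manual setup can be driven from plain SQL; a rough sketch, assuming a streaming standby is already attached (the column names are the 9.x ones):

    -- on the standby: confirm it is alive and still replaying WAL
    SELECT pg_is_in_recovery();          -- true while in standby mode

    -- on the master: list attached standbys and how far behind they are
    SELECT client_addr, state, sent_location, replay_location
    FROM pg_stat_replication;

    -- the manual failover itself is "pg_ctl promote" (or touching the
    -- trigger_file named in recovery.conf) on the standby, once the old
    -- master is confirmed dead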

Having something along the lines of "you start a fresh server, tell it some other server address to peer with, and it does the magic (with exact guarantees and drawbacks noted in the documentation)" would be really awesome. RethinkDB is just like that. PostgreSQL - at least 8.4 - wasn't, unless I've really missed something. I haven't yet checked newer versions' features in any detail, so I'm not sure about 9.5/9.6.


> you start a fresh server, tell it some other server address to peer with, and it does the magic

This is the one thing I ask of every piece of database software, yet we still don't really have it. 90% of problems could be solved if there were a focus on basics like easy startup, configuration and clustering.


SQL Server is almost that easy, but still doesn't handle schema changes well.


How so? Aerospike, ScyllaDB, Rethink, MemSQL, Redis are the only databases that get close to this.

SQL Server availability groups require Windows Server Failover Clustering, which is not quick or easy.


You can try http://repmgr.org/ for automation.

It offers easy commands to perform failover and even has an option to configure an automatic one.

After reading about GitHub's issues[1], I am a bit cautious about automatic failover, though.

[1] https://github.com/blog/1261-github-availability-this-week


Thanks for the link, I don't think I've seen this one before.

As for automation... Things can always go wrong, sure. But I wonder how many times HA and automatic failover have saved the day at GitHub, with no outside observers having the faintest idea that something was failing in there.


Vertical scalability is important for everyone as it makes better use of hardware (lower costs or more performance).

Horizontal scalability is important for HA, which every production environment wants or needs.


I suppose the parent comment was more about scaling up for transactional workloads - the story there is a lot better (although there are still issues we haven't tackled, mostly on very big machines) than for analytics workloads. But yes, while we made progress in 9.6, there are still a lot of important things missing to scale a larger fraction of analytics queries.


I don't have experience with Rethink, but Mongo scalable? That's a joke, right?

Here's how it compares: http://www.datastax.com/wp-content/themes/datastax-2014-08/f...

The only values where Mongo scores higher than the rest are on page 13, except those are latencies, where lower is better.

This paper shows clearly that Mongo doesn't scale.

And even as a single instance, Mongo is slower than Postgres with JSON data: https://www.enterprisedb.com/postgres-plus-edb-blog/marc-lin...

There are actually add-ons to Postgres[1] that add MongoDB protocol compatibility, and even with that overhead Postgres is still faster.
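
For the single-instance case, the usual Postgres-side setup is a jsonb column with a GIN index; a minimal sketch (the table and document shape are invented for illustration, not taken from the benchmark):

    CREATE TABLE events (
        id   bigserial PRIMARY KEY,
        doc  jsonb NOT NULL              -- the "Mongo document"
    );

    -- GIN index makes containment (@>) lookups fast
    CREATE INDEX events_doc_idx ON events USING gin (doc);

    INSERT INTO events (doc)
    VALUES ('{"user": "alice", "type": "login", "ip": "203.0.113.7"}');

    -- find all documents matching a partial document, like a Mongo query-by-example
    SELECT doc
    FROM events
    WHERE doc @> '{"type": "login"}';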

And even such benchmarks don't tell the full story. I, for example, worked at one company that used Mongo for regional mapping (mapping latitude/longitude to a ZIP code and mapping IP addresses to ZIPs). The database on Mongo was using around 30GB of disk space (and RAM, because Mongo performed badly if it couldn't fit all the data in RAM), mainly because Mongo was storing data without a schema and also had a limited set of types and indices. For example, to do the IP-to-ZIP mapping they generated every possible IPv4 address as an integer - think how feasible that would be with IPv6.

With Postgres + ip4r (an extension that adds a type for IP ranges) + PostGIS (an extension that adds geo capabilities - the lookup by latitude/longitude that Mongo has, but way more powerful), things looked dramatically different.

After storing the data using the correct types and applying proper indices, all of it took only ~600MB, which could fit in RAM on the smallest AWS instance (Mongo required three beefy machines with large amounts of RAM). Basically, Postgres showed how trivial the problem really was when you store the data properly.
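
To give a flavour of what that looked like (a sketch from memory, not the original schema - the table names, the exact ip4r operator and the 1 km radius are my guesses):

    CREATE EXTENSION IF NOT EXISTS ip4r;
    CREATE EXTENSION IF NOT EXISTS postgis;

    -- IP-to-ZIP: one row per address range instead of one row per IPv4 address
    CREATE TABLE ip_zip (
        ip_range  ip4r NOT NULL,
        zip       text NOT NULL
    );
    CREATE INDEX ip_zip_range_idx ON ip_zip USING gist (ip_range);

    -- ">>=" is ip4r's "range contains address" operator
    SELECT zip FROM ip_zip WHERE ip_range >>= '203.0.113.7'::ip4;

    -- lat/long-to-ZIP: store a point (or polygon) per ZIP, search within a radius
    CREATE TABLE zip_geo (
        zip   text PRIMARY KEY,
        geom  geography(Point, 4326) NOT NULL
    );
    CREATE INDEX zip_geo_geom_idx ON zip_geo USING gist (geom);

    SELECT zip
    FROM zip_geo
    WHERE ST_DWithin(geom,
                     ST_SetSRID(ST_MakePoint(-122.42, 37.77), 4326)::geography,
                     1000)   -- metres
    LIMIT 1;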

[1] https://github.com/torodb/torodb



