The Future of PostgreSQL (rhaas.blogspot.com)
70 points by sprachspiel on May 5, 2010 | hide | past | favorite | 4 comments


The streaming WAL replication and hot standby (standbys you can read from) in 9.0 are big. Back when I used PostgreSQL, those were the biggest missing features that I wanted.

If you're doing reporting, then you really care about analytic queries. Those arrived in 8.4. But the truth is that most developers don't use their databases in ways that will benefit from them. However, if you run across cases where you think, "I wish I could just suck this data out, sort it this way, do this simple processing/grouping, and upload the result back into the database," then you have probably run across a use case for analytic queries. With analytic queries available, the only cases where I've still had to do that were to join against data that is not in the database, to process datasets that were too big for the database server to physically handle, and once because performance really, really required it.


You'd be better off using an OLAP engine for that, like SQL Server Analysis Services.


An OLAP engine for which need? The reporting, or the case where I needed performance?

As for the reporting needs, introducing OLAP would have been a lot of work for something that could be done perfectly well in the existing Oracle database using a supported Oracle feature. There is no need to introduce an expensive new technology stack for an already solved problem.

On efficiency, heh. Hundreds of thousands of items had been given arbitrary tags (on average over a dozen per item, and the same tag could be given repeatedly), and the question was to identify for each item the other items which were "most closely related" based on shared tags.

The fundamental problem with doing this in any kind of database is that the fastest access path for a given fact is a lookup in an index. That means a binary search through a data structure to find the page and row number the data is on, followed by parsing that page in memory to locate the row and extract the wanted field. And that is the fastest form of lookup the database has available.

The equivalent operation in C++ is accessing an array by index and pulling an element out of a struct. As a bonus, it is easy to organize your data so that most of your accesses stay in the CPU cache.
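A minimal sketch of the contrast (my own illustration, not code from the original post): the database-style path binary-searches a sorted "index" to find the row, while the in-memory path exploits dense ids so the id is the array position.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Item {
    int id;
    int score;
};

// Database-style access path: binary-search an index of sorted ids
// to locate the row, then read the field out of it.
int score_via_index(const std::vector<Item>& rows, int id) {
    auto it = std::lower_bound(rows.begin(), rows.end(), id,
        [](const Item& row, int key) { return row.id < key; });
    return it->score;
}

// In-memory access path: with dense ids, the id *is* the array
// position -- a single load instead of O(log n) comparisons, and
// sequential scans stay in the CPU cache.
int score_via_array(const std::vector<Item>& rows, int id) {
    return rows[id].score;
}
```

Both return the same answer; the second just skips the search entirely, which is where the orders-of-magnitude speedup comes from when you do it hundreds of millions of times.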

Processing time for the whole data set dropped from a week to 5 minutes.
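For concreteness, here is a hypothetical sketch of the shared-tags computation described above: build an inverted index from tag to items, then count how often each pair of items co-occurs under a tag. All names are illustrative; the original implementation is not shown in the post.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Tag = std::string;

// For each item, count how many tags it shares with every other item.
// counts[a][b] = number of tags items a and b have in common.
std::map<int, std::map<int, int>> shared_tag_counts(
        const std::map<int, std::set<Tag>>& item_tags) {
    // Invert: tag -> items carrying that tag.
    std::map<Tag, std::vector<int>> by_tag;
    for (const auto& [item, tags] : item_tags)
        for (const auto& tag : tags)
            by_tag[tag].push_back(item);

    // Every pair of items listed under the same tag shares it,
    // so bump that pair's overlap count once per shared tag.
    std::map<int, std::map<int, int>> counts;
    for (const auto& [tag, items] : by_tag)
        for (int a : items)
            for (int b : items)
                if (a != b) ++counts[a][b];
    return counts;
}
```

The "most closely related" items for a given item are then just the entries of its row with the highest counts, which is a cheap sort over in-memory data rather than a giant self-join in SQL.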


I wish they had a click, click, click and boom -- easy data replication to another server. Same with hot backups: let me set up a backup job that runs on the hour and lets me fall back to any hour in at least the last 3 days if I have an issue.



