Hacker News new | past | comments | ask | show | jobs | submit login

You can do online clustering with pg_repack (https://github.com/reorg/pg_repack).

Given a table T, it will:

1. Create a new empty table T'.

2. Add a trigger on T that will clone any operations to T'.

3. Copy all rows from T to T' in clustering order.

4. Atomically switch the tables so that T' becomes T.

Unlike CLUSTER, the only lock required is in step 4, when it needs an exclusive lock on the table; it cannot do the switch until all current transactions are done with T (you can specify a timeout). The only other drawback is that that it does not support TRUNCATE, CREATE INDEX, or some ALTER TABLE operations being run while it's busy in step 3.

Also, unlike PostgreSQL's own CLUSTER command, it does not need an index to order by.

We run pg_repack regularly at night to keep our databases clustered. It works really well, though I really wish Postgres could get native online clustering. SQL Server has had automatic clustering for many years, as has DB2, and Oracle has index-ordered tables.




> though I really wish Postgres could get native online clustering.

Tell the community - pg_reorg could be cleaned up, improved, and merged into core... and should be!


I agree, although by native clustering I meant that the Postgres engine itself should support it, much like incremental vacuum was eventually built in.

pg_repack is a fork of pg_reorg (which has not been maintained since 2011), by the way, which has become the de facto replacement. No idea if the code is in need of cleanup or not, though.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: