Tangential: does anybody know how dependable pg_repack[1] is? I just discovered it and the promise is great - being able to do a VACUUM FULL, and even move tables around, without locking them and thus without downtime - but I'm not sure what the risks are in using it in production.
We use it on medium-sized tables (~100M rows and ~100GB each). No problems. The perf boost we get from clustering rows in the right spot on disk is huge.
Out of curiosity, as you apparently have some real data and real use cases - have you benchmarked SSD vs spinning disk for your use case? In particular, do you still see a (comparable or not) speed-up from better clustering of rows on SSD? (I would guess not, unless you have a lot of fragmentation below the SSD write size (typically 4K?) - and AFAIK Postgres should already be trying to fix that for you by how it lays out data on disk. But SSDs do have somewhat better sequential reads than random reads, even if the gap isn't as dramatic as with spinning disks.)
There's still a speedup, but it's dwarfed by the magnetic->SSD boost.
We do have fragmentation within a block. Our records on the heap are about 150 bytes, so maybe 30 rows per 4K page. In the pathological case where there is 1 row per page due to fragmentation, queries have to fetch 30x the number of blocks, which is a killer on a spinny disk.
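For anyone who wants to gauge this on their own tables, a rough sketch (table and column names here are hypothetical): pg_stats.correlation is close to 1.0 when the physical row order matches the column's sort order, and near 0 when rows are scattered.

    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE tablename = 'mytable'      -- hypothetical table
      AND attname   = 'created_at';  -- hypothetical clustering column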
I have to deal with a couple of databases on spinning platters. It's not optimal, but there's usually some cost constraint, or there are only 100 users on it a week. Or both, in which case it's no big deal and it's very, very cheap.
We've used it occasionally on a ~400GB table and it's worked perfectly for us too. We run it on EC2 i2.8xlarge instances and it finishes in under 2 hours. But it doesn't use very much CPU or I/O, so we could probably run it just as well on a smaller instance like an i2.2xlarge.
One downside is that you need enough free disk space to hold the packed copy of the table, so you need to run it before your disk gets too full. The last couple of times we waited too long, so we had to move the data to a larger host before we could repack it.
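If it helps anyone plan ahead, a quick way to see roughly how much space the copy will need (table name hypothetical; pg_total_relation_size includes indexes and TOAST):

    SELECT pg_size_pretty(pg_total_relation_size('mytable'));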
I love Datomic because it completely removes the need to understand these locks. With Datomic you can reason far more easily about a set of concurrent updates than when dealing with SQL locking levels.
The trade-off is that you're essentially always running your database with fully serialized writes.
Concurrent updates get interesting when they are based on reads that happened earlier. Then applying updates atomically doesn't buy you anything - you still need to either lock the object so that no one modifies its state between the read and the update (pessimistic locking), or be prepared to abort an update if a concurrent change to the object is detected (optimistic locking), or append the update instead of overwriting previous state and resolve it later, which is not always possible and may be just as hard as getting the locking right. Additionally, being too optimistic about shared-resource management can lead to a really bad user experience - like being turned away at the gate because the flight is overbooked, or being fined for overdrawing your debit card via offline transactions. You just need to pick your poison ;)
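To make the two locking styles concrete, a minimal sketch against a hypothetical accounts table (the version column is my own illustration, not something from upthread):

    -- Pessimistic: hold a row lock across the read-then-write gap.
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
    -- ...decide the new balance in application code...
    UPDATE accounts SET balance = 90 WHERE id = 1;
    COMMIT;

    -- Optimistic: read without locking, then make the write conditional
    -- on the row being unchanged since the read (via a version column).
    -- Zero rows updated means a concurrent change won; retry from the read.
    UPDATE accounts
    SET balance = 90, version = version + 1
    WHERE id = 1 AND version = 7;  -- 7 = the version seen at read time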
Postgres transactions can also take as long as they want, and at the SERIALIZABLE isolation level the results will never be inconsistent.
The downside to SERIALIZABLE is that conflicts become visible failures to the client, so you have to handle them and retry. Datomic solves this by serialising everything through the function you pass, but at a great latency cost.
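For the record, the failure surfaces as SQLSTATE 40001 (serialization_failure), and the client is expected to rerun the whole transaction - roughly like this (table name hypothetical):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT sum(amount) FROM entries WHERE account_id = 1;
    INSERT INTO entries (account_id, amount) VALUES (1, -10);
    COMMIT;  -- may abort with 40001; catch it and retry the whole block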
These locks can be non-trivial to reason about at times. On 9.3 I've been bitten numerous times by concurrent transactions attempting to escalate share locks to exclusive locks because of an FK and getting stuck forever, without the db being able to break that deadlock. If you intentionally limit the number of connections to the db (which you should), that can lead to downtime. I needed to learn about SELECT FOR UPDATE pretty quickly after that. It seems that's somewhat ameliorated in 9.4+, although it's still possible in a few corner cases, if I recall correctly from talking to RhodiumToad.
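Roughly what the SELECT FOR UPDATE workaround looks like - a sketch with hypothetical customers/orders tables: take the exclusive row lock on the parent up front, so no transaction ever has to upgrade a weaker FK-induced lock mid-flight.

    BEGIN;
    -- Lock the parent row exclusively before touching children, instead
    -- of letting the FK check take a weaker share lock that would need
    -- upgrading later (the upgrade is what deadlocks).
    SELECT 1 FROM customers WHERE id = 42 FOR UPDATE;
    INSERT INTO orders (customer_id, total) VALUES (42, 100);
    UPDATE customers SET order_count = order_count + 1 WHERE id = 42;
    COMMIT;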
I believe that if you mark the foreign key constraint as deferrable and deferred, the lock won't be taken until commit time (and then only briefly), which should help a lot -- assuming your problem is with long-running transactions that hold locks while they are open.
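Something like this, if I'm not mistaken (constraint and table names hypothetical):

    ALTER TABLE orders
      ADD CONSTRAINT orders_customer_fk
      FOREIGN KEY (customer_id) REFERENCES customers (id)
      DEFERRABLE INITIALLY DEFERRED;

    -- Or defer an already-deferrable constraint for a single transaction:
    BEGIN;
    SET CONSTRAINTS orders_customer_fk DEFERRED;
    -- ...long-running work...
    COMMIT;  -- the FK check runs here, at commit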
I notice they mention advisory locks, which aren't used much. MySQL has them too. They're useful for things like preventing two copies of the same application from running. This works across machine boundaries, and if you lose the database connection, the client machine, or the host machine, the lock is released, so you don't have all that nonsense about Linux/UNIX lock files hanging around.
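e.g., keyed on an arbitrary bigint of your choosing (42 here is just an example):

    -- Returns true if we got the lock, false if another session holds it.
    SELECT pg_try_advisory_lock(42);

    -- ...do the work only one instance should be doing...

    SELECT pg_advisory_unlock(42);
    -- Session-level advisory locks are also released automatically when
    -- the connection drops, so no stale lock files.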
I use those for tasks that should only have one copy running on a cluster, but which any machine in the cluster can run.
note: there are also SKIP LOCKED and NOWAIT clauses in Postgres 9.5 for more flexibility when dealing with row-level locking. They give you the option of failing immediately (NOWAIT) or getting back only the set of rows that aren't locked (SKIP LOCKED).
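For example, with a hypothetical jobs table used as a work queue:

    -- NOWAIT: error out immediately instead of blocking on a locked row.
    SELECT * FROM jobs WHERE id = 1 FOR UPDATE NOWAIT;

    -- SKIP LOCKED: hand each worker the next unclaimed row.
    SELECT * FROM jobs
    WHERE state = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;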
[1] http://reorg.github.io/pg_repack/