Distributed locking with Postgres advisory locks (silvrback.com)
64 points by striking on Jan 24, 2022 | 32 comments


Seems like this problem would be solved by a serializable transaction isolation level without the need for any locking. If during the transaction the select would return a different result the transaction will fail and can be retried. A select sum() is part of the example for serializable transaction isolation. https://www.postgresql.org/docs/current/transaction-iso.html...
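As a sketch (assuming a ledger table like the one discussed in the article, with made-up column names; this is not the author's code):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- any concurrent transaction that would change this sum conflicts at commit
    SELECT sum(credit_delta) FROM ledger WHERE credit_holder_id = 'foo';
    INSERT INTO ledger (tx_type, credit_holder_id, credit_delta)
    VALUES ('use', 'foo', -10);
    COMMIT; -- can fail with SQLSTATE 40001 (serialization_failure); retry then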


Exactly. IMHO Postgres (and other SQL DBs) should consider making SERIALIZABLE the default (as is specified in ANSI SQL). Otherwise people try to be clever.


A program using SERIALIZABLE transaction isolation is required to handle the inevitable errors on commit and retry the transaction or roll back the operation. In practice, developers rarely do, and you end up with webapps spitting out 50x errors under load and people blaming the database because they didn't read the fine print. That is why some projects, like psycopg, that did default to SERIALIZABLE transaction isolation ended up switching to READ COMMITTED as a more sensible default. It is rare that people actually need SERIALIZABLE, and they rarely want to pay the performance or development costs.


You say it is rare that people need SERIALIZABLE. But I would say the majority of code that handles money needs to use explicit locks or SERIALIZABLE; otherwise it can easily be exploited by a hacker to get things for free.


Not only is that slow, but it raises errors programs aren’t currently handling. It would be incompatible to change.


As opposed to the current default, that also raises errors programmers aren’t currently handling :-)


The current default permits anomalies, but doesn't raise serialization errors.

It's possible to write software that relies on READ COMMITTED and behaves correctly.


ANSI default is READ COMMITTED I believe.


Just a reminder to watch out for connection pooling problems when using advisory locks. If you've checked out a connection from your pool that you use from the time you take the lock to the time you release it, you'll be fine. If you operate on the pool itself, or check out a connection from the pool each time you send a query, there's no guarantee you're running subsequent queries on the same connection that you took the lock out on, or that you're releasing the lock you took (if you wind up releasing on a different connection).

I wrote up more details on https://sudhir.io/understanding-connections-pools
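A minimal illustration of the failure mode (key 42 is arbitrary):

    -- these statements must all run on the SAME pooled connection:
    SELECT pg_advisory_lock(42);
    -- ... protected work ...
    SELECT pg_advisory_unlock(42);
    -- if the unlock runs on a different connection, that session doesn't
    -- hold the lock: pg_advisory_unlock returns false and emits a warning,
    -- and the original session keeps the lock until it disconnects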


Those are not distributed. Implementing distributed X by adding a central X is a bit dishonest.


“Locking distributed systems with” would seem a more accurate title than “Distributed locking with”, but I don’t know that I’d go so far as to call it dishonest, even a bit.


I could not have said this better. This is not distributed in the modern usage. That said, I have used this in production and it works well up to a very respectable scale if you don't mind the central nature and already have a connection to postgres.

I would use Hashicorp Consul for this these days.


> If pg_advisory_lock fails to acquire a lock, an error will be returned to the client.

Well, I think it just blocks until the lock is released. I tested locally and checked the docs[0] to confirm. In any event, this is a cool mechanism to be aware of. If you ever need pg_advisory_xact_lock and you find yourself wanting to unlock it mid-transaction and are scratching your head, just introduce a nested transaction :)

[0] https://www.postgresql.org/docs/12/functions-admin.html

> pg_advisory_lock locks an application-defined resource, which can be identified either by a single 64-bit key value or two 32-bit key values (note that these two key spaces do not overlap). If another session already holds a lock on the same resource identifier, this function will wait until the resource becomes available. The lock is exclusive. Multiple lock requests stack, so that if the same resource is locked three times it must then be unlocked three times to be released for other sessions' use.
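The stacking behaviour from that last sentence, shown in a single session (key 42 is arbitrary):

    SELECT pg_advisory_lock(42);
    SELECT pg_advisory_lock(42);   -- same session, stacks instead of blocking
    SELECT pg_advisory_unlock(42); -- still held by this session
    SELECT pg_advisory_unlock(42); -- now actually released for other sessions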


> it just blocks until the lock is released

Normal Postgres locks have a https://postgresqlco.nf/doc/en/param/lock_timeout/ which is the maximum time a client will wait to acquire a lock before returning an error instead.

SELECT FOR UPDATE queries also accept the NOWAIT keyword to return an error immediately if locked.

I would guess the former applies to advisory locks too. I think NOWAIT only applies to FOR UPDATE / FOR SHARE queries

But it looks like there is a pg_try_advisory_lock function https://www.postgresql.org/docs/current/functions-admin.html... that does the equivalent.
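Both behaviours, sketched (key 42 is arbitrary; lock_timeout applying to advisory lock waits is my reading of the docs, so verify):

    SET lock_timeout = '2s';         -- error out after waiting 2 seconds
    SELECT pg_advisory_lock(42);     -- blocks, subject to lock_timeout

    SELECT pg_try_advisory_lock(42); -- never waits: returns true or false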


Right. Well, I guess I just think the article is worded confusingly then. Because when I read it I thought "wait, it doesn't error, it waits" and I think that's an important feature.


Yes, you can choose: either wait until your configured timeout (and if you set it to 0 it will wait indefinitely), or error immediately.


There is an equivalent set of non-blocking ("try") functions; their names are prefixed with "pg_try_advisory".

https://www.postgresql.org/docs/14/functions-admin.html#FUNC...


I thought this was a really helpful post, as I wasn't previously familiar with the 'pg_advisory_lock', etc. functions.

That said, I'm curious if the example given is just a simplified example, because to me this absolutely seems like a case where I wouldn't use a custom locking solution, but a case where I would use basic DB transaction guarantees to ensure that I didn't insert a row without the balance being sufficient, e.g.:

  INSERT INTO ledger (tx_type, credit_holder_id, credit_delta)
  SELECT 'use', 'foo', -10
  FROM (SELECT sum(credit_delta) as balance
        FROM ledger WHERE credit_holder_id = 'foo') foo_balance
  WHERE foo_balance.balance > 10;
I'd need to double check the transaction isolation level that you'd need in PG, but generally PG is really good about efficiently supporting repeatable read isolation level.


Advisory locks can be taken at the session (i.e. outside/across transactions) level, and need not be tied to a database object (if you need to operate on multiple objects, which one are you going to lock?).

Simpler cases are supported by e.g. SELECT FOR UPDATE


I'm not sure why the author excludes row-level locking.

    SELECT ... FOR UPDATE
Should work here. Lock the table when the balance is confirmed, then append to the ledger, then commit the transaction. All following transactions then need to wait for the FOR UPDATE lock to be released before they can read from the ledger.


I think you’re confusing row and table in your comment. Rows are never changed, so I’m not sure how they would use a row level lock.

They have 1000s of concurrent transactions on that table for different credit holders. Locking the whole table every time means everyone is stalled, even though you only want transactions for a single credit holder to be serialised, not every single mutation of that table.


I thought the same thing as the parent post, assuming you have a column "account_id", can't you just do:

  select *
    from transactions
   where account_id = $user_account_id
     for update;
Wouldn't all rows for the particular user be locked? Then you can check the balance, append, commit and release the lock.


I think we can lock the row in the accounts table instead, e.g.

    BEGIN;
    SELECT * FROM accounts WHERE id = $account_id FOR UPDATE;
    -- select/insert/update/delete transactions
    COMMIT;
This essentially serializes all reads/writes to data that belongs to the account. Not sure if this performs better or worse than using an advisory lock.


Yes, though I'm not convinced that the whole process should be synchronous, unless it is a "hard" use case, such as an ATM withdrawal.

For instance, you may want to do some sanity checks, to see if the account is in a valid status, i.e. "ACTIVE" and not "BLOCKED". You'll also probably want to create a corresponding credit entry to match the newly created debit. I think you are better off optimistically creating a new transaction with a "STARTING" status and then settling it asynchronously (but quickly), rolling it back to an invalid state if something goes wrong, or forward to "COMPLETED" if everything clears.


Technically that wouldn't prevent new INSERTs from happening between the SELECT and the COMMIT (since the locks only get applied to existing rows, they don't prevent new rows that match the condition), although the snippet should at least prevent racing with itself.


Interesting. I'm not 100% sure how this works, but I suppose that could do it.


SQL Server has similar capabilities with sp_getapplock/sp_releaseapplock, though IIRC it doesn't work well in a distributed environment. I believe when I tested it on an AWS RDS multi-AZ database, the locks got released by a failover from the primary to a read replica.


Advisory locks sound interesting, but this really sounds like a problem the author is forcing on themselves. I get the benefit of having your database persist the immutable ledger, but it seems like you would have a much more scalable solution if you maintained both the materialized balance and the ledger in the database. Then you could condition the transaction on the materialized view state and update the view/append at the same time.

If you didn't want to do this, I would also assume you could do this within a transaction without the lock, using a serializable transaction. Though I have no idea how indexes interplay with Postgres and serializability.


I guess for this particular use case the simple string hash works. In the worst case you lock a few more, likely unrelated, users.

However, in some cases you want to enforce a specific locking order on some objects that interact, and in particular if locks aren't reentrant, I'd be cautious and at least use a cryptographic hash here to reduce the likelihood of collisions.
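For reference, advisory lock keys are integers, so a string key has to be hashed somehow; Postgres's built-in hashtextextended (PostgreSQL 11+) gives a full 64-bit key, which leaves less room for collisions than a 32-bit hash. The 'credit_holder:foo' key format here is made up:

    SELECT pg_advisory_lock(hashtextextended('credit_holder:foo', 0));
    -- ... work serialized per credit holder ...
    SELECT pg_advisory_unlock(hashtextextended('credit_holder:foo', 0));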


I've done that with MySQL. You can use an in-memory table to hold the locks. On a restart, all locks are cleared.


I don't believe "distributed locking" is a thing.

Since the lock is centralised, the use case could be either redesigned to also be centralised, or could be switched to use a different distributed algorithm.


Or create a separate table that contains the current ledger number for each account, and then use a simple SELECT FOR UPDATE on that table so you know that's the only connection updating that account at a time.
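Something like this, perhaps (hypothetical table and column names):

    CREATE TABLE account_ledger_state (
      account_id    text PRIMARY KEY,
      ledger_number bigint NOT NULL DEFAULT 0
    );

    BEGIN;
    -- only one transaction per account gets past this line at a time
    SELECT ledger_number FROM account_ledger_state
      WHERE account_id = 'foo' FOR UPDATE;
    -- check balance, append to ledger, bump ledger_number ...
    COMMIT; -- row lock released here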



