Seems like this problem would be solved by the serializable transaction isolation level, without the need for any locking. If the select would return a different result during the transaction, the transaction will fail and can be retried. A SELECT sum() is even part of the example in the serializable isolation docs: https://www.postgresql.org/docs/current/transaction-iso.html...
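Roughly, a sketch of that approach, borrowing the ledger columns from the example further down the thread (the application still has to retry the whole transaction when it fails with SQLSTATE 40001 / serialization_failure):

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- under SERIALIZABLE, Postgres tracks this read as a dependency
SELECT sum(credit_delta) AS balance FROM ledger WHERE credit_holder_id = 'foo';
-- application checks balance >= 10, then appends the debit
INSERT INTO ledger (tx_type, credit_holder_id, credit_delta) VALUES ('use', 'foo', -10);
COMMIT; -- fails with serialization_failure if a concurrent spend slipped in; retry from BEGIN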
Exactly. IMHO Postgres (and other SQL DBs) should consider making SERIALIZABLE the default (as is specified in ANSI SQL). Otherwise people try to be clever.
A program using SERIALIZABLE transaction isolation is required to handle the inevitable errors on commit and retry the transaction or roll back the operation. In practice, developers rarely do and you end up with webapps spitting out 50x errors under load and people blaming the database because they didn't read the fine print. Which is why some projects like psycopg which did default to SERIALIZABLE transaction isolation ended up switching to READ COMMITTED as a more sensible default. It is rare that people actually need SERIALIZABLE and they rarely want to pay the performance or development costs.
You say it is rare that people need serializable.
But I would say the majority of code that handles money needs to use an explicit lock or SERIALIZABLE, otherwise it can easily be exploited by a hacker to get things for free.
Just a reminder to watch out for connection pooling problems when using advisory locks. If you've checked out a connection from your pool that you use from the time you take the lock to the time you release it, you'll be fine. If you operate on the pool itself, or check out a connection from the pool each time you send a query, there's no guarantee you're running subsequent queries on the same connection you took the lock out on, or that you're releasing the lock you took (you may wind up releasing on a different connection).
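To make that concrete, a minimal sketch (the key 42 is arbitrary); both statements must run on the same checked-out connection, because session-level advisory locks belong to the backend that acquired them:

SELECT pg_advisory_lock(42);    -- blocks until the lock is granted to this session
-- ... do the protected work on this same connection ...
SELECT pg_advisory_unlock(42);  -- true if this session held the lock, false (plus a warning) otherwise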
“Locking distributed systems with” would seem a more accurate title than “Distributed locking with”, but I don’t know that I’d go so far as to call it dishonest, even a bit.
I could not have said this better. This is not distributed in the modern usage.
That said, I have used this in production and it works well up to a very respectable scale if you don't mind the central nature and already have a connection to postgres.
> If pg_advisory_lock fails to acquire a lock, an error will be returned to the client.
Well, I think it just blocks until the lock is released. I tested locally and checked the docs[0] to confirm. In any event, this is a cool mechanism to be aware of. If you ever need pg_advisory_xact_lock and you find yourself wanting to unlock it mid-transaction and are scratching your head just introduce a nested transaction :)
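For reference, a minimal sketch of the transaction-scoped variant (again with an arbitrary key of 42); it blocks the same way, but there is no explicit unlock, the lock simply goes away at COMMIT or ROLLBACK:

BEGIN;
SELECT pg_advisory_xact_lock(42);  -- waits here if another session holds key 42
-- ... protected work ...
COMMIT;  -- lock released automatically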
> pg_advisory_lock locks an application-defined resource, which can be identified either by a single 64-bit key value or two 32-bit key values (note that these two key spaces do not overlap). If another session already holds a lock on the same resource identifier, this function will wait until the resource becomes available. The lock is exclusive. Multiple lock requests stack, so that if the same resource is locked three times it must then be unlocked three times to be released for other sessions' use.
Right. Well, I guess I just think the article is worded confusingly then. Because when I read it I thought "wait, it doesn't error, it waits" and I think that's an important feature.
I thought this was a really helpful post, as I wasn't previously familiar with the 'pg_advisory_lock', etc. functions.
That said, I'm curious if the example given is just a simplified example, because to me this absolutely seems like a case where I wouldn't use a custom locking solution, but a case where I would use basic DB transaction guarantees to ensure that I didn't insert a row without the balance being sufficient, e.g.:
INSERT INTO ledger (tx_type, credit_holder_id, credit_delta)
SELECT 'use', 'foo', -10
FROM (SELECT sum(credit_delta) as balance
FROM ledger WHERE credit_holder_id = 'foo') foo_balance
WHERE foo_balance.balance >= 10;
I'd need to double check the transaction isolation level that you'd need in PG, but generally PG is really good about efficiently supporting repeatable read isolation level.
Advisory locks can be taken at the session (i.e. outside/across transactions) level, and need not be tied to a database object (if you need to operate on multiple objects, which one are you going to lock?).
Simpler cases are supported by e.g. SELECT FOR UPDATE
I'm not sure why the author excludes row-level locking.
SELECT ... FOR UPDATE
Should work here. Lock the table when the balance is confirmed, then append to the ledger, then commit the transaction. All following transactions then need to wait for the FOR UPDATE lock to be released before they can read from the ledger.
I think you’re confusing row and table in your comment. Rows are never changed, so I’m not sure how they would use a row level lock.
They have 1000s of concurrent transactions on that table for different credit holders. Locking the whole table every time means everyone is stalled, even though you only want transactions for a single credit holder to be serialised, not every single mutation of that table.
I think we can lock the row in the accounts table instead, e.g.
BEGIN;
SELECT * FROM accounts WHERE id = $account_id FOR UPDATE;
-- select/insert/update/delete transactions
COMMIT;
This essentially serializes all reads/writes to data that belongs to the account. Not sure if this performs better or worse than using an advisory lock.
Yes, though I'm not convinced that the whole process should be synchronous, unless it is a "hard" use case, such as an ATM withdrawal.
For instance, you may want to do some sanity checks, to see if the account is in a valid status, i.e. "ACTIVE" and not "BLOCKED". You'll also probably want to create a corresponding credit entry to match the newly created debit. I think you are better off optimistically creating a new transaction with a "STARTING" status and then settling it asynchronously (but quickly), rolling it back to an invalid state if something goes wrong, or moving it to "COMPLETED" if everything clears.
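Loosely, something like this, where the status and id columns (and a "FAILED" state) are assumptions on top of the article's ledger, not taken from it:

INSERT INTO ledger (tx_type, credit_holder_id, credit_delta, status)
VALUES ('use', 'foo', -10, 'STARTING')
RETURNING id;
-- later, an asynchronous settlement step runs the sanity checks and settles it
UPDATE ledger SET status = 'COMPLETED' WHERE id = $ledger_id;  -- or 'FAILED' if the checks don't clear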
Technically that wouldn't prevent new INSERTs from happening between the SELECT and the COMMIT (since the locks only get applied to existing rows, they don't prevent new rows that match the condition), although the snippet should at least prevent racing with itself.
SQL Server has similar capabilities with sp_getapplock/sp_releaseapplock, though IIRC it doesn't work well in a distributed environment. I believe when I tested it on an AWS RDS multi-AZ database, the locks got released by a failover from the primary to a read replica.
Advisory locks sound interesting, but this really sounds like a problem that the author is forcing on themselves. I get the benefit of having your database persist the immutable ledger, but it seems like you would have a much more scalable solution if you maintained both the materialization and the ledger in the database. Then you could condition the transaction on the materialized view state and update the view/append at the same time.
If you didn't want to do this, I would also assume you could do it within a transaction without the lock, using a serializable transaction, though I have no idea how indexes interplay with Postgres and serializability.
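A rough sketch of what I mean, assuming a hypothetical balances table kept in the same database as the ledger: condition the debit on the materialized balance and append to the ledger in one transaction. The UPDATE's row lock also serialises concurrent spends for the same holder, even at READ COMMITTED.

BEGIN;
-- decrement only if the materialized balance covers the debit
UPDATE balances SET balance = balance - 10
  WHERE credit_holder_id = 'foo' AND balance >= 10;
-- application checks that the UPDATE touched a row, then appends to the ledger
INSERT INTO ledger (tx_type, credit_holder_id, credit_delta) VALUES ('use', 'foo', -10);
COMMIT;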
I guess for this particular use case the simple string hash works. In the worst case you lock a few extra, likely unrelated, users.
However, in some cases you want to enact a specific locking order on some objects that interact, and in particular if locks aren't reentrant, I'd be cautious and at least use a cryptographic hash here to reduce the likelihood of collisions.
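For example, something along these lines in Postgres (hashtextextended needs 11+; the md5 variant derives the 64-bit key from a cryptographic-style hash rather than the simple string hash):

-- built-in 64-bit text hash
SELECT pg_advisory_lock(hashtextextended('credit_holder:foo', 0));
-- or derive the 64-bit key from md5
SELECT pg_advisory_lock(('x' || substr(md5('credit_holder:foo'), 1, 16))::bit(64)::bigint);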
Since the lock is centralised, the use case could be either redesigned to also be centralised, or could be switched to use a different distributed algorithm.
Or create a separate table which contains the current ledger number for each account, and then use a simple SELECT FOR UPDATE on that table so you know that's the only connection updating that account at a time.
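A loose sketch of that, with a hypothetical account_ledger_state table holding one row per credit holder:

BEGIN;
-- take the per-account row lock; concurrent writers for 'foo' queue up here
SELECT last_entry FROM account_ledger_state WHERE credit_holder_id = 'foo' FOR UPDATE;
-- check the balance, append to the ledger, and bump the counter
INSERT INTO ledger (tx_type, credit_holder_id, credit_delta) VALUES ('use', 'foo', -10);
UPDATE account_ledger_state SET last_entry = last_entry + 1 WHERE credit_holder_id = 'foo';
COMMIT;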