
IIRC the DigitalOcean API uses offsets. We had to write special code to handle the possibility of the list changing while it was being queried.



When I worked for a news site, we moved all listing endpoints from offset-based to timestamp-based pagination, because offset-based pagination assumes the data being queried will remain stable between requests. This was, of course, very rarely true during business hours.

Duplicates or missed entries in the result set are the most likely outcome of offset-based pagination.
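
To make that concrete, here is a minimal sketch of both failure modes. The list stands in for a newest-first table, and `offset_page` and the ids are made up for illustration, not taken from any real API.

```python
def offset_page(rows, offset, limit):
    """Return one page, the way `LIMIT limit OFFSET offset` would."""
    return rows[offset:offset + limit]

# Case 1: a new item is published between page 1 and page 2.
listing = [105, 104, 103, 102, 101]   # newest first
page1 = offset_page(listing, 0, 2)    # client reads the first page
listing.insert(0, 106)                # everything shifts down by one
page2 = offset_page(listing, 2, 2)
duplicated = set(page1) & set(page2)  # item shown on both pages

# Case 2: an item from page 1 is deleted between requests.
listing = [105, 104, 103, 102, 101]
page1 = offset_page(listing, 0, 2)
listing.remove(105)                   # everything shifts up by one
page2 = offset_page(listing, 2, 2)
page3 = offset_page(listing, 4, 2)
# Items that still exist but were never returned on any page.
missed = set(listing) - set(page1 + page2 + page3)
```

In the second case the skipped item existed the whole time, which is the nasty part.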


Do any of these avoid that scenario? Unless you are paginating on `updated_at ASC` there is an edge case of missing new data on previously requested pages.


Missing data that began existing after you started querying is usually OK. If you had requested the data 1 second earlier, it wouldn't have been returned anyway. With offset pagination, you can miss data that always existed. This can make you believe the item that previously existed had been deleted.
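
A rough sketch of the difference, assuming a keyset-style cursor ("give me rows older than the last id I saw") over a newest-first list. `keyset_page` and the ids are hypothetical.

```python
def keyset_page(rows, after_id, limit):
    """Return up to `limit` ids strictly older than `after_id`.

    `rows` is assumed to be sorted newest first; `after_id=None` means
    "start from the top".
    """
    older = [r for r in rows if after_id is None or r < after_id]
    return older[:limit]

listing = [105, 104, 103, 102, 101]
page1 = keyset_page(listing, None, 2)

# The list changes between requests: one deletion, one new item.
listing.remove(105)
listing.insert(0, 106)

page2 = keyset_page(listing, page1[-1], 2)
page3 = keyset_page(listing, page2[-1], 2)
seen = page1 + page2 + page3
```

The reader never sees 106 (it did not exist when the scan started), but nothing that existed throughout is skipped or repeated.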

Be very careful with timestamp or auto-increment id pagination too. Rows don't necessarily become visible in id or timestamp order, since the id or timestamp is generated before the transaction commits, unless your database has some specific way of ensuring otherwise.
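
A toy model of that race, assuming ids are handed out when the insert starts and rows only become readable at commit. `ToyDB` is invented for illustration and is not how any particular database is implemented.

```python
import itertools

class ToyDB:
    """Ids are assigned at insert time; rows become visible at commit."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.visible = []

    def begin_insert(self):
        # The id is fixed here, before the transaction commits.
        return next(self._ids)

    def commit(self, row_id):
        self.visible.append(row_id)

    def rows_after(self, cursor):
        # What a reader polling "WHERE id > cursor ORDER BY id" would get.
        return sorted(r for r in self.visible if r > cursor)

db = ToyDB()
a = db.begin_insert()   # transaction A gets the lower id
b = db.begin_insert()   # transaction B gets the higher id
db.commit(b)            # B commits first

cursor = 0
first_poll = db.rows_after(cursor)
cursor = max(first_poll)            # reader advances past B's id

db.commit(a)                        # A commits late
second_poll = db.rows_after(cursor) # A's row is below the cursor
```

After the second poll the reader has permanently skipped A's row, even though it is now committed.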


What do you use then that has the same order as rows becoming visible?

We use an auto-increment id, and lock inserts on the related account (which always limits the scope of the query).

The only other (stateless) way I can think of is to somehow fiddle with transaction numbers linked to commit order.
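
A very rough in-process sketch of the locking idea, with a `threading.Lock` per account standing in for the database lock. `AccountLog` is hypothetical, and it uses a per-account counter rather than a real auto-increment column to keep the example small.

```python
import threading
from collections import defaultdict

class AccountLog:
    """Assign the id and publish the row inside one per-account lock."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._locks = {}
        self._next_id = defaultdict(lambda: 1)
        self.visible = defaultdict(list)

    def _lock_for(self, account):
        with self._guard:
            return self._locks.setdefault(account, threading.Lock())

    def insert(self, account, payload):
        # Inserts for the same account are serialized, so id order and
        # visibility order cannot diverge within that account.
        with self._lock_for(account):
            row_id = self._next_id[account]
            self._next_id[account] = row_id + 1
            self.visible[account].append((row_id, payload))
            return row_id

def worker(log, account, count):
    for _ in range(count):
        log.insert(account, "row")

log = AccountLog()
threads = [threading.Thread(target=worker, args=(log, "acme", 50))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
ids_in_visibility_order = [row_id for row_id, _ in log.visible["acme"]]
```

The cost is the one mentioned above: writers to the same account wait on each other, so this only works when the lock scope is narrow enough.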


Sorry for replying very late. I've used a similar technique of locking a "parent" row while adding items to a corresponding set. It works great as long as you can figure out a relationship like that and it's fine-grained enough that you are OK with the locking.

In traditional databases, the number linked to the commit order is usually the LSN (Log Sequence Number), which is an offset into the transaction log. Unfortunately, you can't figure that out until your transaction commits, so you can't use it during the transaction.

A hypothetical database where you could see your own LSN from within a transaction would require transaction commit order to be pre-determined at that point. An unrelated transaction with a lower LSN would block your transaction from committing.

In non-traditional databases, this could work differently. E.g. in Kafka you can see your partition offsets during a transaction, and messages in that partition will become visible in offset order. The tradeoff is that this order doesn't correspond to global transaction commit order, and readers will block waiting for uncommitted transactions (and all the other things about Kafka too).


Exactly.


Something always feels off about repopulating lists that people are working on. Like that concept needs an entirely different display method.


It’s what happens with stateless protocols and offset-based pagination 100% of the time, but most people don’t notice it.



