You should never design an API that uses offsets for pagination unless you're dealing with small amounts of data (<10000 rows). Cursors give you far more flexibility and if you want to be lazy, you can just hide an offset in an opaque cursor blob and upgrade later on.
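To make the "opaque cursor blob" idea concrete, here's a minimal sketch assuming a JSON-over-base64 token (all field names invented): the client only ever sees an opaque string, so the backend can swap the hidden offset for a keyset position later without breaking anyone.

```python
import base64
import json

def encode_cursor(payload: dict) -> str:
    """Serialize pagination state into an opaque, URL-safe token."""
    raw = json.dumps(payload, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token: str) -> dict:
    """Recover the pagination state; clients never see the internals."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))

# v1: the cursor secretly carries an offset.
first_page_cursor = encode_cursor({"v": 1, "offset": 0, "limit": 50})

# v2 (the later upgrade): the same opaque token carries a keyset position instead.
keyset_cursor = encode_cursor({"v": 2, "after_id": 12345, "limit": 50})
```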
I don't think I've used offsets in APIs for at least 10 years. Lightly-obfuscated cursor tokens are one of the first things I build in web projects and that's usually less than an hour's work.
If you _really_ need the ability to drop the needle in your dataset with pagination, design your system to use pseudo-pagination where you approximate page-to-record mappings and generate cursors to continue forward or backward from that point.
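A hedged sketch of that pseudo-pagination idea, assuming a numeric primary key that's roughly evenly distributed and a non-empty table (table and column names are made up):

```python
# Approximate where "page N" starts by interpolating over the id range, then
# return an ordinary keyset cursor from that point. This is an estimate, not an
# exact offset; it only needs to land near the right spot.
def approximate_cursor_for_page(conn, page: int, page_size: int = 50) -> dict:
    cur = conn.cursor()
    cur.execute("SELECT MIN(id), MAX(id), COUNT(*) FROM items")
    lo, hi, n = cur.fetchone()
    fraction = min(1.0, (page * page_size) / max(n, 1))
    estimated_id = int(lo + fraction * (hi - lo))
    # From here on it's normal cursor pagination, forward or backward.
    return {"after_id": estimated_id, "limit": page_size}
```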
When I worked for a news site, we moved all listing endpoints from offset to timestamp-based pagination, because offset-based pagination assumes the data being queried will remain stable between requests. This was, of course, very rarely true during business hours.
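For reference, the timestamp-based version usually looks something like this. This is a sketch with invented table/column names, Postgres-style placeholders and row-value comparison, and an id tiebreaker so identical timestamps still page deterministically:

```python
# Keyset ("seek") pagination over (published_at, id) instead of OFFSET.
PAGE_SQL = """
    SELECT id, title, published_at
    FROM articles
    WHERE (published_at, id) < (%(before_ts)s, %(before_id)s)  -- row-value comparison (Postgres)
    ORDER BY published_at DESC, id DESC
    LIMIT %(limit)s
"""

def fetch_page(conn, before_ts, before_id, limit=20):
    with conn.cursor() as cur:
        cur.execute(PAGE_SQL, {"before_ts": before_ts, "before_id": before_id, "limit": limit})
        return cur.fetchall()
```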
Duplicates or missed entries in the result set are the most likely outcome of offset-based pagination.
Do any of these avoid that scenario? Unless you are paginating on `updated_at ASC` there is an edge case of missing new data on previously requested pages.
Missing data that began existing after you started querying is usually OK. If you had requested the data 1 second earlier, it wouldn't have been returned anyway. With offset pagination, you can miss data that always existed, which can make you believe an item that previously existed has been deleted.
Be very careful with timestamp or auto-increment id pagination too. These don't necessarily become visible in the same order since the id or timestamp is generated before the transaction commits unless your database has some specific way of ensuring otherwise.
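One common mitigation (not the only one, and not necessarily what anyone here used) is to paginate only up to a watermark slightly in the past, so rows whose id or timestamp was assigned before a slow commit have had time to become visible:

```python
from datetime import datetime, timedelta, timezone

# Assumed upper bound on commit latency for this workload; a guess, tune it.
SAFETY_LAG = timedelta(seconds=5)

def watermark() -> datetime:
    """Only rows at or before this instant are considered stable enough to page over."""
    return datetime.now(timezone.utc) - SAFETY_LAG
```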
Sorry for replying very late. I've used a similar technique of locking a "parent" row while adding items to a corresponding set. It works great as long as you can figure out a relationship like that and it's fine-grained enough that you are OK with the locking.
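For anyone curious, the parent-row lock looks roughly like this, assuming Postgres-flavoured SQL, a DB-API connection, and invented table names:

```python
def add_item(conn, collection_id, payload):
    with conn.cursor() as cur:
        # Blocks other writers to the same collection until we commit,
        # so items within one collection become visible in a well-defined order.
        cur.execute("SELECT id FROM collections WHERE id = %s FOR UPDATE", (collection_id,))
        cur.execute(
            "INSERT INTO items (collection_id, payload, created_at) VALUES (%s, %s, now())",
            (collection_id, payload),
        )
    conn.commit()
```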
In traditional databases, the number linked to the commit order is usually the LSN (Log Sequence Number), which is an offset into the transaction log. Unfortunately, you can't figure that out until your transaction commits, so you can't use it during the transaction.
A hypothetical database where you could see your own LSN from within a transaction would require transaction commit order to be pre-determined at that point. An unrelated transaction with a lower LSN would block your transaction from committing.
In non-traditional databases, this could work differently. E.g. in Kafka you can see your partition offsets during a transaction, and messages in that partition will become visible in offset order. The tradeoff is that this order doesn't correspond to global transaction commit order, and readers will block waiting for uncommitted transactions (and all the other things about Kafka too).
You don't, but if your cursor is just a position in some well-defined order, you can sometimes craft a cursor out of a known position. Continuing the book metaphor, instead of skipping to page 100, which has no semantic meaning, you could skip to the beginning of the "M" section of the dictionary, or see the first 10 words after "merchant".
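In SQL terms, that "first 10 words after merchant" seek is just a WHERE clause instead of an OFFSET (hypothetical schema, Postgres-style placeholders):

```python
def first_words_after(conn, word: str, limit: int = 10):
    with conn.cursor() as cur:
        # Seek directly to a value the client understands; no offset involved.
        cur.execute(
            "SELECT word FROM dictionary WHERE word > %s ORDER BY word LIMIT %s",
            (word, limit),
        )
        return [row[0] for row in cur.fetchall()]

# e.g. first_words_after(conn, "merchant") -> the 10 words following "merchant"
```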
You generally don’t, unless you want to linearly scan through n pages. If your API is using offset or page numbers and the underlying collection is also having items added or removed, the behavior is undefined or straight up wrong. I think it’s okay to use offsets or page numbers in cases where the collection is static or where it’s acceptable to occasionally skip over an item or see a duplicate item while paginating.
I see three options if this is a necessary use-case against a shared, mutable dataset:
1. Accept that the results will be unstable as the underlying set changes. Pagination may either miss or double-include items unpredictably.
2. Store an intermediate result set guaranteed to remain stable for the necessary duration (a rough sketch follows this list). This will provide stable pagination, at the cost of having to solve cache-expiry problems.
3. Use or build a version-controlled data store. I don't know of anything in common use, but there is likely something available. This is similar to #2, but moves the work from the application into the data-storage layer. You then paginate against a set version of the data. Imagine something similar to Immutable, but with expiry for unreachable nodes.
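Here's a minimal, in-memory sketch of option 2; a real system would put the snapshot in Redis or a results table, and all the names here are made up:

```python
import time
import uuid

SNAPSHOT_TTL = 600  # seconds; the cache-expiry problem lives in this number
_snapshots: dict[str, tuple[float, list[int]]] = {}

def create_snapshot(matching_ids: list[int]) -> str:
    """Freeze the matching ids once and hand back a snapshot token."""
    token = uuid.uuid4().hex
    _snapshots[token] = (time.time() + SNAPSHOT_TTL, list(matching_ids))
    return token

def read_page(token: str, page: int, page_size: int = 50) -> list[int]:
    """Paginate over the frozen id list until the snapshot expires."""
    expires_at, ids = _snapshots[token]
    if time.time() > expires_at:
        raise KeyError("snapshot expired")  # client must re-run the query
    start = page * page_size
    return ids[start : start + page_size]
```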
I don't disagree, but everyone would prefer an API that works over one that doesn't, or one that causes service outages due to database latency. (Did anyone ask the users if they wanted the API to work?)
If you're dealing with very small datasets, it's fine. I'm an average person using average APIs, which means that when I see offset-based pagination, it's usually on a service deployed and used by a lot of people.
Unsurprisingly, the offset-based APIs often include some other arbitrary limit like "offset limited to 10k"; silly, but understandable if you've built an API used by thousands of people before, or understand how databases work.
They're also often superseded by better APIs that actually allow you to page through the entire result set. Then you have a deprecated API that you either support forever or annoy users by turning off.
So yes, if you are building something that isn't an internal tool or pet project, limit/offset is probably the mark of the novice.
edit: I just saw another comment of yours, so I see this was meant more contextually than I realized.
Can you explain why offsets would never be a suitable solution? Is there a clear explanation as to why?
I understand how cursors are superior in some cases. But what if you have a site which paginates static data? Offsets would allow you to cache the results easily, overcoming any performance concerns (which would be irrelevant if the dataset was small anyway), providing a better user experience (and developer experience due to simpler implementation details) overall.
I can see that it would be a novice move if you’ve got staggering amounts of records that take a while to query. But that’s actually pretty rare in my experience.
It doesn't even have to be a cursor; you can technically page off of any field you can index (I've written a lot of sync APIs, so there are gotchas, but the point stands).
Limit/offset is usually (though not always, as you point out) egregious for anything more than hobbies or small projects. But, I am biased as when I build APIs, I definitely expect some amount of traffic/volume/tuple count, and offset/limit will not do.