So I'm coming at this from a Postgres POV; IIRC MySQL can take some shortcuts for COUNT(*). Constraining the query will help, but counting is expensive because a sequential table scan is always required.
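For illustration (the `items` table name is made up), EXPLAIN makes the sequential scan visible:

```sql
-- With no usable index, Postgres plans a full table scan for the count:
EXPLAIN SELECT COUNT(*) FROM items;
-- Typical plan shape:
--   Aggregate
--     ->  Seq Scan on items
```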
Meanwhile I can only think of one use case for pagination with exact row counts – a user-facing application. If you're preallocating storage you can get by with an estimate. If you're consuming an API and want something from near the end you can flip the sort order.
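For the estimate case, a common Postgres trick is to read the planner's statistics instead of counting at all (`items` is a placeholder name):

```sql
-- Fast approximate row count from planner statistics; maintained by
-- VACUUM/ANALYZE, so it can lag the true count after heavy writes.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'items';
```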
It's slow and expensive in MySQL too, assuming you mean InnoDB storage engine. Recent versions of MySQL can parallelize a COUNT(*), but that only helps so much.
The "shortcuts" you're thinking of may be from the MyISAM storage engine days. Since MyISAM doesn't have transactions or MVCC, it can maintain a row count for the table and just return that. But MyISAM is generally not used for the past decade (or more), as it just isn't really suitable for storing data that you care about.
Storage engines with transactions cannot simply store a per-table row count, because the accurate count always depends on your transaction's isolated snapshot.
Not necessarily in postgres. If an index is available it may use that, though it will still have to consult the visibility map to see whether the pages it touches are all-visible to the current transaction (cheap check); only when that's inconclusive does it also need to refer to the heap / actual table (slower check).
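A rough sketch of the cheap-vs-slow distinction (hypothetical `items` table with an indexed column):

```sql
-- VACUUM sets visibility-map bits for pages whose tuples are visible
-- to all transactions; those pages need no heap visit during an
-- index-only scan.
VACUUM items;

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM items;
-- If the planner picks an index-only scan, the output includes a
-- "Heap Fetches: N" line; N stays low when the visibility map is
-- fresh, and each fetch is one of the slower heap checks above.
```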
Is that a recent change? Back in 2016 the Citus guys claimed that:
There is no single universal row count that the database could cache, so it must
scan through all rows counting how many are visible. Performance for an exact
count grows linearly with table size.
Apologies, I should've clarified. I just meant postgres doesn't necessarily have to scan the actual table to get the count in some cases if an index is available, not that it caches the actual count.
So that's not the greatest quote, but the article shows postgres having to fall back on a sequential scan for a naive COUNT(*). The author goes on to mention that as of 9.2 Postgres can do an index-only scan (in the context of SELECT DISTINCT):
But now we come to a quirk, SELECT COUNT(DISTINCT n) FROM items will not use
the index even though SELECT DISTINCT n does. As many blog posts mention (“one
weird trick to make postgres 50x faster!”) you can guide the planner by rewriting
count distinct as the count of a subquery
To me it seems like you can trick the query planner into doing an index scan sometimes, but that it's a bit brittle. Has this improved much since?
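For reference, the rewrite the article describes (its `items` table with column `n`) looks like:

```sql
-- Usually planned as a seq scan, ignoring the index on n:
SELECT COUNT(DISTINCT n) FROM items;

-- The subquery form, which the planner can satisfy with an
-- index(-only) scan on n:
SELECT COUNT(*) FROM (SELECT DISTINCT n FROM items) t;
```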
The documentation says that index-only scans can be used with predicates and less often without them, though in my experience I've often seen the index used even without predicates.
Index-only scans are opportunistic, in that they take advantage of a pre-existing
state of affairs where it happens to be possible to elide heap access. However,
the server doesn't make any particular effort to facilitate index-only scans,
and it is difficult to recommend a course of action to make index-only scans
occur more frequently, except to define covering indexes in response to a
measured need
And:
Index-only scans are only used when the planner surmises that that will reduce the
total amount of I/O required, according to its imperfect cost-based modelling. This all
heavily depends on visibility of tuples, if an index would be used anyway (i.e. how
selective a predicate is, etc), and if there is actually an index available that could
be used by an index-only scan in principle.
So yeah, I don't know if your use case is exceptionally lucky or if the documentation is just pessimistic (or both). Good to know you can coerce the query planner into doing an index scan though.
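On the docs' suggestion to "define covering indexes in response to a measured need": since Postgres 11 that can be spelled directly with INCLUDE (names here are made up):

```sql
-- Payload columns are stored in the index leaf pages but are not part
-- of the key, so queries reading only (n, payload) can skip the heap.
CREATE INDEX items_n_covering ON items (n) INCLUDE (payload);
```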