Postgres Indexes for Newbies (crunchydata.com)
304 points by todsacerdoti on Jan 19, 2022 | 40 comments



Use the Index, Luke [0] is always a great resource.

[0] https://use-the-index-luke.com/


Another amazing resource is the CMU Database Systems videos with Prof. Andy Pavlo. Videos from 2021 with a different professor (Andrew Crotty) have also been uploaded, but Pavlo is just so engaging.

https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_... (Pavlo, 2019)

https://www.youtube.com/playlist?list=PLSE8ODhjZXjZaHA6QcxDf... (Crotty, 2021)


But there is still no better resource for technical details in Postgres than this series https://habr.com/en/company/postgrespro/blog/441962/


These are great, thank you.


He really is engaging. I love the guest talks too.


This is so great. It also taught me how functionality differs between RDBMSs.


I love that site.


I’m loving all the Postgres submissions on HN over the last couple of days. My professional experience is mostly with Redshift, which is only Postgres-like.

I was not aware of BRIN indexes. Am I correct in assuming those are suitable for analytical workloads? Is the data implicitly sorted in a BRIN index or is there a mechanism to optimize the storage layer for BRIN indexed data?


Not an expert, but have played with BRIN indices on a much older version of Postgres. Some of my experience may be misremembered or out of date.

You are correct that they can be used for analytical workloads (but not exclusively, you still have other index types available and transactions, etc). They have very little performance overhead for the write path and small memory footprint compared to B-tree indexes.

However, they are only effective at improving read performance when filtering a table with a selective predicate, and even then, only if there’s some temporal relationship between the indexed column values and the time the rows are inserted. Auto-incrementing id and created_at columns work great for this.

So to your second point, no, I don’t believe there’s any specific storage-layer awareness of them. However, Postgres inserts typically land at the “tail” of the table as a consequence of the MVCC implementation (although tables with frequent deletes and updates can undermine the effectiveness of BRINs).

Remember though that the correlation needn’t be perfect to be very effective. If you can limit what would otherwise be a full table scan + filter evaluation to a scan + filter of a handful of pages that “might” contain matching rows, that can be a huge performance boost with a tiny cost of maintaining the index.
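A minimal sketch of the pattern (table and column names invented):

    -- append-mostly table; rows arrive roughly in created_at order
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        created_at timestamptz NOT NULL DEFAULT now(),
        payload    text
    );

    -- BRIN stores only a min/max summary per block range, so it is
    -- tiny and nearly free to maintain on insert
    CREATE INDEX events_created_brin ON events USING brin (created_at);

    -- a selective time-range predicate lets the planner skip every
    -- block range whose min/max can't contain matching rows
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM events
    WHERE created_at >= now() - interval '1 day';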


Interesting, thanks for sharing.

I am comparing this mentally to Redshift's sort keys. It sounds like a BRIN index might approximate that behavior for the time dimension but not for a locale dimension (integer value, low cardinality, high volume).

Do BRIN indices support compound keys? Sounds like that may not even be desirable even if they do.

With a Postgres analytic table that has a time and a locale dimension, it sounds like the answer is a hybrid: a BRIN index on the time dimension and partitioning on the locale, which probably manifests as an inherited table.

But that would require some contortions in the ETL layer to re-write the partitions in order when they are updated, and to create and drop the new partitions as necessary.
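Roughly what I have in mind, sketched with declarative partitioning (names invented; since Postgres 10 this doesn't have to be a manually inherited table):

    -- list-partition on the low-cardinality locale dimension
    CREATE TABLE metrics (
        locale     int         NOT NULL,
        created_at timestamptz NOT NULL,
        value      numeric
    ) PARTITION BY LIST (locale);

    CREATE TABLE metrics_l1 PARTITION OF metrics FOR VALUES IN (1);
    CREATE TABLE metrics_l2 PARTITION OF metrics FOR VALUES IN (2);

    -- BRIN on the time dimension; defined on the parent, Postgres 11+
    -- creates it on each partition automatically
    CREATE INDEX metrics_created_brin ON metrics USING brin (created_at);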


I don’t see any reason why it couldn’t support compound keys, assuming the min/max is tracked per column in the index’s declared column order. I have no idea if the Postgres implementation does, though.

You’re also correct that a compound BRIN is of questionable use. If the first column in the key isn’t already an effective filter, you’re going to have to scan the whole table anyway in most cases; and if it is, then the cost of storing the min/max of subsequent columns increases the overhead of the BRIN index in a way that seems unlikely to justify the benefit. It seems to me that the indexed columns would have to be mutually correlated in sort order for that to be useful (e.g. created_date, created_time).


> Do BRIN indices support compound keys?

Yes.

https://www.postgresql.org/docs/current/indexes-multicolumn....

“Currently, only the B-tree, GiST, GIN, and BRIN index types support multiple-key-column indexes.”
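E.g., for the correlated-columns case from the sibling comment (table name invented):

    -- compound BRIN; this mostly pays off when the columns
    -- correlate in sort order
    CREATE INDEX orders_created_brin
        ON orders USING brin (created_date, created_time);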


BRIN indexes don't affect the storage layer, so you need to make sure your table is appropriately sorted on disk through other means. CLUSTER can do this as a one-off job (pg_repack if you need to keep the table available for writes), but won't sort new data as it's written.

https://www.postgresql.org/docs/current/sql-cluster.html

https://reorg.github.io/pg_repack/
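E.g. (names invented; note that CLUSTER orders by a btree or GiST index, not the BRIN itself):

    -- one-off physical rewrite of the table in index order;
    -- holds an exclusive lock while it runs
    CREATE INDEX events_created_btree ON events (created_at);
    CLUSTER events USING events_created_btree;

    -- pg_repack does the same reordering online, from the shell:
    --   pg_repack --table=events --order-by=created_at mydb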


For all of its greatness and (mostly well deserved) praise, the lack of a reasonable clustered-index capability (as in data order at the storage layer) is Postgres' biggest limitation IMHO.

Unfortunately, the CLUSTER command not only "blocks" the table for WRITE ops; more importantly, it also blocks READ operations [0]. pg_repack helps, but it is not always available when using a managed PG offering.

Not being able to control data ordering on disk is a potential deal breaker once the data reaches a certain size.

[0]: https://www.postgresql.org/docs/current/sql-cluster.html


Starting with PostgreSQL 11, B-tree indexes can store additional "non-key" columns in the index. These are kept in the index alongside the sorted key columns. This can avoid expensive sorts in certain query plans.

It's not perfect. An out-of-date visibility map sometimes forces PostgreSQL to check table pages to confirm that rows are visible to the current transaction. Index-only scans don't support features like expressions either.
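For example (hypothetical table):

    -- email is the search key; the INCLUDE columns ride along in the
    -- leaf pages so matching queries can skip the heap entirely
    CREATE INDEX users_email_cover
        ON users (email) INCLUDE (first_name, last_name);

    -- eligible for an index-only scan, provided the visibility map
    -- is current (VACUUM helps with that)
    SELECT first_name, last_name
    FROM users
    WHERE email = 'a@example.com';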


Is there a way to get this behavior with the implicit index on the primary key?


Any idea what that size is? When should one be worried?


It depends on a lot of factors. My unscientific rule of thumb: once the data doesn't fit in the server's RAM, that's when you may start worrying about how the data will be fetched from disk when you query it (if performance matters).

The problem case is when the data needed to serve an average query is "scattered across the disk". This means you may need to fetch far more data from the disk than necessary to fulfill the query (because data is returned in blocks/pages, not in bytes).

A worst-case example: say that to compute a query you need 5,000 rows (at 100 bytes per row, 500KB net total). If they are "perfectly scattered" (i.e. each row on a different page), you will really need to bring in 5,000 pages of 8KB each (the default page size in Postgres), for a total of 40MB. By fetching 80 times more data than needed, you've essentially reduced your throughput by 80x.

Note that the example above assumes that an index (e.g. a btree) can be leveraged. The index would point directly to the numerous pages, which is likely still much better than doing a full scan. But that index doesn't solve all your problems, only the first part.

It may not be a big deal, as bringing 40MB from disk is fast, but it will limit 1) the number of concurrent users you'll be able to serve, and 2) if you need 100,000 rows instead of 5,000, your query will take proportionally longer and that may be noticeable to your users.

If you can co-locate the data on disk (i.e. put it on the same or contiguous pages) deterministically, then you'd feel much better about your throughput. A clustered index (as SQL Server calls it) is the typical mechanism to sort the data on disk in an RDBMS. MySQL does that by default with the primary key, but not Postgres.


> A clustered index (as SQL Server calls it) is the typical mechanism to sort the data on disk in an RDBMS. MySQL does that by default with the primary key, but not Postgres.

Does that mean that MySQL moves the second or the first half of the table if you insert a row in the middle? I can't imagine that.

I've recently considered clustering multiple terabytes of time-ordered data stored in >100 partitions in a Postgres 12 instance. After careful consideration I came to the conclusion it wasn't worth it.

Clustering would have sorted all the rows by date within the table blocks, but that doesn't guarantee anything about the block layer below the filesystem, so it is of dubious value from a performance standpoint.

The other advantage I was hoping for was being able to use a BRIN index. But my database does see rare updates of those rows, and with them a BRIN index loses its value very fast. Either I lower the fillfactor to leave space in every block for updates, which allows the BRIN index to stay current but costs a lot of space, or I would have to force BRIN index updates regularly because they can be lossy, and that is not acceptable in my application.

The whole database is stored on NVMe disks managed by ZFS. That won't benefit from ordering the data on some arbitrary abstraction in the middle.


> is the typical mechanism to sort the data on disk in an RDBMS

SQL Server and Oracle both default to heaps (though it's rare to see a SQL Server table without a clustered index).


> SQL Server and Oracle both default to heaps

I don't think that's true for SQL Server.

If you define a primary key in SQL Server, this is automatically a clustered index.


Any table in SQL Server without a clustered index will be stored as a heap.

https://docs.microsoft.com/en-us/sql/relational-databases/in...


Yes, but the _default_ (and that's what I was referring to) is a clustered index.


Then the definition depends on whether you consider adding a primary key as default or not. I would not consider it default, though it is customary.


Very cool, thanks for sharing. I'm curious how far vanilla Postgres can be taken for analytics before more exotic columnar solutions like Redshift start to make sense.


A BRIN index is basically a minimum and maximum value in the indexed column for each megabyte of table.
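The range size is configurable, e.g. (hypothetical table t):

    -- 128 heap pages per range is the default: 128 x 8KB = 1MB of
    -- table summarized by one min/max entry
    CREATE INDEX t_ts_brin ON t USING brin (ts)
        WITH (pages_per_range = 128);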


Had a question, curious if you know: could an updated_at column theoretically work better than a created_at column if there are some deletes and updates? Would updated_at values be ordered as they are on disk?


I always first try to teach the intuition that an index is a different sorting of the data (with the full table scan being insertion order/sort by rowid). Everything that's faster with data sorted in this manner is faster in the database. That's for 2nd semester students who should have heard of sorted-list merging and binary search.

That overlooks hash indexes or index-only queries, of course.

Love the article, but I'm not sure that the indirection of the index being an additional data structure really helps in understanding indexes better.

I think you basically only need that once you have some degenerate indexes with bad selectivity, where a full table scan would be better; there, the additional index access is what kills performance.

Of course, this sorting intuition doesn't help in understanding the different kinds of indexes. Is that a topic for newbies, though?


> with the full table scan being insertion order/sort by rowid

I'm not sure that makes sense. As far as I'm aware selecting without sorting has no guaranteed order across vendors at all.


No guaranteed order, but very typically a predictable order depending on the query plan chosen. If you’re relying on an ordering, you need to specify it in the query.

Specifically, the scan order for a given engine depends on the query plan: if it’s doing an index scan, you’ll get results in index order, and if it’s doing a table scan you’ll get them in table storage order (assuming a transactional, single node [edit: also single threaded] OLTP database; this goes out the window when outputs are aggregated across multi node [edit: or multi threaded] scans without an ORDER BY).


Postgres can do parallel table and index scans. Don't rely on order without explicit order by.


True, I forgot to consider how single node (multi threaded) parallelism can look a lot like multi node parallelism


For existing tables, HypoPG is a very nice extension that can help you check whether an index would be helpful before actually creating it.

https://github.com/HypoPG/hypopg
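A quick sketch of the workflow (table name invented; see the README for details):

    CREATE EXTENSION hypopg;

    -- hypothetical index: nothing is written to disk, and it is only
    -- visible to the planner in this session
    SELECT * FROM hypopg_create_index(
        'CREATE INDEX ON orders (customer_id)');

    -- plain EXPLAIN (no ANALYZE) will consider the hypothetical index
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- discard all hypothetical indexes
    SELECT hypopg_reset();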


One thing I struggle with when figuring out the best indexes is creating enough junk in the database to force it to use an index vs a seq scan. Anyone have tips on good data-generation tools that work with foreign keys and create a good range of data? I've looked into using queries and generate_series, but it doesn't work well with UUID keys (I think?)

Right now I have a bash script that creates a bunch of entries through my app API...
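For what it's worth, generate_series can be paired with gen_random_uuid() (built in since Postgres 13, available via pgcrypto before that); a rough sketch with invented tables:

    -- parents first...
    INSERT INTO customers (id, name)
    SELECT gen_random_uuid(), 'customer ' || g
    FROM generate_series(1, 10000) AS g;

    -- ...then children: a cross join gives every customer N orders,
    -- so the foreign key is satisfied without extra bookkeeping
    INSERT INTO orders (id, customer_id, amount)
    SELECT gen_random_uuid(), c.id, (random() * 100)::numeric(10,2)
    FROM customers c
    CROSS JOIN generate_series(1, 100) AS g;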


A sequential scan can be faster than an indexed search in many cases. There is a good reason optimizers often choose to do sequential scans even when an index is available. Remember, the index is displacing data in the cache. Even for data that is purely in memory, sequential scans are often friendly for CPU performance.

On modern server hardware you usually need megabytes of data in a table before sequential scans start to become suboptimal. Most database engines are designed under this assumption.


If there's an index and the query planner is choosing sequential scan, it's doing so because it thinks that the random access pattern for the indexed data is going to be slower than the sequential access for the whole dataset. It's almost definitely right.

So why generate the data? It will sort itself out when there's enough real data in there.

You can also:

- disable "enable_seqscan"

- increase "seq_page_cost"

- decrease "random_page_cost"
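E.g., scoped to a session:

    -- steer the planner away from seq scans to see what the index
    -- plan would look like and cost
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
    RESET enable_seqscan;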


IME almost every query that takes way longer than expected in PG has been because it elected to do a sequential scan instead of using an index. I've set both random_page_cost and seq_page_cost to 1 for a database on an SSD (where sequential is still faster than random) but PG still chooses sequential scans. I selectively disable sequential scans on the connection ahead of badly planned queries now.


If your data changes a lot, try REINDEX and VACUUM ANALYZE on the tables involved, then test whether the query planner still wants to do a seq scan.

We have a couple of rather large tables with a lot of changes, where it's needed regularly to ensure the query planner makes the best possible plans.
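i.e. something along the lines of:

    REINDEX TABLE orders;      -- rebuild bloated indexes
    VACUUM ANALYZE orders;     -- clear dead tuples, refresh planner stats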


Richard Foote's blog has a lot of detail about the different types of indexes available in Oracle: https://richardfoote.wordpress.com/category/index-internals/


Creating indexes without understanding how they actually arrange the underlying data is at best a mild performance hit and at worst a catastrophic performance sink.

Each type handles a different access pattern and query workload. But “I have an index” doesn’t mean anything if the column order or operator doesn’t match the actual task.

If you want to really make your database experience shine, take the time to understand how each type of index actually lays out your data, and what it means to perform a range operation (e.g. key > X) vs. an equality (e.g. key = X, or contains X). Otherwise you’re adding overhead to every data modification with no long-term gain.
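A hypothetical illustration of the column-order point:

    CREATE INDEX orders_cust_created ON orders (customer_id, created_at);

    -- matches the layout: equality on the leading column narrows to
    -- one run of the index, then the range scans within it
    SELECT * FROM orders
    WHERE customer_id = 42
      AND created_at > now() - interval '7 days';

    -- doesn't match: no predicate on the leading column, so this
    -- index helps little and the planner will likely seq scan
    SELECT * FROM orders
    WHERE created_at > now() - interval '7 days';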





