This is a common operation for my team as well. On my old project, so many of the indexes we added were just assumptions about the usage patterns of the data. Not only did they blow up the size of the table and INSERT times, they also became a kind of reverse red herring: developers would stop digging and look elsewhere because the table "already has indexes".
Only adding indexes after careful use of EXPLAIN/EXPLAIN ANALYZE against the expected query patterns yielded good results. Often we'd just blow the indexes away and fiddle until we'd added the right one to enable an index scan / index-only scan.
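Roughly the loop, sketched against a hypothetical orders table (the table, columns and index name are made up for illustration):

    -- See what the planner actually does for the query we care about
    EXPLAIN ANALYZE
    SELECT customer_id, total
    FROM orders
    WHERE status = 'shipped' AND created_at > now() - interval '7 days';

    -- If the plan shows a sequential scan, try an index matching the predicate;
    -- INCLUDE-ing the selected columns lets the planner use an index-only scan
    CREATE INDEX CONCURRENTLY orders_status_created_at_idx
        ON orders (status, created_at) INCLUDE (customer_id, total);

    -- Re-run the EXPLAIN ANALYZE; if it didn't help, blow the index away and try again
    DROP INDEX CONCURRENTLY orders_status_created_at_idx;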
Other big boons for us have been using ENUM types where appropriate (a small, known set of values for a column) -- now your column takes 4 bytes instead of N bytes for a string. I find them a bit easier to work with than foreign keys for this optimization because of their direct string mapping.
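For example (a hypothetical order_status type and orders table, just to show the shape of it):

    -- A small, known set of values stored as a 4-byte enum instead of text
    CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered', 'cancelled');

    CREATE TABLE orders (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        customer_id bigint NOT NULL,
        status      order_status NOT NULL,
        total       numeric(12,2) NOT NULL,
        created_at  timestamptz NOT NULL DEFAULT now()
    );

    -- String literals map straight onto the enum; no lookup table involved
    INSERT INTO orders (customer_id, status, total) VALUES (42, 'shipped', 99.95);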
We've had really good mileage with Datadog's APM tools, RDS Performance Insights and pghero as you mentioned.
The ENUM type sounds like a great little tweak; I've got quite a few small lookup tables with a fixed set of rows, so I'll definitely investigate that.
I'm my own worst enemy when it comes to indexes, because I'm not only the database administrator but also the analyst. So when I'm trying to solve an ad-hoc problem and encounter a long-running query, there's a 99% chance I've "accidentally" created a new index to accelerate that query and then forgotten to remove it when I was done.
Lesson learned, and also a good argument for disaggregating the administrative and analytical user permissions.
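One way to catch those leftovers is a query against Postgres's built-in statistics views; a rough sketch (note the counters reset with pg_stat_reset, and unique/constraint indexes can legitimately show zero scans):

    -- Indexes that have never been used since statistics were last reset
    SELECT schemaname, relname, indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;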
More recently I've got into the habit of creating a new schema when I'm analysing / patching data, so I can pull the data I need into new tables and index them accordingly. Because I do it from vim and have a plugin to execute the queries, I end up with a record of everything as I go (and I set the search_path to hit my new schema first so any new tables land there).
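Roughly what that looks like (the schema and table names here are made up):

    -- Scratch schema for the ad-hoc work; everything created here is easy to find later
    CREATE SCHEMA scratch;
    SET search_path TO scratch, public;

    -- Unqualified names now resolve to the scratch schema first
    CREATE TABLE slow_orders AS
        SELECT * FROM public.orders WHERE total > 1000;
    CREATE INDEX ON slow_orders (customer_id);

    -- When the analysis is done, the whole lot goes in one shot
    DROP SCHEMA scratch CASCADE;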
> Other big boons for us have been using ENUM types where appropriate (a small, known set of values for a column) -- now your column takes 4 bytes instead of N bytes for a string. I find them a bit easier to work with than foreign keys for this optimization because of their direct string mapping.
Can you give an example?
It sounds like normalising the schema would have similar (perhaps more) benefits.
Normalizing has similar size benefits. For our use case (big aggregate reporting tables), the incoming external data is not normalized, meaning we'd need another process to iterate over it with the referenced table in memory and map each value to its foreign key. Enums can be ingested transparently without this normalization step, while taking equivalent space. Normalizing also adds a level of indirection between the column and its value. Both are fine.
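A minimal sketch of the ingestion difference, using hypothetical staging/target tables (the enum version just casts the incoming string; the foreign-key version needs the join):

    -- Raw, denormalized incoming data
    CREATE TABLE staging_events (event_type text, payload jsonb);

    -- Enum column: incoming strings cast straight onto the enum, no lookup table involved
    CREATE TYPE event_kind AS ENUM ('click', 'view', 'purchase');
    CREATE TABLE events (event_type event_kind, payload jsonb);

    INSERT INTO events (event_type, payload)
    SELECT s.event_type::event_kind, s.payload
    FROM staging_events s;

    -- Foreign-key version: the same load has to resolve each string against the lookup table
    CREATE TABLE event_types (id int PRIMARY KEY, name text UNIQUE);
    CREATE TABLE events_fk (event_type_id int REFERENCES event_types(id), payload jsonb);

    INSERT INTO events_fk (event_type_id, payload)
    SELECT t.id, s.payload
    FROM staging_events s
    JOIN event_types t ON t.name = s.event_type;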
Postgres keeps ENUM definitions in its catalog caches, so they're effectively in memory at all times. A lookup table you have to join against is probably the slower option, all other things being equal. There are cases where it's worthwhile (dynamically expanding the set of values that would be in that enum, for example), but it's a tradeoff.