I'm a little disappointed the answer was "batch inserts"
This is write performance optimization 101. I bet this change is getting you wins in way more places in the pipeline than the ones the evaluation pointed to.
Try doing the same optimization on a table with zero partial indexes and you will get the same 10x bump. It is better for many, many reasons.
Still, super cool dig into performance tools and source code. It shows great aptitude and willingness to deep dive.
Batch inserts usually increase throughput by reducing the number of write IOPS, fsyncs, etc. They usually aren't associated with a 10x _CPU_ savings, which is the finding here.
There are a million 'db best practices' you can go implement blindly, but the point is that this methodology – determining the bottlenecking resource and then profiling to determine exactly what is consuming it – will _reliably_ yield huge wins, whereas implementing 'best practices' on gut alone is a very inefficient way to improve performance.
I'm not saying you need to do all "best practices" blindly, but if insert speed is your problem (at a high level), batching is the very first thing to explore.
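To make that concrete, the change usually looks something like this with psycopg2 (a sketch; the connection string and the events table/columns are made up for illustration):

    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    rows = [(i, f"payload-{i}") for i in range(10_000)]

    with conn, conn.cursor() as cur:
        # Slow: one INSERT statement (and one round trip) per row.
        # for row in rows:
        #     cur.execute("INSERT INTO events (id, payload) VALUES (%s, %s)", row)

        # Faster: one statement carrying many rows per round trip.
        execute_values(
            cur,
            "INSERT INTO events (id, payload) VALUES %s",
            rows,
            page_size=1000,  # rows packed into each generated statement
        )

COPY tends to go further still for bulk loads, but the jump from row-at-a-time to batched is usually the big one.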
Without any specific numbers backing up the 10x, we can only guess what improved 10x. All of those things you listed show up as CPU wait events as well. Without specifics, I assume he means they inserted the same row count in 1/10th of the time, not that there was a direct drop in CPU usage.
Author here. We were under the assumption that CPU was mostly being used for evaluating the partial index predicates. Under this assumption, we figured batching was unlikely to yield much of a benefit. It wasn't until we actually profiled Postgres that we realized batching would be worth a try.
As for the numbers, we specifically got a 10x improvement in ingestion throughput.
> Batch inserts usually increase throughput by reducing the number of write IOPS, fsyncs, etc. They usually aren't associated with a 10x _CPU_ savings, which is the finding here.
Often the saving on roundtrips is bigger than any of these. Most database client libraries work synchronously, so if you insert via single-row INSERT statements you'll get approximately two context switches and a roundtrip for each row. That's often more costly in wall-clock time than the insertion itself.
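A quick way to see this for yourself (a sketch assuming psycopg2 and a local Postgres; the temp table is throwaway):

    import time
    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    rows = [(i,) for i in range(5_000)]

    with conn.cursor() as cur:
        cur.execute("CREATE TEMP TABLE t (n int)")

        # One round trip (and a pair of context switches) per row.
        start = time.perf_counter()
        for row in rows:
            cur.execute("INSERT INTO t (n) VALUES (%s)", row)
        conn.commit()
        print(f"row-at-a-time: {time.perf_counter() - start:.2f}s")

        # A handful of round trips for the whole set.
        start = time.perf_counter()
        execute_values(cur, "INSERT INTO t (n) VALUES %s", rows, page_size=1000)
        conn.commit()
        print(f"batched:       {time.perf_counter() - start:.2f}s")

Even against a database on localhost the per-row loop usually loses badly, and over a real network link the gap widens.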
While it's good that you know about this optimization, the key point is to look at the evidence first. It's surprisingly hard to temporarily ignore what you know and just use your eyes, but it's so important.
Really interesting, and it adds some good tools to the chest; the flame graphs especially put things nicely into perspective.
It's amazing how much more you can usually squeeze out of your database if you just take a deep look inside. Oftentimes you'd be surprised by what it's actually doing...
Use The Index, Luke is an excellent resource about relational database performance tuning as well: http://use-the-index-luke.com. I bought Markus Winand's book; it's extremely clear and concise (not the thick technical reference you would expect). Being able to understand the internals of modern database systems is great.
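On the "take a deep look inside" point: getting a flame graph of a busy Postgres backend is less work than it sounds. A rough sketch (assuming Linux perf and Brendan Gregg's FlameGraph scripts are installed and on PATH; the PID comes from pg_stat_activity):

    import subprocess

    def flamegraph_for_backend(pid: int, seconds: int = 30, out: str = "postgres.svg") -> None:
        # Sample the backend's on-CPU call stacks for a while.
        subprocess.run(
            ["perf", "record", "-g", "-p", str(pid), "--", "sleep", str(seconds)],
            check=True,
        )
        # perf script -> fold the stacks -> render the SVG.
        stacks = subprocess.run(["perf", "script"], check=True, capture_output=True).stdout
        folded = subprocess.run(["stackcollapse-perf.pl"], input=stacks, check=True, capture_output=True).stdout
        svg = subprocess.run(["flamegraph.pl"], input=folded, check=True, capture_output=True).stdout
        with open(out, "wb") as f:
            f.write(svg)

    # Find the backend doing the work with: SELECT pid, query FROM pg_stat_activity;
    # flamegraph_for_backend(12345)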
One lesson to take from this is that it's often worth spending an extra half hour or hour validating that your assumptions about where time is being spent are true.
I've seen two examples recently where a potentially impactful optimisation was added to a product but it didn't actually work as intended because of minor errors. It took a couple of years before the performance bugs were found, which required several hours of work.
In the first, an O(n) algorithm was replaced with an O(log n) version, but part of it remained O(n) for a subtle reason. The code was still faster by a constant factor, so it wasn't totally obvious. In that case, validating that the algorithm was actually O(log n) by doing some experiments for larger values of n would have revealed that the runtime was still increasing linearly (see the sketch after these examples).
In the second, an optional argument was added to a method that triggered an optimisation for a particular case, but it was never passed in at any callsites. In that case many different tests could have revealed that the optimisation wasn't effective or that the new code wasn't even running.
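For the first case, the check doesn't need to be elaborate; something like this (a sketch, with `lookup` standing in for whatever the optimised code path is) would have shown the runtime still growing linearly with n:

    import time

    def check_scaling(lookup, sizes=(1_000, 10_000, 100_000, 1_000_000)):
        # Roughly 10x more time per step suggests O(n); near-flat suggests O(log n).
        for n in sizes:
            data = list(range(n))
            start = time.perf_counter()
            for _ in range(100):
                lookup(data, n - 1)  # e.g. find the last element
            print(f"n={n:>9}: {time.perf_counter() - start:.4f}s")

    # check_scaling(my_supposedly_log_n_lookup)  # hypothetical function under test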
> One lesson to take from this is that it's often worth spending an extra half hour or hour validating that your assumptions about where time is being spent are true.
That is the real takeaway from the article.
Making assumptions is fine; making assumptions and not immediately verifying whether they hold is not.
This is why it is super important to actually write down your assumptions and test them one by one when implementing a solution. More often than not you'll find that there is some daylight between what you thought was true and what is really happening.
So this is much more a systemic problem than just a database tuning problem.
This reminds me of Toon Koppelaars talking about 'ThickDB': moving business logic into the database on Oracle actually reduced CPU usage drastically.
The flame graphs look really useful. I'll certainly be using this for a project I have on the go at the moment where some of the tasks are taking days in Postgres.
If you have queries that are taking days, you'll probably be better off trying EXPLAIN ANALYZE[0] first. If you can't get enough information from EXPLAIN ANALYZE, then you may want to try using flame graphs.
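That can be as simple as the following (a sketch via psycopg2; the query is a placeholder, and note that ANALYZE actually runs the statement):

    import psycopg2

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    with conn.cursor() as cur:
        cur.execute(
            "EXPLAIN (ANALYZE, BUFFERS) "
            "SELECT * FROM events WHERE created_at > now() - interval '1 day'"
        )
        for (line,) in cur.fetchall():  # each row of the plan is one text line
            print(line)
    conn.rollback()  # don't keep any side effects from the analysed statement

Look for the nodes with the largest actual times; that usually narrows things down before you reach for perf.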