I'm a little disappointed the answer was "batch inserts"
This is write performance optimization 101. I bet this change is getting you wins in way more places in the pipeline than the ones the evaluation pointed to.
Try doing the same optimization on a table with zero partial indexes and you will get the same 10x bump. It is better for many, many reasons.
Still, super cool dig into performance tools and source code. It shows great aptitude and willingness to deep dive.
Batch inserts usually increase throughput by reducing the number of write IOPS, fsyncs, etc. They usually aren't associated with a 10x _CPU_ savings, which is the finding here.
There are a million 'db best practices' you can go implement blindly, but the point is that this methodology – determining the bottlenecking resource and then profiling to determine exactly what is consuming it – will _reliably_ yield huge wins, whereas implementing 'best practices' on gut alone is a very inefficient way to improve performance.
I'm not saying you need to do all "best practices" blindly, but if insert speed is your problem (at a high level), batching is the very first thing to explore.
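To make that concrete, the change usually looks something like this with psycopg2 (a sketch; the connection string and the events table/columns are made up for illustration):

    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    rows = [(i, f"payload-{i}") for i in range(10_000)]

    with conn, conn.cursor() as cur:
        # Slow: one INSERT statement (and one round trip) per row.
        # for row in rows:
        #     cur.execute("INSERT INTO events (id, payload) VALUES (%s, %s)", row)

        # Faster: one statement carrying many rows per round trip.
        execute_values(
            cur,
            "INSERT INTO events (id, payload) VALUES %s",
            rows,
            page_size=1000,  # rows packed into each generated statement
        )

COPY tends to go further still for bulk loads, but the jump from row-at-a-time to batched is usually the big one.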
Without any specific numbers backing up the 10x, we can only guess what improved 10x. All of those things you listed show up as CPU wait events as well. Without specifics, I assume he means they inserted the same row count in 1/10th of the time, not that there was a direct drop in CPU usage.
Author here. We were under the assumption that CPU was mostly being used for evaluating the partial index predicates. Under this assumption, we figured batching was unlikely to yield much of a benefit. It wasn't until we actually profiled Postgres that we realized batching would be worth a try.
As for the numbers, we specifically got a 10x improvement in ingestion throughput.
> Batch inserts usually increase throughput by reducing the number of write IOPS, fsyncs, etc. They usually aren't associated with a 10x _CPU_ savings, which is the finding here.
Often the saving on roundtrips is bigger than any of these. Most database client libraries work synchronously, so if you insert via single-row INSERT statements you'll get approximately two context switches and a roundtrip for each row. That's often more costly in wall-clock time than the insertion itself.
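A quick way to see this for yourself (a sketch assuming psycopg2 and a local Postgres; the temp table is throwaway):

    import time
    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    rows = [(i,) for i in range(5_000)]

    with conn.cursor() as cur:
        cur.execute("CREATE TEMP TABLE t (n int)")

        # One round trip (and a pair of context switches) per row.
        start = time.perf_counter()
        for row in rows:
            cur.execute("INSERT INTO t (n) VALUES (%s)", row)
        conn.commit()
        print(f"row-at-a-time: {time.perf_counter() - start:.2f}s")

        # A handful of round trips for the whole set.
        start = time.perf_counter()
        execute_values(cur, "INSERT INTO t (n) VALUES %s", rows, page_size=1000)
        conn.commit()
        print(f"batched:       {time.perf_counter() - start:.2f}s")

Even against a database on localhost the per-row loop usually loses badly, and over a real network link the gap widens.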
While it's good that you know about this optimization, the key point is to look at the evidence first. It's surprisingly hard to temporarily ignore what you know and just use your eyes, but it's so important.
Really interesting, and it adds some good tools to the chest; the flame graphs especially put things nicely into perspective.
It's amazing how much more you can usually squeeze out of your database if you just take a deep look inside. Oftentimes you'd be surprised by what it's actually doing...
Use The Index, Luke is an excellent resource about relational database performance tuning as well: http://use-the-index-luke.com. I bought Markus Winand's book; it's extremely clear and concise (not the thick technical reference you would expect). Being able to understand the internals of modern database systems is great.
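On the "take a deep look inside" point: getting a flame graph of a busy Postgres backend is less work than it sounds. A rough sketch (assuming Linux perf and Brendan Gregg's FlameGraph scripts are installed and on PATH; the PID comes from pg_stat_activity):

    import subprocess

    def flamegraph_for_backend(pid: int, seconds: int = 30, out: str = "postgres.svg") -> None:
        # Sample the backend's on-CPU call stacks for a while.
        subprocess.run(
            ["perf", "record", "-g", "-p", str(pid), "--", "sleep", str(seconds)],
            check=True,
        )
        # perf script -> fold the stacks -> render the SVG.
        stacks = subprocess.run(["perf", "script"], check=True, capture_output=True).stdout
        folded = subprocess.run(["stackcollapse-perf.pl"], input=stacks, check=True, capture_output=True).stdout
        svg = subprocess.run(["flamegraph.pl"], input=folded, check=True, capture_output=True).stdout
        with open(out, "wb") as f:
            f.write(svg)

    # Find the backend doing the work with: SELECT pid, query FROM pg_stat_activity;
    # flamegraph_for_backend(12345)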
One lesson to take from this is that it's often worth spending an extra half hour or hour validating that your assumptions about where time is being spent are true.
I've seen two examples recently where a potentially impactful optimisation was added to a product but it didn't actually work as intended because of minor errors. It took a couple of years before the performance bugs were found, which required several hours of work.
In the first, an O(n) algorithm was replaced with an O(log n) version, but part of it remained O(n) for a subtle reason. The code was still faster by a constant factor, so it wasn't totally obvious. In that case, validating that the algorithm was actually O(log n) by doing some experiments for larger values of n would have revealed that the runtime was still increasing linearly (see the sketch after these examples).
In the second, an optional argument was added to a method that triggered an optimisation for a particular case, but it was never passed in at any callsites. In that case many different tests could have revealed that the optimisation wasn't effective or that the new code wasn't even running.
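For the first case, the check doesn't need to be elaborate; something like this (a sketch, with `lookup` standing in for whatever the optimised code path is) would have shown the runtime still growing linearly with n:

    import time

    def check_scaling(lookup, sizes=(1_000, 10_000, 100_000, 1_000_000)):
        # Roughly 10x more time per step suggests O(n); near-flat suggests O(log n).
        for n in sizes:
            data = list(range(n))
            start = time.perf_counter()
            for _ in range(100):
                lookup(data, n - 1)  # e.g. find the last element
            print(f"n={n:>9}: {time.perf_counter() - start:.4f}s")

    # check_scaling(my_supposedly_log_n_lookup)  # hypothetical function under test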
> One lesson to take from this is that it's often worth spending an extra half hour or hour validating that your assumptions about where time is being spent are true.
That is the real takeaway from the article.
Making assumptions is fine; making assumptions and not immediately verifying whether they hold is not.
This is why it is super important to actually write down your assumptions and test them one by one when implementing a solution. More often than not you'll find that there is some daylight between what you thought was true and what is really happening.
So this is much more a systemic problem than just a database tuning problem.
This reminds me of Toon Koppelaars talking about 'ThickDB': moving business logic into the database on Oracle actually reduced CPU usage drastically.
The flame graphs look really useful. I'll certainly be using this for a project I have on the go at the moment where some of the tasks are taking days in Postgres.
If you have queries that are taking days, you'll probably be better off trying EXPLAIN ANALYZE[0] first. If you can't get enough information from EXPLAIN ANALYZE, then you may want to try using flame graphs.
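That can be as simple as the following (a sketch via psycopg2; the query is a placeholder, and note that ANALYZE actually runs the statement):

    import psycopg2

    conn = psycopg2.connect("dbname=example")  # hypothetical connection string
    with conn.cursor() as cur:
        cur.execute(
            "EXPLAIN (ANALYZE, BUFFERS) "
            "SELECT * FROM events WHERE created_at > now() - interval '1 day'"
        )
        for (line,) in cur.fetchall():  # each row of the plan is one text line
            print(line)
    conn.rollback()  # don't keep any side effects from the analysed statement

Look for the nodes with the largest actual times; that usually narrows things down before you reach for perf.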