Sorry, but your testing methodology needs some help. Were you testing OSes? Postgres? Storage configuration? I suggest starting with FIO: use it to isolate the best-performing disk configuration (storage + kernel + filesystem + whatever), then run some pgbench tests against that configuration with different PostgreSQL tuning parameters, to show the best way to tune Postgres.
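For example, a minimal sketch of that workflow (device paths, sizes and client counts are illustrative, not a recommendation) could look like this on Linux:

# fio --name=pgsim --filename=/data/fio.test --size=8G --bs=8k \
      --rw=randrw --rwmixread=70 --ioengine=libaio --direct=1 \
      --iodepth=32 --runtime=300 --time_based --group_reporting
# pgbench -i -s 2000 bench
# pgbench -c 64 -j 8 -T 600 bench

Run the fio step against each candidate storage layout, keep the winner, then vary only the PostgreSQL parameters across the pgbench runs.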
A few thoughts:
* You weren't testing OSes, which the subject implied; you were testing Linux kernel variants and their stock OS configurations/kernel scheduler setups, with FreeBSD tossed into the mix. Whether you are running Ubuntu, CentOS, Debian or whatever, you should have the Linux kernel tuned to perform well, so adding the distribution as a variable is just a red herring. I'd be more interested in removing that variable and comparing different storage configurations (such as XFS and LVM).
* Clients connecting over the network adds a huge variable (the network) into play -- ideally you would want to remove this.
* I may have missed it, but it wasn't clear if you had a warmup period in your benchmarks. Especially with a copy-on-write system like ZFS, you need to run a few benchmarks over the same blocks first to break past the cache.
He did adequately disclose that the OS was swapped out while pgsql and its settings were held steady. Adding more arbitrary benchmarks like FIO won't really clarify pgsql performance, if that is the desired investigation. Instead, active benchmarking, where you identify the bottlenecks in each tested platform, would be a better use of time, and you could hypothesize how to improve each platform if you were to pick it.
As a counter-example, I could easily cherry-pick versions, tunables and patch sets to make the numbers go whichever way I want, so these types of comparisons aren't that useful unless one platform is dropping a big delta on the floor with out-of-the-box settings versus another.
As a FreeBSD developer, I will actually tell you that Linux could be made to graph massive wins by cherry-picking hardware with a very high core count and several NUMA domains. But even then, by selecting kernel features (which could be innocuously hidden in a version number/vendor patch set) you can cherry-pick large swings. That said, FreeBSD+ZFS+PGSQL (https://www.slideshare.net/SeanChittenden/postgresql-zfs-bes...) is a joy to administer, and is unlikely to be the weak link in a production setup if you stick to a two-socket system. There is a lot of work going on in HEAD that is relevant to this workload in the memory management subsystem, including NUMA support, as well as some TCP accept locking changes that would be relevant for TCP connection turnover.
People benchmark CentOS too much. A far more interesting test would be Oracle Linux, both with and without the UEK.
I would bet this would wipe out the SUSE advantage:
# cat /proc/version
Linux version 4.1.12-112.14.1.el7uek.x86_64 (mockbuild@) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #2 SMP Fri Dec 8 18:37:23 PST 2017
The "RedHat-compatible kernel" should have identical performance to CentOS.
# rpm -qa | grep ^kernel | sort
kernel-3.10.0-693.11.1.el7.x86_64
...
kernel-uek-4.1.12-112.14.1.el7uek.x86_64
...
I realize that many people don't like Oracle Linux due to their unauthorized appropriation and support of their RedHat clone. It does, however, bring new functionality to the table (primarily Ksplice), and it has great support for the eponymous database.
When RedHat 9 support ended, it never touched my personal systems again. Even my CentOS habit has been quelled.
Hello, OP here. I'm certain that you can fine-tune every OS for a specific use case. I may indeed do that in a future blog post. The question is what to compare? Should I compare Linux kernel versions, PostgreSQL versions, filesystems (and features like compression, block size, ...)? As you can see, the permutations are endless, and that's why I compared stock OSes with their default filesystems of choice.
I don't think that a Linux distribution is just a variable where the only thing that differs is the kernel version. Each distro made its own choices, for better or worse...
As for the clients connecting over the network - that was exactly my point. My idea was to benchmark in conditions similar to a production deployment. I doubt that many production systems connect over a Unix socket.
As for the warmup period: as you can see in the benchmarking script, there is a 30 min warmup period before I start to record the results.
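(For readers who haven't opened the script: the general shape is a throwaway run followed by the measured one, roughly like the sketch below; the exact scale and client counts live in the script, these are placeholders.)

# pgbench -c 64 -j 8 -T 1800 bench > /dev/null     # 30 min warmup, output discarded
# pgbench -c 64 -j 8 -T 3600 bench | tee results.txt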
> testing Linux kernel variants and their stock OS configurations/kernel scheduler setups, and FreeBSD was tossed into the mix
For a lot of people who don't have the time or understanding to play with things much beyond stock versions and configuration, this could still be a useful benchmark. People who have the knowledge, confidence, and time probably won't be reading the article at all, as they'll have already performed their own less artificial tests (i.e. benchmarking with their own application using live-like data and load patterns).
Though that is my argument against any benchmark like this: it is at best an indicator of peak activity under very specific conditions - a starting point, but one that doesn't really represent my application with any precision or accuracy.
> it wasn't clear if you had a warmup period to your benchmarks
I think it's pretty fair to test each OS with its filesystem of choice. I'm aware that you can use ZFS on Linux, but I'm not (yet) brave enough to recommend ZFS+Linux. And yes, there's btrfs, but would you trust it with your data? :)
I kind of miss network traffic diagrams; just mentioning that it's connecting over Gbit isn't enough for me. Is there 200 Mbit of SQL traffic going back and forth, or just 8 Mbit?
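Even a coarse sample of the interface counters taken during a run would answer that, e.g. (assuming sysstat is installed; interval and duration are illustrative):

# sar -n DEV 10 60     # sample all interfaces every 10s for 10 minutes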
The glaring difference will probably be a ZFS thing. The default ZFS recordsize is 128K, whereas databases usually use 8K pages. I didn't read closely enough to see what the story is with async writes, or the ZIL, or whatever. tl;dr - there are huge gains to be had with ZFS with a little tuning.
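For what it's worth, the usual starting point looks something like this (pool/dataset names are made up):

# zfs set recordsize=8K tank/pgdata
# zfs get recordsize,compression,primarycache tank/pgdata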
I think the BSDs have always left it to the administrator to optimize the system to their needs, and it didn’t look like OP did any optimization for the first set of results.
You might also try setting primarycache=metadata, to avoid double caching the data. From what I have been told, this can help reduce memory pressure of the ARC competing with postgres' own caching.
Are you also setting a max limit for the ARC? You don't want postgres and the ZFS ARC to compete for memory. I wonder if this impacts FreeBSD's poor performance in the read-intensive tests.
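On FreeBSD, that combination would look roughly like this (the dataset name and the 8 GB cap are only examples, size it to your box):

# zfs set primarycache=metadata tank/pgdata
# echo 'vfs.zfs.arc_max="8G"' >> /boot/loader.conf     # applied at next boot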
I don’t think it’s Linux software RAID overhead but rather ZFS’s on-the-fly compression/decompression, which could mean the BSDs are doing less IO than their Linux counterparts. I don’t know for sure; it’s just a hunch.
I've done plenty of benchmarks with LZ4 and GZIP compression. It really helps a lot; LZ4 in particular should always be enabled, as it is always faster. GZIP is fantastic for some datasets where you do sequential scans of many tables in the 10 MB - 100 MB range, but it should be used with care.
The great thing about PostgreSQL is tablespaces: you can put tables and indexes on different ZFS filesystems, with different hardware (disk vs SSD), recordsizes and compression options.
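A rough sketch of that layout (pool name, dataset names and table are made up; the tablespace directory must exist and be owned by the postgres user):

# zfs create -p -o recordsize=8K -o compression=lz4 tank/pg/tables
# zfs create -p -o recordsize=128K -o compression=gzip tank/pg/archive
# psql -c "CREATE TABLESPACE archive LOCATION '/tank/pg/archive';"
# psql -c "ALTER TABLE history SET TABLESPACE archive;"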
Before the benchmark I did some tests with FreeBSD+ZFS with LZ4 on/off. The LZ4 CPU overhead was pretty negligible and performance was slightly better than without compression, due to lower IO usage. That's the reason I chose LZ4 for the actual benchmark.
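(If anyone wants to check the effect on their own data, the achieved ratio is visible directly on the dataset; name illustrative:)

# zfs get compression,compressratio tank/pgdata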
The benchmark is a bit arbitrary; there are just too many variables: filesystems (ZFS/ext4), RAID (ZFS RAID vs Linux software RAID), compression (ZFS LZ4 vs raw), schedulers (CFQ/deadline on Linux, ULE on FreeBSD). Also, it is not clear if the PostgreSQL dataset includes both the data and the transaction logs (WAL).
Wow! The difference in RAID between FreeBSD and the GNU/Linux flavors is more than a little bit shocking on the read/write tests. I would not have expected it to be that significant.
I wondered the same thing. The best explanation had to do with the 32 GB dataset being on the system with 32 GB of RAM vs the 200 GB dataset on the system with 256 GB of RAM. Makes sense, but it isn’t clear from the article’s perspective.
Probably different postgresql.conf settings. For example, the 32 GiB instance has "effective_cache_size = 24GB", while the 200 GiB instance has "effective_cache_size = 144GB".
The filesystem of choice for a database workload on CentOS 7 should be XFS, because it's one of the defaults (especially in public clouds) and because it can be grown online. I suggest you add it.
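For reference, growing a mounted XFS filesystem is a single command once the underlying device or LV has been enlarged (mount point is illustrative):

# xfs_growfs /var/lib/pgsql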
I wasn’t expecting such a drastic difference between FreeBSD and the Linux distros. I wonder if this is a general BSD thing, or if it was FreeBSD-specific (or the filesystem or drivers).
Note that FreeBSD in the benchmark is running with ZFS LZ4 compression enabled, which works like a speed miracle for read-only database tests. If ZFS compression doubles the effective read bandwidth of the disks, it will nearly double the database benchmark speed.
On Linux one would need to benchmark Btrfs with LZO compression to compete against FreeBSD's ZFS with compression.
That's specifically incorrect: the results showed that the compressed ZFS was slower on pgbench and faster on TPC-B.
It's also a generically unwise assumption, because you have to know how fast the compression is compared to disk I/O. If, like almost all servers these days, you have more CPU than I/O capacity, the compression overhead will often be buried in the I/O latency, and if the data compresses well it's easy for it to be faster, because the I/O savings are greater than the CPU cost.
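To put rough, purely illustrative numbers on it: with a 2:1 compressible dataset on an array that reads 500 MB/s, the disks deliver an effective 1 GB/s of logical data, while a single modern core decompresses LZ4 at well over 1 GB/s, so the extra CPU work hides entirely inside the I/O wait.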
Since LZ4 was designed to be very fast, that seems like a reasonable bar to hit — I see single-core performance on old desktops in the 1.8+GB/s range.
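That's easy to sanity-check with the lz4 CLI's built-in benchmark mode on any reasonably compressible file you have lying around (assuming the lz4 command-line tool is installed; the file path is just an example):

# lz4 -b1 /var/log/messages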
I find it hard to believe that many would use software RAID with Postgres in production, especially under the load conditions tested here. I have seen little hope for performant software RAID on Linux in the past twenty years. It would be interesting to see the same tests run using stable hardware RAID of some sort (preferably with SSDs).
If you want the ability to scrub (in a way that detects errors across the whole storage stack), you must sacrifice hardware RAID. You must also pay the fletcher/sha256 checksumming tax.
ZFS enthusiasts advocate that critical data absolutely requires those sacrifices. I'm inclined to agree.
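In exchange you get end-to-end verification of every block on demand (pool name illustrative):

# zpool scrub tank
# zpool status -v tank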
I suppose it's not worth reading an article that compares benchmarks of Ubuntu and CentOS. It's just bullshit. Be honest and compare comparable things.