We’ve been seeing the same trend. Lots of teams moving to Hetzner for the price/performance, but then realizing they have to rebuild all the Postgres ops pieces (backups, failover, monitoring, etc.).
We ended up building a managed Postgres that runs directly on Hetzner. Same setup, but with HA, backups, and PITR handled for you. It’s open-source, runs close to the metal, and avoids the egress/I/O gotchas you get on AWS.
If anyone’s curious, here are some notes on our approach [1], [2]. Always happy to talk about it if you have any questions.
This is one of the key draws of Big Cloud, and especially of PaaS and managed SQL, for me (and the dev teams I advise).
Not having an ops background, I am nervous about:
* database backup+restore
* applying security patches on time (at OS and runtime levels)
* other security issues like making sure access to prod machines is restricted correctly, access is logged, ports are locked down, abnormal access patterns are detected
* DoS and similar protections (ideally these aren't my responsibility at all)
It feels like picking a popular cloud provider gives a lot of cover for these things - sometimes technically, and otherwise at least politically...
Applying security patches on time is not much of a problem. The ones you need to apply ASAP are rare, and you never put the DB engine on public access anyway; most of the time the exploit isn't publicly disclosed, and PoC code for a patched RCE isn't available on the day the patch is released.
Most of the time you're fine if you follow version updates as major releases come out, do regression testing, and put them on prod in your own planned time.
Most problems come from not updating at all and running 2- or 3-year-old versions, because that's what automated scanners look for, and after that much time it's much more likely someone has written and shared exploit code.
There must be SaaS services offering managed databases on different providers: you buy the servers, they install the software and host the backups for you. Anyone got any tips?
to be fair, AWS' database restore support is generally only a small part of the picture - the only option available is to spin an entirely new DB cluster up from the backup, so if your data recovery strategy isn't "roll back all data to before the incident", you have to build out all your own functionality for merging the backup and live data...
Yeah, and that default strategy tends to become very, very painful the first time you encounter non-trivial database corruption.
For example, one of my employers routinely tested DB restore by wiping an entire table in stage, and then having the on call restore from backup. This is trivial because you know it happened recently, you have low traffic in this instance, and you can cleanly copy over the missing table.
But the last actual production DB incident they had was a subtle data corruption bug that went unnoticed for several weeks - at which point restoring meant a painful merge of 10s of thousands of records, involving several related tables.
For sure. It's more about having a pipeline for pulling data from multiple sources - rather than spin up a whole new DB cluster, you usually want to pull the data into new tables in your existing DB, so that you can run queries across old & new data simultaneously
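To make that concrete, here's a rough sketch of the kind of pipeline I mean, assuming the backup has already been restored to a separate recovery instance and that postgres_fdw is available (the hostnames, the `orders` table, and the credentials are all made up):

```python
# Hypothetical sketch: expose a table from the restored copy inside the live
# database via postgres_fdw, so pre-incident and current data can be queried
# side by side. Hostnames, credentials, and the "orders" table are made up.
import psycopg2

LIVE_DSN = "host=live-db.internal dbname=app user=admin"

SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER IF NOT EXISTS restore_src
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'restored-db.internal', dbname 'app');

CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
    SERVER restore_src OPTIONS (user 'admin', password 'secret');

CREATE SCHEMA IF NOT EXISTS restored;

-- Pull the pre-incident copy of the table in next to the live one.
IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
    FROM SERVER restore_src INTO restored;
"""

# Rows that changed (or disappeared) between the backup and the live data.
DIFF_SQL = """
SELECT r.id, r.total AS old_total, l.total AS live_total
FROM restored.orders AS r
LEFT JOIN public.orders AS l USING (id)
WHERE l.id IS NULL OR l.total IS DISTINCT FROM r.total;
"""

with psycopg2.connect(LIVE_DSN) as conn, conn.cursor() as cur:
    cur.execute(SETUP_SQL)
    cur.execute(DIFF_SQL)
    for row in cur.fetchall():
        print(row)
```

From there you can review the diff and write targeted INSERT ... SELECT / UPDATE statements against the live tables, instead of rolling the whole database back.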
Exactly this. For a small team that's focused on feature development and customer retention, I tend to gladly outsource this stuff and sleep easy at night. It's not even a cost or performance issue for me. It's about if I start focusing on this stuff, what about my actual business am I neglecting. It's a tradeoff.
I can attest to that. At Cloud 66 a lot of customers tell us that while the PaaS experience on Hetzner is great, they benefit from our managed DBs the most.
While I'm sure it's a great project, a few issues in the README gave me pause about how well it's kept up to date. Around half of the links in the list of dependencies are either out of date or just plain don't work, and it references Vagrant with no mention of Docker.
It's indeed undermaintained, so it's not a case of plug-and-play and automated pulls for production. Still, it's a solid base to build from when setting up on VMs or dedicated hardware, and I'm yet to find something better short of DIYing everything.
If you are looking for Postgres on Hetzner, you may want to check out Ubicloud.
We host in various bare metal providers, including Hetzner. (I am the lead engineer building Ubicloud PostgreSQL, so if you have questions I can answer them)
Yes, that is correct. That said, in our tests we only saw 2x improvements in CH benchmarks. However, we found out that this was due to an architectural issue in our VM I/O path and in how we virtualize storage. Based on our estimates we should see a ~5x difference, but for that we first need to revamp our storage virtualization.
We plan to publish CH benchmark results in a follow-up blog post. However, we didn't want to do that yet, to avoid putting out misleading results.
Really appreciated the author's persistence in sticking with PostgreSQL. There are many specialized solutions out there, but in the end they usually lack PostgreSQL's versatility and battle-testedness.
I read something similar in Yuval Harari's Sapiens, where he suggests wheat domesticated humans, not the other way around. An excerpt can be found here [1]. The whole essay is great, but I especially liked this part:
> The word “domesticate” comes from the Latin domus, which means “house.” Who’s the one living in a house? Not the wheat. It’s the Sapiens.
I don't remember the details of their arguments, but Graeber and Wengrow think this is a misleading image. IIRC one of their main thrusts was that over long periods of history, groups of humans have adopted and abandoned stationary agriculture at will, as conditions indicate.
I suppose that makes us about as domesticated as, say, lions or chimpanzees, which have been known to share food with humans ("work for them") in the wild, but it's not their reason for existence.
I lent out my copy of The Dawn of Everything, so I can't get exact quotes or pages, but this reminded me of a point in the book (which I highly recommend) that I'll attempt to summarize:
Domestication of plants was "easy" when tested in a controlled setting, carefully selecting seeds at a university. They estimate that wheat in the agricultural "revolution" (a much-scoffed-at term in the book) could have been domesticated in about 200 years if done purposefully. Instead, agriculture took something like 3,000 years to become dominant over mixed food sources (mostly gathering, fishing, and hunting, with some low-effort planting on riverbanks).
And yes, to your point, the idea that there is some sort of progression in human societies is contradicted by recent decades of archeological evidence: every arrangement you can imagine seems to have been tried (stationary hunter/gatherers, nomadic farmers, alternating back and forth, shifts toward farming for hundreds of years and then back to fishing for thousands). Humans' time on Earth has been much longer than our recorded history, with more variety and less monotony than we usually assume.
Anyway I hope that inspires someone to pick up the book, it really is a good read.
>IIRC one of their main thrusts was that over long periods of history, groups of humans have adopted and abandoned stationary agriculture at will, as conditions indicate.
In general they still totally depend on it.
So this would be like saying dogs aren't domesticated, because some left their owners or bit them, or there are groups of stray dogs here and there.
What makes you say they still totally depended on it? I can easily imagine groups of humans having a period of settled agriculture for convenience rather than necessity.
My theory is that multicellular life itself was developed because viruses wanted a more effective way to travel. Humans are the pinnacle of virus transportation technology, and they've developed very successful behavioral override countermeasures against our pesky use of vaccines.
He also talked about this "reverse chain of command" in a recent talk at Peking University:
Humans evolved from worms. The human brain was originally a bunch of neurons centered around the worm's mouth to search for food. It's natural to think humans are still controlled by the stomach to this day (or the spinal cord, for that matter).
At the time of our investigation, we found a few articles suggesting that power caps could potentially cause hardware degradation, though I don't have the exact sources at hand. I see the child comment shared one example, and after some searching, I found a few more sources [1], [2].
That said, I'm not an electronics engineer, so my understanding might not be entirely accurate. It’s possible that the degradation was caused by power fluctuations rather than the power cap itself, or perhaps another factor was at play.
The power used by a computer isn't limited by giving it less voltage/current than it should have - if it was, the CPU would crash almost immediately. It's done by reducing the CPU's clock rate until the power it naturally consumes is less than the power limit.
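One way to see this in practice is to watch the per-core clocks while the machine is loaded; under a power cap the frequencies sag, not the supply voltage. A minimal sketch, assuming a Linux host that exposes cpufreq via sysfs:

```python
# Sample per-core clock speeds from the cpufreq sysfs interface. Under a power
# cap you'd expect these to sit well below the advertised boost clock while the
# machine is loaded, because the cap is enforced by clocking the CPU down,
# not by starving it of voltage.
import glob
import time

def core_freqs_mhz():
    paths = glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq")
    return [int(open(p).read()) / 1000 for p in paths]  # kHz -> MHz

for _ in range(5):
    freqs = core_freqs_mhz()
    print(f"min {min(freqs):.0f} MHz / avg {sum(freqs)/len(freqs):.0f} MHz / max {max(freqs):.0f} MHz")
    time.sleep(1)
```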
Yeah, this is generally a good practice. The silver lining is that our suffering helped uncover the underlying issue faster. :)
This isn’t part of the blog post, but we're also considering, in the future, getting the servers and keeping them idle, without actual customer workload, for about a month. This would be more expensive, but it could help identify potential issues without impacting our users. In our case, the crashes started three weeks after we deployed our first AX162 server, so we'd need at least a month (or maybe even longer) as a buffer period.
>The silver lining is that our suffering helped uncover the underlying issue faster.
Did you actually uncover the true root cause? Or did they finally uncap the power consumption without telling you, just as they neither confirmed nor denied having limited it?
The root cause was a problem with the motherboard, though the exact issue remains unknown to us. I suspect that a component on the motherboard may have been vulnerable to power limitations or fluctuations and that the newer-generation motherboards included additional protection against this. However, this is purely my speculation.
I don't believe they simply lifted a power cap (if there was one in the first place). I genuinely think the fix came after the motherboard replacements. We had 2 batches of motherboard replacements and after that, the issue disappeared.
If someone from Hetzner is here, maybe they can give extra information.
Were you able to identify the manufacturer and model/revision of the failing motherboards? That would be extremely helpful when shopping for second-hand servers.
Definitely interesting material. I've noticed, especially in the last few years, an increased interest in moving away from proprietary clouds/PaaS to K8s or even to bare metal, driven primarily by high prices and also by the desire for more control.
At Ubicloud, we are attacking the same problem, though from a different angle. We are building an open-source alternative to AWS. You can host it yourself or use our managed services (which are 3x-10x more affordable than comparable services). We've already built primitives such as VMs, PostgreSQL, private networking, and load balancers, and we're also working on K8s.
I have a question for the HN crowd: which primitives do you need to run your workloads? The OP's list consists of Postgres, Redis, Elasticsearch, Secret Manager, Logging/Monitoring, Ingress, and Service Mesh. I wonder whether this is representative of the typical requirements for running the HN crowd's workloads.
Quite simple: I want to submit a Docker image and have it accept HTTP requests at a certain domain, with easy horizontal/vertical scaling. I'm sure your Elastic Compute product is nice, but I don't want to set it up myself (let alone run k8s on it). Quite like fly.io.
PS: I like what you guys are doing, I'd subscribe to your mailing list if you had one! :)
Sure you can, but Let's Encrypt, just like DigiCert, is a 3rd-party provider, and they don't guarantee that you'll get a signed certificate within a few minutes. If they have an outage, it could take hours to get a certificate, and you wouldn't be able to provision any database servers during that time. In our previous gig at Microsoft, we had multiple DigiCert outages that blocked provisioning.
I personally, anecdotally, haven't had any problems with this in the last few years, and it doesn't seem like a big issue based on the information in the incident forum posts:
https://community.letsencrypt.org/c/incidents/16/l/top
Self signing probably causes quite a few other issues, even though you have more control of the process, doesn't it?
I cannot comment on Let's Encrypt's reliability. Maybe I've just had too many bad experiences with DigiCert outages and I'm a bit pessimistic. However, their status page does not inspire much confidence: https://letsencrypt.status.io/pages/history/55957a99e800baa4...
I think if you only need to generate a certificate once in a while, using Let's Encrypt or DigiCert is OK. Even if they are down, you can wait a few hours. If you need to generate a certificate every few minutes, a few hours of downtime means hundreds of failed provisionings. Hence, we opted for self-signing.
In terms of reliability, it is great, because we control everything. It is also quite fast; it takes a few seconds to generate and sign a certificate. The biggest drawback is that you need to distribute the CA certificate as well. Historically, this was fine, because you need to pass the CA cert to the PostgreSQL client as a parameter anyway, so the additional friction we introduced for users with CA cert distribution was low. However, with PostgreSQL 16's libpq there is now an sslrootcert=system option, which automatically uses the OS's trusted root CA certificates. That makes the alternative much more seamless, requiring almost no action from the user, which tilts the balance in favor of globally trusted CAs, but it still doesn't give me enough reason to switch.
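For what it's worth, this is roughly what the two trust models look like from the client side (a sketch using psycopg2; the hostname and paths are made up, and sslrootcert=system needs a version 16+ libpq on the client):

```python
# Sketch of the two client-side trust models discussed above, via psycopg2
# (any libpq-based client behaves the same). Hostname and paths are made up.
import psycopg2

# Private CA: the CA certificate has to be distributed to clients out of band.
conn = psycopg2.connect(
    host="db-1234.example.com",
    dbname="app",
    user="app",
    sslmode="verify-full",
    sslrootcert="/etc/app/provider-ca.pem",  # shipped to the client separately
)

# Publicly trusted CA (e.g. Let's Encrypt) with a v16+ libpq on the client:
# trust whatever root store the OS already has, no extra file to distribute.
conn = psycopg2.connect(
    host="db-1234.example.com",
    dbname="app",
    user="app",
    sslmode="verify-full",
    sslrootcert="system",
)
```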
I have a few ideas around self-signing a cert and requesting a certificate from Let's Encrypt at the same time. The database could start with the self-signed certificate and switch to the Let's Encrypt certificate once it's ready. Maybe I'll implement something like that in the future.
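The switch-over itself should be cheap: since PostgreSQL 10 the SSL files are re-read on a config reload, so something along these lines (a hypothetical sketch; paths are made up, and the role needs permission to run ALTER SYSTEM) would swap the certificate without a restart:

```python
# Hypothetical switch-over: start serving with the bootstrap self-signed cert,
# then point PostgreSQL at the Let's Encrypt files once issuance succeeds.
# Paths are made up; the role needs permission to run ALTER SYSTEM.
import psycopg2

conn = psycopg2.connect("host=db-1234.example.com dbname=postgres user=postgres")
conn.autocommit = True  # ALTER SYSTEM can't run inside a transaction block
with conn.cursor() as cur:
    cur.execute("ALTER SYSTEM SET ssl_cert_file = '/etc/postgresql/tls/letsencrypt.crt'")
    cur.execute("ALTER SYSTEM SET ssl_key_file = '/etc/postgresql/tls/letsencrypt.key'")
    cur.execute("SELECT pg_reload_conf()")  # SSL files are re-read on reload (PG 10+)
conn.close()
```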
In another thread on this page I wrote more about this, but in summary: we also like k8s-based managed Postgres solutions. They are quite useful if you are running Postgres for yourself. In managed Postgres services offered by hyperscalers or companies like Crunchy, though, k8s is not commonly used.
In k8s, isolation is at the container level, so properly isolating system calls (for security purposes) is quite difficult. This wouldn't be a concern if you are running Postgres for yourself.
Also for us, one reason was operational simplicity. You can write a control plane for managed Postgres in 20K lines of code, including unit tests. This way, if anything breaks at scale, you can quickly figure out the issue without having to dive into dependencies.
[1] https://www.ubicloud.com/blog/difference-between-running-pos... [2] https://www.ubicloud.com/use-cases/postgresql