Got up to the TL;DR paragraph. This was a major red flag given the initial presentation of the discovery of a bottleneck:
'''
When a NOTIFY query is issued during a transaction, it acquires a global lock on the entire database (ref) during the commit phase of the transaction, effectively serializing all commits.
'''
Am I missing something - this seems like something the original authors of the system should have done due diligence on before implementing a write heavy work load.
I think it's just difficult to predict how heavy is heavy enough to make this a problem. FWIW I had worked at a startup with a much more primitive data storage system where serialized commits were actually totally fine. The startup never outgrew that bottleneck.
If “doing due diligence” involves reading the source code of a database server to verify a design, I doubt many people writing such systems do due diligence.
The documentation doesn’t mention any caveats in this direction, and they had 3 periods of downtime in 4 days, so I don’t think it’s a given that testing would have hit this problem.
''' When a NOTIFY query is issued during a transaction, it acquires a global lock on the entire database (ref) during the commit phase of the transaction, effectively serializing all commits. '''
Am I missing something - this seems like something the original authors of the system should have done due diligence on before implementing a write heavy work load.