
> I do not understand why people aren't clamouring against postgres's connection model.

There are people wanting to change that - including me, the author of the blog post. I explained in an earlier blog post ([1]) why I chose to work on making snapshots more scalable at this time:

> Lastly, there is the aspect of wanting to handle many tens of thousands of connections, likely by entirely switching the connection model. As outlined, that is a huge project / fundamental paradigm shift. That doesn’t mean it should not be tackled, obviously.

> Addressing the snapshot scalability issue first thus seems worthwhile, promising significant benefits on its own.

> But there’s also a more fundamental reason for tackling snapshot scalability first: While switching the connection model would, e.g., address some memory usage issues at the same time, it would not at all address the snapshot issue. We would obviously still need to provide isolation between the connections, even if a connection wouldn’t have a dedicated process anymore.

> However, this means that even if your server can theoretically handle 5000 concurrent reads, you will never get there, because opening 5000 connections isn't practical. You'll likely hit a memory limit.

It's quite possible to have 5000 connections, even leaving poolers aside. When using huge_pages=on, a connection has an overhead of < 2MiB ([2]). Obviously the resulting ~10GiB for 5000 connections isn't peanuts, but it's also not a crazy amount.
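To make that concrete, a minimal postgresql.conf sketch for such a setup might look like the following; the values are illustrative, not tuned recommendations:

    # postgresql.conf -- illustrative values, not tuned recommendations
    huge_pages = on           # back shared memory with huge pages, reducing
                              # per-connection page table overhead (see [2])
    max_connections = 5000    # ~2MiB of overhead each with huge_pages=on
    shared_buffers = 8GB      # shared across all connections, sized separately

Note that huge_pages=on requires the OS to have huge pages configured (e.g. vm.nr_hugepages on Linux); otherwise the server will refuse to start.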

> Why is this generally accepted as ok?

Postgres is an open source project. It's useful in a lot of cases. It's not in some others - partially due to non-fundamental limitations. There's a fairly limited set of developers - we can only work on so many things at a time...

> Decoupling concurrency from connections seems possible via pipelining commands over a single connection.

Could you expand on what you mean here?

> Is it just too late for a project as mature as Postgres?

No. It's entirely doable to decouple processes and connections. It however definitely is a large project, with some non-trivial prerequisites.

[1] https://techcommunity.microsoft.com/t5/azure-database-for-po...

[2] https://blog.anarazel.de/2020/10/07/measuring-the-memory-ove...

EDIT: formatting woes



There is a much better reference somewhere (possibly from Ingres times, or later), but here is Stonebraker in 1986 describing how POSTGRES ended up with a process-per-connection model:

> DBMS code must run as a separate process from the application programs that access the database in order to provide data protection. The process structure can use one DBMS process per application program (i.e., a process-per-user model [STON81]) or one DBMS process for all application programs (i.e., a server model). The server model has many performance benefits (e.g., sharing of open file descriptors and buffers and optimized task switching and message sending overhead) in a large machine environment in which high performance is critical. However, this approach requires that a fairly complete special-purpose operating system be built. In contrast, the process-per-user model is simpler to implement but will not perform as well on most conventional operating systems. We decided after much soul searching to implement POSTGRES using a process-per-user model architecture because of our limited programming resources. POSTGRES is an ambitious undertaking and we believe the additional complexity introduced by the server architecture was not worth the additional risk of not getting the system running. Our current plan then is to implement POSTGRES as a process-per-user model on Unix 4.3 BSD.

(THE DESIGN OF POSTGRES, https://dsf.berkeley.edu/papers/ERL-M85-95.pdf )

There is another reference, directly related to Postgres or PostgreSQL, that made it even clearer; I expect it came somewhat later. In effect it indicated that someone involved in the project had strong intentions of adding threading "real soon now". I'll update this comment if I figure out where that's from.

Threads were still a research topic in the mid-80s, so their absence from such an old design is easy to understand. Pthreads wasn't standardized until 1996, although several Unices (e.g. SunOS) had popular threading implementations well before then.


You might be thinking of

> In POSTGRES they are run as subprocesses managed by the POSTMASTER. A last aspect of our design concerns the operating system process structure. Currently, POSTGRES runs as one process for each active user. This was done as an expedient to get a system operational as quickly as possible. We plan on converting POSTGRES to use lightweight processes available in the operating systems we are using. These include PRESTO for the Sequent Symmetry and threads in Version 4 of Sun/OS.

From: The Implementation of POSTGRES - Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama

Hat tip to Thomas Munro. I think he pointed this quote out to me in the past.


Thanks for doing this work! Postgres is amazing, and this area is the one place where e.g. MySQL has a clear lead.

> Postgres is an open source project. It's useful in a lot of cases. It's not in some others - partially due to non-fundamental limitations.

For me at least, postgres has the problem that it's too useful. It has so many clear advantages over the alternatives that it often makes sense to choose postgres over more specialized tools.

For example, even if I were primarily storing JSON data, postgres would still be a good choice, because it offers better consistency guarantees than most document databases and has a more powerful query model.
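As a rough illustration of that query model, here's a sketch using psycopg2 (the table and column names are made up for the example):

    import psycopg2
    from psycopg2.extras import Json  # adapts Python dicts to JSON literals

    conn = psycopg2.connect("dbname=app")  # hypothetical database
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")
    cur.execute("CREATE INDEX IF NOT EXISTS docs_body_idx ON docs USING gin (body)")
    # JSONB containment (@>) and path extraction (->>) run inside ordinary
    # transactional SQL, so joins, indexes and MVCC all still apply.
    cur.execute("SELECT body->>'name' FROM docs WHERE body @> %s",
                (Json({"active": True}),))
    print(cur.fetchall())
    conn.commit()

The GIN index makes the containment query efficient, which is exactly the kind of thing many document stores make harder to combine with transactional guarantees.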

The end result is that the few remaining limitations (e.g. the connection model) are felt all the more strongly: I can't simply say "ah, I need lots of connections, I'll use MySQL here" without also giving up all sorts of other features I would normally use. (Although MySQL is still improving.)


I wasn't criticizing Postgres so much as expressing confusion about why I don't hear more people talking about this issue.

I think poolers confuse the issue: they solve the problem of allowing multiple processes to share the same pool of connections, but they don't increase throughput to the server.

The 5000 connection example was probably missing some context. I was actually assuming 10MB of overhead per connection, which makes for a decently expensive server without much throughput to show for it.

By pipelining I mean allowing multiple requests to be in flight on the same connection, each tagged with an ID and queued on the server. When a worker process gets around to processing a request, it sends back a response carrying the same ID. This increases driver complexity, but it means you need very few connections per client.
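A toy sketch of that scheme in Python (this is not the Postgres wire protocol; the newline-delimited JSON framing and all names here are invented for illustration):

    import asyncio, itertools, json

    class PipelinedClient:
        # Many in-flight queries on one connection, matched up by request ID.
        def __init__(self, reader, writer):
            self.reader, self.writer = reader, writer
            self.next_id = itertools.count().__next__
            self.pending = {}  # request ID -> Future awaiting the reply
            self.reader_task = asyncio.ensure_future(self._read_loop())

        async def query(self, sql):
            req_id = self.next_id()
            fut = asyncio.get_running_loop().create_future()
            self.pending[req_id] = fut
            # Tag the request with its ID; don't wait for earlier replies.
            self.writer.write(json.dumps({"id": req_id, "sql": sql}).encode() + b"\n")
            await self.writer.drain()
            return await fut  # resolved whenever the server answers this ID

        async def _read_loop(self):
            # Replies may arrive in any order; the ID routes each one home.
            while line := await self.reader.readline():
                msg = json.loads(line)
                self.pending.pop(msg["id"]).set_result(msg["rows"])

    async def main():
        reader, writer = await asyncio.open_connection("db.example", 5433)
        client = PipelinedClient(reader, writer)
        # Three queries in flight on a single connection:
        print(await asyncio.gather(client.query("SELECT 1"),
                                   client.query("SELECT 2"),
                                   client.query("SELECT 3")))

    # asyncio.run(main())  # needs a server speaking this toy framing

For what it's worth, the Postgres wire protocol already lets a client send queries without waiting for earlier results, but replies come back strictly in order; the tagged scheme above additionally allows out-of-order completion.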


> Could you expand on what you mean here?

Not sure what OP was thinking about, but for instance SQL Server supports something called Multiple Active Result Sets (MARS) on a single connection.

https://docs.microsoft.com/en-us/sql/relational-databases/na...
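As a quick sketch of what that looks like from a client (via pyodbc; the server, database and table names are placeholders):

    import pyodbc  # connection details below are placeholders

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=db.example;DATABASE=app;Trusted_Connection=yes;"
        "MARS_Connection=yes"  # enables Multiple Active Result Sets
    )
    # With MARS enabled, a second statement can execute while the first
    # result set is still open on the same connection.
    c1 = conn.cursor().execute("SELECT id FROM orders")
    first_row = c1.fetchone()
    c2 = conn.cursor().execute("SELECT COUNT(*) FROM customers")
    print(first_row, c2.fetchone())

Note that MARS interleaves result sets on one connection; it isn't the same as concurrent server-side execution.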



