> PostgreSQL uses the term cluster to refer to a “cluster” of databases, as opposed to the usual notion of a group of servers or VMs working in a co-ordinated fashion.
Thank you for this. I didn't realise this and for the past few years have been thinking that I seem to be the only one that runs only a single database host as everyone else seems to have "clusters". I never realised this just means multiple databases.
I'm not an expert, but as far as I can tell, even a single database instance with a single database (or maybe no databases at all?) could constitute a "cluster".
Clustering is, besides all the other definitions you're seeing here, a table property. A clustered table is a table that has, on disk, been aligned with a certain index[1] - each table can only be clustered to a single index and it essentially means that row retrieval for that specific index is much more efficient, it basically gets you one covering index for free.
A Postgres cluster is, roughly speaking, a server instance. That is, if you run two copies of postgres on the same box (One on the default 5432 port, another on, say, 6543) and have each of those copies manage its own independent config, data, etc, then those instances are what Postgres calls a cluster.
I think most people surprised at the PostgreSQL model of databases and schemas are coming from a MySQL/MariaDB background where the terms are synonymous. PostgreSQL matches the model of basically every non-MySQL database when it comes to these concepts.
Though if we talk about database weirdness, I never liked Oracle DB's insistence that databases and users are the same thing. Glad I haven't used it for well over a decade now :)
I was actually surprised that you can query across databases in MySQL, but that makes sense once you understand that there's only one database and the "databases" are just schemas.
Coming from MSSQL I was equally surprised that you cant query across databases in postgres even if they are on the the same server. Yeah there is the Foreign Data Wrapper thing but the DBA was very reluctant to enable it or whatever.
> Normally, schema means "shape/structure of data". In Postgres it's roughly a container for logical databases.
No, it's a namespace within a database, not a container for databases; this is not Postgres specific, it is part of the SQL standard and widely used in other implementations.
Yeah, you caught me between my post and my edit. I misspoke and said "logical databases" instead of "tables". I'm sure that's not precisely correct either, however.
Ah I see. Postgres uses the word in the sense that it's used in the SQL spec, though, and it means pretty much what you're describing. The key difference is that I think you're talking about the human-readable file (which in the SQL world is written in SQL DDL), whereas the standard means it as closer to being the internal/runtime representation of that "shape/structure of data".
It's helpful to think of "the database" as the actual physical storage, and the schema is what the db uses to make sense of how to manipulate/query that data. From that perspective, the SQL DDL is scripting language to manipulate those schema objects ("objects" in the OOP sense, "schema objects" has an actual specific meaning in SQL).
I think people think schema means "a list of tables and indexes" but it's an actual object in postgres ("DROP SCHEMA public"). These do not diverge too far; that's what the object represents of course.
Most people think schema means "shape/structure of data", not "list of tables and indexes". In Postgres (or maybe SQL more broadly) it roughly means "a container for tables".
Not just tables, but everything else as well. The shape and structure of your data is always defined within a schema, and a single database may have more than one.
A PostgreSQL server can contain multiple databases; they are independent and you can't access data in one database while connected to another (without dblink or something similar)
As far as I know pretty much every database (and the SQL standard) except MySQL has schemas as an explicit database object and calls them that. What MySQL calls "databases" are actually schemas; they're just containers for database objects and you can query across them (and CREATE SCHEMA is an alias for CREATE DATABASE)
EDIT:
There's a fun trick you can do with multiple schemas that illustrate why they are schemas and not just "containers of things"
You have a "data" schema that contains your table definitions; your actual, real data and indices etc. go here. Only privileged users can access this schema directly.
Then you have an "interface" schema, that contains views and functions used by people; they can refer to the data schema, and with some clever view definitions, you can do it such that they can only access the data using the views and functions in your interface schema.
At some point, you could create an "interface_v2" schema that provides better (or more) methods for accessing your data that's backwards incompatible. Old applications can continue using the "interface" schema by setting their schema search path to "interface" (which would be the default), but new applications can "overlay" the schemas by setting their search path to "interface_v2,interface" and opt-in to new functionality. The "structure" of your data is changed simply by opting in to the new schema.
It's pretty rare for people to do this (they understand versioned web APIs better than versioned database APIs), but it's a thing you can do.
> The shape and structure of your data is always defined within a schema
Maybe we're already agreed on this, but for clarity my point is that the common notion of a schema is strictly "the shape of the data" and not "a container for the data, the shape of the data, and a bunch of other stuff". I agree that this latter definition is probably shared across many relational databases and not just Postgres.
> but for clarity my point is that the common notion of a schema is strictly "the shape of the data" and not "a container for the data, the shape of the data, and a bunch of other stuff".
Yeah, I see what you are saying I just disagree. Most people who know either use, in the context of RDBMSs, know both, and resolve the ambiguity by context. This is fairly normal, it's very common for words to have multiple common definitions.
In my experience, this is fairly "advanced" knowledge. Lots of people who grind out SQL queries all day as analysts or vanilla software engineers don't know the SQL sense of the term.
> Most people who know either use, in the context of RDBMSs, know both, and resolve the ambiguity by context. This is fairly normal, it's very common for words to have multiple common definitions.
The RDBMS domain alone doesn't suffice to resolve the ambiguity because "structure of data" and "container of tables/etc" are both relevant. I would definitely contend that application developers and operators (though perhaps not DBAs) need to talk about "structure of data" a lot more than I need to talk about "container of tables/etc".
Using schemas to effect private implementations and public interfaces was my favorite trick when I was a DBA. This was for a large company so we definitely needed fences like this.