Wait until you hear what "cluster" means in Postgres-speak.

legulere · on March 10, 2022

The same as in Microsoft SQL Server?

https://docs.microsoft.com/en-us/sql/relational-databases/in...

SigmundA · on March 11, 2022

No unfortunately PG does not have clustered indexes much to my dismay.

throwaway894345 · on March 10, 2022

No doubt lots of relational database software share this quirk.

jelder · on March 10, 2022

Please go on.

throwaway894345 · on March 10, 2022

> PostgreSQL uses the term cluster to refer to a “cluster” of databases, as opposed to the usual notion of a group of servers or VMs working in a co-ordinated fashion.

https://www.opsdash.com/blog/postgresql-cluster.html

Note that the "databases" above are logical databases, not database hosts.

snthpy · on March 10, 2022

Thank you for this. I didn't realise this and for the past few years have been thinking that I seem to be the only one that runs only a single database host as everyone else seems to have "clusters". I never realised this just means multiple databases.

pcthrowaway · on March 10, 2022

So according to this usage, one database instance with a few databases would be a cluster?

throwaway894345 · on March 10, 2022

I'm not an expert, but as far as I can tell, even a single database instance with a single database (or maybe no databases at all?) could constitute a "cluster".

munk-a · on March 10, 2022

Clustering is, besides all the other definitions you're seeing here, a table property. A clustered table is a table that has, on disk, been aligned with a certain index[1] - each table can only be clustered to a single index and it essentially means that row retrieval for that specific index is much more efficient, it basically gets you one covering index for free.

1. https://www.postgresql.org/docs/current/sql-cluster.html

pdpi · on March 10, 2022

A Postgres cluster is, roughly speaking, a server instance. That is, if you run two copies of postgres on the same box (One on the default 5432 port, another on, say, 6543) and have each of those copies manage its own independent config, data, etc, then those instances are what Postgres calls a cluster.

munk-a · on March 10, 2022

Or schema!

sitharus · on March 10, 2022

The PostgreSQL usage of schema is consistent with the ANSI sql definition. MSSQL is the same.

chungy · on March 10, 2022

I think most people surprised at the PostgreSQL model of databases and schemas are coming from a MySQL/MariaDB background where the terms are synonymous. PostgreSQL matches the model of basically every non-MySQL database when it comes to these concepts.

Though if we talk about database weirdness, I never liked Oracle DB's insistence that databases and users are the same thing. Glad I haven't used it for well over a decade now :)

chousuke · on March 10, 2022

I was actually surprised that you can query across databases in MySQL, but that makes sense once you understand that there's only one database and the "databases" are just schemas.

progre · on March 10, 2022

Coming from MSSQL I was equally surprised that you cant query across databases in postgres even if they are on the the same server. Yeah there is the Foreign Data Wrapper thing but the DBA was very reluctant to enable it or whatever.

pdpi · on March 10, 2022

Hmm, care to expand? The word schema, in Postgres terms, seems to mean exactly what I expect it to.

throwaway894345 · on March 10, 2022

Normally, schema means "shape/structure of data". In Postgres it's roughly a container for tables.

dragonwriter · on March 10, 2022

> Normally, schema means "shape/structure of data". In Postgres it's roughly a container for logical databases.

No, it's a namespace within a database, not a container for databases; this is not Postgres specific, it is part of the SQL standard and widely used in other implementations.

throwaway894345 · on March 10, 2022

Yeah, you caught me between my post and my edit. I misspoke and said "logical databases" instead of "tables". I'm sure that's not precisely correct either, however.

atwebb · on March 10, 2022

It also seems under-used for how core it is (ex. everything in public/dbo).

pdpi · on March 10, 2022

Ah I see. Postgres uses the word in the sense that it's used in the SQL spec, though, and it means pretty much what you're describing. The key difference is that I think you're talking about the human-readable file (which in the SQL world is written in SQL DDL), whereas the standard means it as closer to being the internal/runtime representation of that "shape/structure of data".

It's helpful to think of "the database" as the actual physical storage, and the schema is what the db uses to make sense of how to manipulate/query that data. From that perspective, the SQL DDL is scripting language to manipulate those schema objects ("objects" in the OOP sense, "schema objects" has an actual specific meaning in SQL).

jrockway · on March 10, 2022

I think people think schema means "a list of tables and indexes" but it's an actual object in postgres ("DROP SCHEMA public"). These do not diverge too far; that's what the object represents of course.

throwaway894345 · on March 10, 2022

Most people think schema means "shape/structure of data", not "list of tables and indexes". In Postgres (or maybe SQL more broadly) it roughly means "a container for tables".

chousuke · on March 10, 2022

Not just tables, but everything else as well. The shape and structure of your data is always defined within a schema, and a single database may have more than one.

A PostgreSQL server can contain multiple databases; they are independent and you can't access data in one database while connected to another (without dblink or something similar)

As far as I know pretty much every database (and the SQL standard) except MySQL has schemas as an explicit database object and calls them that. What MySQL calls "databases" are actually schemas; they're just containers for database objects and you can query across them (and CREATE SCHEMA is an alias for CREATE DATABASE)

EDIT:

There's a fun trick you can do with multiple schemas that illustrate why they are schemas and not just "containers of things"

You have a "data" schema that contains your table definitions; your actual, real data and indices etc. go here. Only privileged users can access this schema directly.

Then you have an "interface" schema, that contains views and functions used by people; they can refer to the data schema, and with some clever view definitions, you can do it such that they can only access the data using the views and functions in your interface schema.

At some point, you could create an "interface_v2" schema that provides better (or more) methods for accessing your data that's backwards incompatible. Old applications can continue using the "interface" schema by setting their schema search path to "interface" (which would be the default), but new applications can "overlay" the schemas by setting their search path to "interface_v2,interface" and opt-in to new functionality. The "structure" of your data is changed simply by opting in to the new schema.

It's pretty rare for people to do this (they understand versioned web APIs better than versioned database APIs), but it's a thing you can do.

throwaway894345 · on March 10, 2022

> The shape and structure of your data is always defined within a schema

Maybe we're already agreed on this, but for clarity my point is that the common notion of a schema is strictly "the shape of the data" and not "a container for the data, the shape of the data, and a bunch of other stuff". I agree that this latter definition is probably shared across many relational databases and not just Postgres.

dragonwriter · on March 10, 2022

> but for clarity my point is that the common notion of a schema is strictly "the shape of the data" and not "a container for the data, the shape of the data, and a bunch of other stuff".

Yeah, I see what you are saying I just disagree. Most people who know either use, in the context of RDBMSs, know both, and resolve the ambiguity by context. This is fairly normal, it's very common for words to have multiple common definitions.

throwaway894345 · on March 10, 2022

In my experience, this is fairly "advanced" knowledge. Lots of people who grind out SQL queries all day as analysts or vanilla software engineers don't know the SQL sense of the term.

> Most people who know either use, in the context of RDBMSs, know both, and resolve the ambiguity by context. This is fairly normal, it's very common for words to have multiple common definitions.

The RDBMS domain alone doesn't suffice to resolve the ambiguity because "structure of data" and "container of tables/etc" are both relevant. I would definitely contend that application developers and operators (though perhaps not DBAs) need to talk about "structure of data" a lot more than I need to talk about "container of tables/etc".

spacemanmatt · on March 11, 2022

Using schemas to effect private implementations and public interfaces was my favorite trick when I was a DBA. This was for a large company so we definitely needed fences like this.