Hacker News

If you gain some satisfaction from having to track version numbers, then good for you. I and most other people find it tedious.

They're also impractical in environments where your versioning system isn't the only thing touching the database. If you or some DBA needs to make an emergency database change - suddenly the real DB doesn't match your versioning.

But if you compare directly against production, it's not a problem at all.

Not directly checking against the production schema is rather like flying a plane with no reference to what is actually outside the windows - an unreliable experience.

Nothing is stopping you from keeping a long chain of these scripts if you want - you'll lose nothing and gain more automation and testability.

But in practice nobody wants to keep these files around, and nobody wants to restore 2 year old backups to production.

And if you did, a diff-based approach will do just as well, more automatically.

Where the diff-based approach probably shines the most is when making experimental changes locally during development. Play around with your models, add a column, add a constraint, rename the column, make it not null, change your mind and drop it again. One command syncs the dev database to your models automatically and instantly.
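To make the "one command syncs the dev database to your models" idea concrete, here is a deliberately minimal sketch of that sync step, using sqlite3 as a stand-in database. It only handles added columns; real diff tools (Skeema, Django's migration autodetector, etc.) also handle types, constraints, and much more. All names here are illustrative, not any particular tool's API.

```python
import sqlite3

# Declared model: the columns we want the table to have.
DESIRED = {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"}

def diff_and_sync(conn, table, desired):
    """Diff the declared model against the live schema and apply the gap."""
    actual = {row[1]: row[2] for row in
              conn.execute(f"PRAGMA table_info({table})")}
    statements = [f"ALTER TABLE {table} ADD COLUMN {name} {coltype}"
                  for name, coltype in desired.items() if name not in actual]
    for stmt in statements:
        conn.execute(stmt)
    return statements

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
applied = diff_and_sync(conn, "users", DESIRED)  # adds email, created_at
```

Running the same command again is a no-op, which is what makes the play-around-then-sync development loop work: the database converges on the model no matter how many experiments happened in between.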



> If you gain some satisfaction from having to track version numbers, then good for you. I and most other people find it tedious.

I literally said there are no version numbers to track. If you find identifying the current date either manually or in some automated tool tedious, I don't know how to help you.

> If you or some DBA needs to make an emergency database change - suddenly the real DB doesn't match your versioning.

Well by that logic, if I suddenly need to make some "emergency" fix to the code, it won't match the version control system.

The solution there is to have a method in place to rapidly deploy a change, not to make your migration tool work around cowboy solutions.

Your previous comments also imply that to be usable, the "diff" system needs to be reviewed (i.e. to handle table renames, and to be considered safe for production).

So how does that handle the cowboy approach where the schema doesn't match? Either your "diffs" are generated at the time of execution, so the previous state doesn't matter but you can't review them, OR your diffs are generated ahead of time, so they can be reviewed, but will not necessarily work against the live database.

So which is it?

I'm not against a "make changes against the DB directly" workflow for development. I've written code that does exactly that, as you describe, from models. But it's not practical for production use.

It's usable as a development tool, and to produce static migration scripts.


"They're version dates not version numbers" is hairsplitting. You still have external information that you must rely on to determine what state your production db is in, and that's bad.

If an on-call SRE calls me in the middle of the night asking if they can add an index to solve a performance problem, I'd rather say "yes, no problem", not present a series of additional steps for them to jump through.

You review the script, and when the time comes to apply, recheck the database to make sure it's still the same state you generated the script against. Generally people tell you if they've made changes on live dbs that you're working on, but it's nice to double-check regardless.


> You still have external information that you must rely on to determine what state your production db is in, and that's bad

What external information? Whether each migration has been applied or not is stored in the database itself. The dates are used just to ensure correct ordering - that's literally their only purpose.
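A minimal sketch of the bookkeeping being described: applied migrations are recorded in a table inside the database itself, and the date prefix on each filename exists only to sort them. This is illustrative (sqlite3, made-up names), not any specific migration tool.

```python
import sqlite3
from datetime import datetime, timezone

# Date-prefixed migration files; the prefix only determines ordering.
MIGRATIONS = {
    "2023-01-05_create_users.sql": "CREATE TABLE users (id INTEGER)",
    "2023-02-10_add_email.sql": "ALTER TABLE users ADD COLUMN email TEXT",
}

def run_pending(conn):
    """Apply any migration not yet recorded in the database's own tracking table."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations "
                 "(name TEXT PRIMARY KEY, applied_at TEXT)")
    done = {r[0] for r in conn.execute("SELECT name FROM schema_migrations")}
    for name in sorted(MIGRATIONS):  # the date prefix gives the ordering
        if name not in done:
            conn.execute(MIGRATIONS[name])
            conn.execute("INSERT INTO schema_migrations VALUES (?, ?)",
                         (name, datetime.now(timezone.utc).isoformat()))
    return sorted(MIGRATIONS.keys() - done)  # what this run applied

conn = sqlite3.connect(":memory:")
first = run_pending(conn)   # applies both migrations
second = run_pending(conn)  # no-op: applied state lives in the database
```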

> not present a series of additional steps for them to jump through.

If someone can't write the change they want to make into a file, write the opposite action into another file, commit and push that change to a VCS repo, I don't think they should be given access to a god damn toaster oven much less your production database.

> You review the script, and when the time comes to apply, recheck the database to make sure it's still the same state you generated the script against.

... How can that possibly work with automated deployments? And how on earth do you "recheck the database to make sure it's still the same" with any degree of certainty?

Your entire approach smells like a very manual process that doesn't work for teams any larger than 1 person.


> Whether each migration has been applied or not is stored in the database itself.

You're dragging this further into pedantic territory here. A chain of scripts and a version table is external to the structure of the database itself.

> If someone can't write the change they want to make into a file, write the opposite action into another file, commit and push that change to a VCS repo...

The recurring theme here is that you have a preference for mandatory busywork instead of a direct approach. People putting out fires ought to be focused on what will directly solve the problem most quickly and safely. In larger environments with dedicated ops people supporting multiple applications/environments/databases, not every ops person is going to be familiar with your code and preferred workflow.

> How can that possibly work with automated deployments? And how on earth do you "recheck the database to make sure it's still the same", with any degree of certainty?

...with a diff tool.

> Your entire approach smells like a very manual process that doesn't work for teams any larger than 1 person.

The whole point is that it is automatic rather than manual. I've used it before in teams "larger than 1 person" and it has worked fine.


> Your entire approach smells like a very manual process that doesn't work for teams any larger than 1 person.

You may be misunderstanding the concept. Automated declarative schema management (AKA diff-based approach) has been successfully used company-wide by Facebook for nearly a decade, to manage schema changes for one of the largest relational database installations on the planet. It's also a widely used approach for MS SQL Server shops. It's not some newfangled untested crazy thing.

I have a post discussing the benefits here: https://www.skeema.io/blog/2019/01/18/declarative/


I understand the concept of a tool that changes the schema to match some declared state dynamically.

I wrote the same functionality into a library.

What I cannot comprehend is the poster who claims that such an approach can simultaneously:

- be automatically applied

- be reviewed and even edited after generation to handle e.g. renames

- handle previously unknown changes in the DB schema (aka handling cowboy behaviour from other ops).

All three are simply not possible at once.


Here's how you can achieve all 3.

- Develop intended schema (I)

- Inspect production schema (P), save as P0

- Generate migration (M) by comparing the intended schema to production: I - P0 = M

- Edit M as necessary, test for correctness, commit to master (meets your second criterion)

- Deploy code, with the migration running as a deploy step (meets your first criterion)

- Migration works as follows:

- Inspect P again, save as P1. If P0 != P1, abort the process (this prevents any issues from out-of-band changes as per your third criterion, and means the pending script won't run more than once)

- Apply M.

- Inspect P once more, save as P2. Double-check that P2 == I as expected.


I disagree. This is definitely all possible at once with proper tooling.

Ideally this workflow is wrapped in a CI/CD pipeline. To request a schema change, you create a pull request which modifies the CREATE statement in a .sql file in the schema repo. CI then automatically generates and shows you what DDL it wants to translate this change to.

If that DDL matches your intention, merge the PR and the CD system will apply the change automatically. If that DDL doesn't match your intention, you can either modify the PR by adding more commits, or take some manual out-of-band action, or modify the generated DDL (if the system materializes it to a file prior to execution).
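A toy illustration of that CI step: compare the CREATE statement in the PR against the current one and print the DDL the change would translate to. Real tools (e.g. Skeema) parse SQL properly; this regex-based sketch only handles simple "name TYPE" column lists, and every name in it is made up.

```python
import re

def columns(create_sql):
    """Extract a {column: type} dict from a simple CREATE TABLE statement."""
    body = re.search(r"\((.*)\)", create_sql, re.S).group(1)
    return dict(c.strip().split(None, 1) for c in body.split(","))

def generate_ddl(table, old_sql, new_sql):
    """Translate a CREATE-statement edit into ADD/DROP COLUMN statements."""
    old, new = columns(old_sql), columns(new_sql)
    ddl = [f"ALTER TABLE {table} ADD COLUMN {n} {t}"
           for n, t in new.items() if n not in old]
    ddl += [f"ALTER TABLE {table} DROP COLUMN {n}"
            for n in old if n not in new]
    return ddl

old = "CREATE TABLE users (id INT, name TEXT)"
new = "CREATE TABLE users (id INT, name TEXT, email TEXT)"
ddl = generate_ddl("users", old, new)  # the DDL CI would show on the PR
```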

In any case, renames are basically the only general situation requiring such manual action. Personally I feel they're a FUD example, for reasons I've mentioned here: https://news.ycombinator.com/item?id=21758143



