> This wasn’t caught by the existing test suite (even though it runs almost 200 end-to-end tests), because it always starts from an empty database, applies all migrations and only then runs the test code.
Isn't that where the test coverage has a hole?
I somehow expected the blog post to extend the testing for this: a pre-populated database which is then migrated. That seems to catch a wider class of issues than parsing SQL and guarding against just the one case of adding a NOT NULL column without a default.
Just to clarify a bit, the test of course isn't a fully general solution to database migration issues (I hinted at what that might be in the blog post), although it's still useful for providing a nice error message even if a more general solution were implemented.
That was not at all the goal of the post. I just wanted to appreciate how easy it was to achieve this specific task in Rust. In any other systems programming language that I've used (even most other languages, except maybe Python), I would never even imagine something like this being feasible, and so easy to do. That's it :)
That didn't even occur to me, tbh :) But it doesn't have to be SQL linting, I just wanted to appreciate the mindset of not being lazy/afraid to write an unorthodox test.
Just look at the test for this one-liner DDL. The test is way more complex than the thing it tests in the first place. Such confusing, hard-to-write, hard-to-read tests are the reason people avoid writing tests in the first place. Make tests great again!
(Great = simple, short, easy to write, read, and maintain. At the very least, no more complex than the thing it's testing!)
Totally with you on the merits of simplicity, writability, readability. But I'm getting strong "rest of the owl" vibes. How would you have prevented this defect instead, in a way that you find simple?
> How would you have prevented this defect instead, in a way that you find simple?
I currently prevent this sort of thing using a populated database. This database has actual real data (user info overwritten with random data).
Using a pre-populated database catches many more errors than this article's approach does.
Using a pre-populated database that has actual data generated by the users over a year or so catches even more errors.
This approach is fragile and misses the actual error, fixing only the symptom. The actual error is "there's a hole in my tests big enough to fly a passenger jet through".
The correct approach is to take a dump of the production database, scrub PII from it, and then perform your migration.
> Apart from parsing the SQL query, I also considered an alternative testing approach that I might implement in the future: go through each migration one by one, and insert some dummy data into the database before applying it, to make sure that we test each migration being applied on a non-empty database. The data would either have to be generated automatically based on the current database schema, or we could commit some example DB dataset together with each migration, to make sure that we have some representative data sample available.
That suggests this may also be a fairly complicated direction. Although it's not entirely clear to me why he can't just put one record in the DB before any migrations and then pull it all through. Plus it has the added drawback of removing your coverage for the (not unimportant) zero case.
It takes more than a few minutes, yes. But this one-time investment will prevent the next 10 migration-related bugs that he'll otherwise blog about.
Grab some representative data from production and keep feeding that into your migration tests. Keep updating those. Worth each minute if you care about quality.
If I only insert data into the DB once, I could miss important states. For example, I could add non-NULL data to a NOT NULL column, then make the column nullable, and then make it NOT NULL again. If I don't insert a NULL into the column between the last two migrations, I won't trigger the issue.
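Concretely, a hypothetical sequence like this would do it (not from the post, just to illustrate):

```rust
// Hypothetical migrations, applied in order by the test harness.
const MIGRATIONS: &[&str] = &[
    // 1. The column is introduced as NOT NULL, so any rows seeded up front
    //    never contain a NULL in it.
    "ALTER TABLE users ADD COLUMN nickname TEXT NOT NULL DEFAULT ''",
    // 2. The constraint is relaxed; from here on, rows may contain NULLs.
    "ALTER TABLE users ALTER COLUMN nickname DROP NOT NULL",
    // 3. Tightened again: SET NOT NULL only fails if a NULL was inserted
    //    after step 2, which data seeded once at the start never exercises.
    "ALTER TABLE users ALTER COLUMN nickname SET NOT NULL",
];
```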
In my previous job, we implemented something similar to Neon branching. Each MR would start by cloning the "public" schema into a schema scoped to the merge request, and only then run the migrations and the integration tests.
There's a range of errors you just won't catch until you run your code against the real thing. Especially if you write SQL directly, which feels like a lost art even amongst experienced developers.
This is the wrong approach. He should be putting data in the database schema before trying to run migrations on it.
Doing that is simple and can potentially catch all sorts of bugs, including bugs you haven't thought of yet. His solution is complicated but only catches one very specific type of bug.
The author mentions that approach at the end of the post:
> Apart from parsing the SQL query, I also considered an alternative testing approach that I might implement in the future: go through each migration one by one, and insert some dummy data into the database before applying it, to make sure that we test each migration being applied on a non-empty database. The data would either have to be generated automatically based on the current database schema, or we could commit some example DB dataset together with each migration, to make sure that we have some representative data sample available.
That would certainly catch nearly all migration issues, but it doesn't provide as helpful an error message as a test for this specific mistake does.
Ideally both approaches would be used, with the general case being used to detect and inform more targeted tests.
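For what it's worth, the quoted "insert dummy data before each migration" idea could look roughly like this. A minimal sketch, assuming sqlx/Postgres, a MIGRATIONS list, and a hypothetical sample_rows_for_current_schema helper; none of these names come from the article:

```rust
use sqlx::PgPool;

// Hypothetical list of migrations, applied in order.
const MIGRATIONS: &[&str] = &["CREATE TABLE users (id BIGINT PRIMARY KEY)"];

// Placeholder: build INSERT statements valid for the schema as it stands now,
// either generated from information_schema or loaded from a committed fixture.
async fn sample_rows_for_current_schema(pool: &PgPool) -> sqlx::Result<Vec<String>> {
    let _ = pool;
    Ok(Vec::new())
}

#[tokio::test]
async fn migrations_apply_to_a_non_empty_database() -> sqlx::Result<()> {
    let pool = PgPool::connect(&std::env::var("TEST_DATABASE_URL").unwrap()).await?;
    for migration in MIGRATIONS {
        // Seed representative rows before applying the next migration, so a
        // migration that can't cope with existing data fails here, not in prod.
        for insert in sample_rows_for_current_schema(&pool).await? {
            sqlx::query(&insert).execute(&pool).await?;
        }
        sqlx::query(migration).execute(&pool).await?;
    }
    Ok(())
}
```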
You should solve the problem you need to solve, not the problem you have a cool solution for.
The problem to solve is "given my database in an arbitrary but valid state, applying a migration should succeed and leave the database in an equally valid state". Not "how do I stop people adding NOT NULL columns to tables".
The author had an off-the-shelf crate they could trivially add to their project to prevent instances of this specific failure in 5 minutes. Solving the general case is possible, but probably significantly harder, and it probably still contains various corner cases that are hard to completely eliminate.
I'd characterize it less as the cool solution and more as the quick & dirty solution that gets you a lot of value for little effort.
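For context, the check in question is roughly this shape. A sketch using the sqlparser crate; it is not the author's actual code, and the AST field names differ a bit between sqlparser versions:

```rust
use sqlparser::ast::{AlterTableOperation, ColumnOption, Statement};
use sqlparser::dialect::PostgreSqlDialect;
use sqlparser::parser::Parser;

/// Returns true if the migration adds a NOT NULL column without a DEFAULT.
fn adds_not_null_without_default(sql: &str) -> bool {
    let statements =
        Parser::parse_sql(&PostgreSqlDialect {}, sql).expect("failed to parse migration SQL");
    statements.iter().any(|stmt| match stmt {
        Statement::AlterTable { operations, .. } => operations.iter().any(|op| match op {
            AlterTableOperation::AddColumn { column_def, .. } => {
                let not_null = column_def
                    .options
                    .iter()
                    .any(|o| matches!(o.option, ColumnOption::NotNull));
                let has_default = column_def
                    .options
                    .iter()
                    .any(|o| matches!(o.option, ColumnOption::Default(_)));
                not_null && !has_default
            }
            _ => false,
        }),
        _ => false,
    })
}

#[test]
fn rejects_not_null_without_default() {
    assert!(adds_not_null_without_default(
        "ALTER TABLE users ADD COLUMN nickname TEXT NOT NULL"
    ));
}
```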
You're just over-generalizing the problem into something you will not be able to solve by proving that a migration works for a _specific_ arbitrary state.
Of course, but then let's not knock the OP article for covering a different subset of testable states, in a way that will need no adjusting for future schema changes, while still preventing the actual problem.
Does it need to be done at each step, though? Couldn't the test data be added at just one point (where the schema is known) and just let it run through all the subsequent migrations to see if it goes boom?
I have also found this kind of approach very valuable. With defensive code, where the code itself does a bit of the kind of correctness checking that one might conventionally put into tests, it works even better. Pre-/post-conditions in functions, for example.
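Something along these lines (names made up, just a sketch of the style):

```rust
/// Applies a percentage discount, with the invariants checked in the code
/// itself rather than only in tests.
fn apply_discount(price_cents: u64, percent: u8) -> u64 {
    // Precondition: a discount above 100% makes no sense.
    debug_assert!(percent <= 100, "percent must be in 0..=100");
    let discounted = price_cents - price_cents * u64::from(percent) / 100;
    // Postcondition: a discount never increases the price.
    debug_assert!(discounted <= price_cents);
    discounted
}
```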
Not if you have data that could only have been inserted at some stage to begin with, and a subsequent migration only breaks that case (e.g. a table is created in step 10 and step 11 breaks that new table - if you only had the data you inserted in step 1, it would still pass all migrations). In other words, you need sample data that exercises the entire DDL that is added, or you risk missing something.
Sounds like this might make for an excellent general SQL lint tool. I thought sqlint[0] was this, but it’s more for syntax than semantics, it looks like.
This is a whole ̶l̶o̶n̶g̶ blog post about someone who was surprised that he could use a crate to parse SQL queries in Rust. A bit of an underwhelming read to me personally.
If someone figured out how to do a neat thing, learned some tricks along the way and wrote about it just in case it was useful to someone else - that person is doing useful work. More people should do that.
There's a difference between intellectually "knowing" that you can parse SQL, and actually going out and trying it and finding that it's practical to do in a test suite. I found the article valuable.
And that's great. Note that I specifically said that, for me personally, I found the article less valuable. But I'm certainly not claiming that everyone else should feel the same.
As a matter of fact, I'm sure a lot of people find my comment to be of little value.
But sometimes some innocent blog posts get criticised as if they claimed that they solved world hunger. They don't. They are often just some random thought in a rarely-read blog. Nobody intended to go convince a gang of seasoned hackers that they have undisputable wisdom.
Serverless databases with branching support like Neon make this kind of thing trivial to do. You can just have a test that branches off your prod DB, runs the migrations, and then deletes the branch. This is lightweight enough to easily run on every pull request change.
No mocking or anything. This tests that your migrations are safe to run against real data.
I'm obviously biased by being an employee, but this is where Neon's branching[0] functionality can come in useful. We hope to expand on it one day and build more first-party migration tooling but you can already get a good enough system with the features we have.
Neon Branches are zero-copy snapshots of the database, with all the same data, on an isolated postgres instance. You can run migrations on that data without risking any modifications to data or performance in production. You can set up scripts to run it in CI[1].
This isn't perfect. If performance matters, some migrations might hide table locks which can cause major slowdowns. I'm not sure how you might detect this currently, I had some discussions recently about whether we can add "explain analyze" to DDL queries in postgres.
> If performance matters, some migrations might hide table locks which can cause major slowdowns.
Do you mean that the migration might work on a side branch, but then it might not work on the main branch, because there's no other activity running on the side branch?
Enjoyed the article.
Am I out of touch, or does the article link to a PR in the rust-lang ecosystem that didn't go through code review? Is this really the standard for such a major language's standard library?
If you look at https://github.com/rust-lang/bors it's not a standard library package, it's a tool to support the Rust development process. And the person who opened and landed that PR is the lead developer of that project.
Skipping a code review from someone else feels OK to me for that.
And now you've pulled in a full SQL parser as a dependency (admittedly a dev/build-time dependency, but a dependency nonetheless) in a project that has no business parsing SQL.
In this day and age of increasingly rampant supply chain attacks and dependency vulnerabilities, I'd definitely be second-guessing the approach of "just write a test for it" if that test involves blowing up your attack/vulnerability surface.
Sidenote: I found this code quite readable. I'm usually reading Rust code here on HN that's full of weird lifetime annotations, `dyn`, or super long generic types. As a non-Rust dev, that freaked me out.
If you're running something like this and it isn't performance sensitive, you don't really need to deal with lifetimes at all. Lifetimes mostly come into play if you want to work with zero-copy parsing, or structs that borrow some things and act like a sub-view into the data you're working with.
I'm sure there are plenty of other use cases for lifetimes, but they don't come up very often when writing standard "application" code.
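For instance, the "sub-view" case might look like this (a made-up example, not from the article):

```rust
/// A zero-copy view into one line of input: the fields borrow slices of the
/// original string instead of allocating copies, hence the lifetime.
struct Record<'a> {
    name: &'a str,
    value: &'a str,
}

fn parse(line: &str) -> Option<Record<'_>> {
    let (name, value) = line.split_once('=')?;
    Some(Record { name, value })
}

fn main() {
    let record = parse("color=blue").unwrap();
    println!("{} is {}", record.name, record.value);
}
```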
If you're writing applications or tests, it's mostly the simpler kind of code. If you're writing reusable (or performance-critical) code, you will start seeing generics and lifetimes much more often.