> What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.
> Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is riddled with mysterious macros that one cannot decipher without picking up a notebook and expanding relevant parts of the macros by hand. It can take a day or two to really understand what a macro does.
> Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.
> The only reason why this product is still surviving and still works is due to literally millions of tests!
Tests are great, but relying on them in this way is like relying on a net to catch you without wearing a harness. It's a good thing if your last line of defense is reliable enough to catch you. But if you're relying on it, it's not a last line of defense, it's the only one.
You should be able to work on software because you understand how it works and what the ramifications of a given change are. Tests and code reviews provide redundancy. But here, they aren't providing redundancy, they're bearing the load.
What provides redundancy if tests are missing, broken, or misinterpreted? Have you ever fixed a bug, gone to write a test for it - and found that the test already existed but was passing spuriously?
In that sort of codebase, the only thing scarier than changing a line of code and breaking thousands of tests, is changing a line of code and not breaking any tests.
I think it is not RDBMSs specifically; rather, the combinatorial explosion of configurations/flags/options/platforms is what's insidious, and we, software engineering as a field, don't know how to handle it well.
If it can be improved, I think it is one of the highest-impact problems in software engineering. Maybe there is a way to restrict flag interactions and, as a result, shrink the support and test matrix.
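To make the scale of that test matrix concrete, here is a minimal, purely illustrative C harness (NUM_FLAGS and run_scenario are made-up names, not from any real project): with just 20 independent boolean flags, exhaustively covering the configuration space already means roughly a million runs of every scenario, and each additional flag doubles it.

    /* Illustrative only: brute-forcing every combination of N boolean
     * flags to show why the full configuration matrix is untestable. */
    #include <stdio.h>

    #define NUM_FLAGS 20                    /* 2^20 ~= 1 million configs */

    static void run_scenario(unsigned long config)
    {
        /* A real harness would set each flag from the bits of `config`
         * and exercise the code path under test. */
        (void)config;
    }

    int main(void)
    {
        unsigned long total = 1UL << NUM_FLAGS;
        for (unsigned long config = 0; config < total; config++)
            run_scenario(config);
        printf("ran %lu flag combinations\n", total);
        return 0;
    }

Techniques like pairwise/combinatorial test selection exist precisely to avoid that full sweep, but they assume you already know which flags can interact.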
Something I enjoyed was listening to the talks by the LibreSSL team cleaning up the mess in OpenSSL that (in part) caused the Heartbleed bug.
One of their strategies was to drop the macro soup and simply program against the "libc we would like to have", and then add compatibility shims to materialise their ideal libc instead of conditional compilation at the point of use.
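As a rough sketch of that idea (the HAVE_EXPLICIT_BZERO macro is an assumed autoconf-style configure result here, not necessarily the exact name any project uses): the rest of the code just calls explicit_bzero() unconditionally, and a small compat file supplies it once on platforms whose libc lacks it, instead of an #ifdef at every call site.

    /* compat/explicit_bzero.c -- illustrative portability shim; the real
     * LibreSSL-portable shims differ in detail. */
    #include <stddef.h>
    #include <string.h>

    #ifndef HAVE_EXPLICIT_BZERO
    /* Provide the interface the rest of the tree assumes exists. */
    void
    explicit_bzero(void *buf, size_t len)
    {
        /* Calling memset through a volatile function pointer discourages
         * the compiler from optimizing away a "dead" store. */
        void *(*volatile memset_fn)(void *, int, size_t) = memset;
        memset_fn(buf, 0, len);
    }
    #endif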
I suspect that an even bigger cause of the brittleness described in TFA is that an RDBMS inherently has to deal with concurrency. And not in the way that most applications do - the RDBMS is where other applications push their hairy concurrency problems.
The most enlightening part of the article was in the comments, where it was observed that, given the length of time Postgres has been going and the number of talented developers who have worked on the project, there are no easy enhancements or fixes left, just hard, knotty problems (paraphrasing).
It's almost a mark of success of the project. There is obviously a lot of dedication too.
- Postgres documentation is among the best-maintained database documentation out there. This also means that developers and committers ensure documentation changes accompany every relevant patch.
- Compare bugs in Postgres with MySQL, Oracle, or other databases: bugs are comparatively fewer and generally rare, even if you are supporting Postgres services as a vendor with lots of customers. The reason is the effort a strong team of developers puts into not accepting anything and everything; there are strict best practices, reviews, discussions, tests, and a lot more that make it difficult for a patch to make it into a release.
- Ultimately, the easier the acceptance of a patch, the greater the number of bugs.
I love Postgres the way it is today; it is still the DBMS of the year and developers' most-loved database.
I wish we had more contributors, committers, and developers, and also more users and companies supporting Postgres, so that pushing a feature gets faster and reasonably easier with more support.
Coming at this from a naive outsider perspective, the central problem described in the post (commits to PostgreSQL frequently have serious defects which must be addressed in follow-up commits) seems like one that would ideally be addressed with automated testing and CI tooling. What kind of testing does the Postgres project have? Are there tests which must pass before a commit can be integrated in the main branch? Are there tests that are only run nightly? Is most core functionality covered by quick-running unit tests, or are there significant pieces which can only be tested by hours-long integration tests? How expensive is it, in machine-hours, to run the full test suite, and how often is this done? What kinds of requirements are in place for including new tests with new code?
I would also note that the fixes started landing the day after the initial commit, and the other issues noted had fixes within three weeks. Of course PostgreSQL has testing, but with universal distribution and use cases that exercise the scheduler, network, filesystem, and I/O drivers (as with the Linux kernel and others), some things need wider audiences or more extreme testing scenarios (SQLite covers a strict subset of those considerations), and project health is measured by responding to that in a timely fashion. AFAICS this is all about trunk/main, versus releases, as well. So while the post (from a long-time PG contributor) labels it as hard, and yeah, I might agree (being a maintainer on other software, all of this resonates heavily), I'd also say it's an example of things done right.
Seems like a reason to celebrate the open source model, and specifically, here, how to do things better. Not to detract from the universal issues any project has with maintainer availability. But imagine a non-OSS database vendor with that degree of transparency or velocity; I can't think of any that are doing anything close, unless they got popped on a remote CVE, i.e., prioritized above features or politics in a corporate dev sprint. All software has bugs; it's about how fast things are fixed, and in the context of OSS, IMHO, fostering evolution among a diverse set of maintainers and use cases seems to be the better way.
As another example of that, it was a PostgreSQL hacker at MS who prevented the xz backdoor from going wide, because he cared enough about a perf regression to do the analysis.
Most database companies run only a small number of tests before committing. After committing, you run tests for thousands of hours. It sucks. You probably do this all day, every day. You just run the tests on whatever you have currently committed. You kind of have to be careful about not adding more tests that make it take much, much longer.
See https://news.ycombinator.com/item?id=18442941
Ahh, thanks, that piece of information suddenly makes TFA make sense. I was wondering how those issues could not have been caught by unit tests before committing/merging, but were seemingly caught soon afterwards, in a way that they could still immediately be ascribed to a specific commit.
What's missing in this post is a deep analysis of what the bugs are and what was causing them, in a five-whys sense. Especially if they all seem like dumb stuff at first.
There are some deep lessons about programming in this Factorio Friday Facts:
I'm pretty sure the answer is usually "concurrency". The examples alluded to in TFA sound like it. Handling concurrency is notoriously hard, and an RDBMS is what other applications use to solve their hairy concurrency problems so they don't have to.
I remember, waaaaay back in the early days of PostgreSQL, I was using it for a project and it crashed in a way that corrupted our data. There was no hardware problem; it was just a database crash. (This was quite some time ago. I don't recall what year it was, but I'm 68 now and have been a programmer since my senior year in college.) I switched to MySQL for that project. I assume PostgreSQL is not remotely prone to anything like that anymore!
It can still happen sometimes, but it's very rare that it crashes, and even rarer that it corrupts data as well. The good news is that the chances of it corrupting your data silently were basically zero, then and now.
However, back then MySQL seemed like it went out of its way to corrupt your data. The only "bonus" is that it did it all silently, so nobody ever noticed until they went looking. With MariaDB (the successor) it's pretty rare that it silently corrupts your data these days.
I believe a lot of that can be tied back to the default switching from MyISAM to InnoDB. Crashes frequently resulted in corrupt tables with MyISAM. InnoDB is very good at recovering.
Around the early 00s, it was already common wisdom among web devs that, when it comes to free RDBMSs, if you want speed, you use MySQL; and if you want consistency and reliability, then Postgres is where it's at.
I was interested in learning how to create a Postgres extension the other week. Not just bundling some SQL scripts, but a proper extension tying into their API.
Trying to find any good information on how to go about this proved super difficult. Well, I wasn't having much luck and just gave up.
I've created a few extensions in C. I found the "easy" path was looking at other extensions. The docs helped but bootstrapping my internals knowledge took a lot of time.
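For what it's worth, the C core of an extension is quite small once you see the pattern. The function below follows the version-1 calling convention from the "C-Language Functions" chapter of the Postgres docs (add_one is essentially the docs' classic example); the remaining pieces are a .control file, a SQL script declaring the function, and a PGXS makefile.

    /* add_one.c -- minimal C-language Postgres function using the
     * version-1 (fmgr) calling convention. */
    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;                 /* required once per loadable module */

    PG_FUNCTION_INFO_V1(add_one);    /* registers the version-1 wrapper */

    Datum
    add_one(PG_FUNCTION_ARGS)
    {
        int32 arg = PG_GETARG_INT32(0);   /* first SQL argument, as int4 */

        PG_RETURN_INT32(arg + 1);
    }

On the SQL side it gets declared with something like CREATE FUNCTION add_one(integer) RETURNS integer AS 'MODULE_PATHNAME', 'add_one' LANGUAGE C STRICT, and a few-line PGXS makefile handles the build.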
There are some striking similarities to working on another large OSS codebase: Mozilla. (I am employed there.) We have struggled with all of these things for years, and we have a much larger pool of committers (and thus much higher variance in committer abilities). Things today are much better than they used to be, even if all of the same problems are still present to some degree.
Some of what we did might translate well to PostgreSQL, some of it won't, and much of it is probably too expensive and/or too much work. (Then again, it's work that doesn't require an inflight rocket surgeon to accomplish, which means it's doable by a much larger population of developers.)
- We've long had volunteer (and later, employee) "sheriffs" that monitor CI, know how to back things out, and over time get better at recognizing the sorts of problems that come up.
- For slow or expensive tests that don't run on every commit, they'll also take care of "backfilling" test jobs to narrow down which patch or patch stack most likely caused a problem.
- As with most CI systems, there's a staging area that gets a decent level of testing before changes are merged into the main development line.
- Feature gates for larger changes, so things can land in the mainline and be worked on there for a while, with CI regression tests running both with the feature enabled and disabled (as well as feature-specific tests when it's enabled). Good for reducing bit rot.
- Extensive fuzz testing (a minimal harness sketch follows this list). This would probably need to be specialized to a DB environment, since they're obviously very stateful. Various forms of snapshotting are good. For the browser (and especially the JavaScript engine I work on), it's hard to overstate just how useful this is. I would guess it could work quite well for a DB engine too.
- Lots of resources poured into test machines. With enough machines, good sheriffs, and a rich test suite, test latency doesn't matter all that much. You may not know about the problems for a day or three, but if you can depend on either getting backed out or your feature re-disabled, then you can fire and forget with no guilt. (Ok, the sheriffs will start getting snippy if you bounce a landing too many times, as is their prerogative.)
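To make the fuzzing bullet above concrete: a libFuzzer-style target is just one function that receives a byte buffer, and the engine mutates inputs to find crashes. Everything here besides the LLVMFuzzerTestOneInput entry point is a made-up stand-in (parse_query especially); a real DB fuzzer would feed the bytes into the SQL parser, WAL replay, protocol decoding, and so on.

    /* Sketch of a libFuzzer-style target; parse_query() is hypothetical. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Trivial stand-in so the sketch links; a real target would call
     * into the engine under test instead. */
    static int parse_query(const char *sql)
    {
        return sql[0] == 'S';
    }

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        char *sql = malloc(size + 1);
        if (sql == NULL)
            return 0;
        memcpy(sql, data, size);
        sql[size] = '\0';       /* treat the fuzz input as a query string */
        parse_query(sql);       /* crashes and UB are caught by sanitizers */
        free(sql);
        return 0;
    }

Built with clang -fsanitize=fuzzer,address, the fuzzing engine supplies main() and drives the corpus mutation for you.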
I'm guessing DB development and testing has tons of idiosyncratic difficulties, but it all sounds so familiar that I think many of the same approaches could work. The inevitable "turning the buildfarm red" should not lead to "spend[ing] the afternoon, or the evening, fixing it..." Complex software is a different beast, and it's unrealistic to expect to be able to break all features down into simple obvious changes. There's just too much going on.
(You still can't handle just anyone committing just anything at any time, though. There will always be a rate of breakage introduction that your system can handle, and it's not hard to go over it.)
Wouldn't the testing basically consist of executing large amounts of SQL against an initially empty database, and then executing more SQL to read the current state of the data and verify it is what it should be?
Such tests could perhaps be database-agnostic to a degree, verifying that the database behaves according to the SQL standard?
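For illustration, that style of test is roughly the shape below, written against libpq (Postgres's C client library); the connection string, table, and expected count are all made up. It only checks end state, which is part of why it tends to miss concurrency, recovery, and scheduling bugs.

    /* Illustrative "execute SQL, then verify the state" test via libpq. */
    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=regression_test");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            return 1;
        }

        PQclear(PQexec(conn, "CREATE TEMP TABLE t (n int)"));
        PQclear(PQexec(conn, "INSERT INTO t SELECT generate_series(1, 100)"));

        PGresult *res = PQexec(conn, "SELECT count(*) FROM t");
        int ok = PQresultStatus(res) == PGRES_TUPLES_OK
                 && strcmp(PQgetvalue(res, 0, 0), "100") == 0;

        printf("%s\n", ok ? "PASS" : "FAIL");
        PQclear(res);
        PQfinish(conn);
        return ok ? 0 : 1;
    }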
Those are the easiest tests to write, and probably pretty unlikely to find anything unless the feature you're adding is exposed via SQL pretty directly.
I was thinking more like lots of concurrent operations, and backups/restores (again concurrent with other DB traffic), and replication, and incremental operations, and failover, and error handling in general. All while varying things that feed into scheduling, etc. Anything nondeterministic is good, though that's not all of it. (Annoyingly, that means that failures will quite often be intermittent, which is a whole can of worms of its own.)
I laughed at that, too, but the commenter, Greg Smith (disclosure: a former colleague), has been involved with Postgres for ages, and concludes that Rust would not actually be a magic bullet here.
Not a magic bullet, but there's a hell of a lot of difference between maintaining C and Rust code.
In Rust it's much easier to create robust and performant abstractions that are near impossible to misuse, eliminating tons of potential bugs right away.
In Rust you don't need 10 years of experience and a microscope trained on every line to ensure it doesn't introduce issues. In Rust trivial changes are indeed trivial, so you have more time to think about the actually difficult parts.
In Rust contributors don't need to learn custom implementations of collections, strings, and other basic primitives on every project.
And most of all, people actually want to learn and work with Rust, so your contributor pool is expanding, not shrinking.
Sorry for the confusion—I know it's weird but the alternative turns out to be even more confusing and we've never figured out how to square that circle!
And what does "hard" even mean? Hard for whom? Hard for the average person? For the average developer? Hard for an expert on a particular topic? Hard for someone who has practiced a lot, or for someone who hasn't?
I don't think that's fair. The PostgreSQL codebase includes a lot of stuff that isn't included in SQLite, where it's covered by third-party projects - I'd bet their quality is nothing like SQLite's or PostgreSQL's. Tom Lane was involved in these commits, and it sounds like some of this got by him. His comments have frequently been described as "complete technical manuals", so I think that speaks to the complexity.
https://news.ycombinator.com/item?id=18442941
To quote part of it:
> Oracle Database 12.2.
> It is close to 25 million lines of C code.
> What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.
> Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is riddled with mysterious macros that one cannot decipher without picking up a notebook and expanding relevant parts of the macros by hand. It can take a day or two to really understand what a macro does.
> Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.
> The only reason why this product is still surviving and still works is due to literally millions of tests!