One cool thing about the way MVCC currently works in PostgreSQL is that transaction rollback is instantaneous: there is no undo work to perform if you choose to roll back your transaction.
This proposal seems like it would do away with that feature - but perhaps it's a good trade off in practice. I wonder what people here think.
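For context, the reason rollback is free today: aborting just flips the transaction's status, and the dead row versions are simply left behind for VACUUM. Here's a toy sketch of that (plain Python, every name invented for illustration; the real thing is C and vastly more involved):

    # Toy model of PostgreSQL-style MVCC (illustration only).
    COMMITTED, ABORTED, IN_PROGRESS = "committed", "aborted", "in progress"
    xact_status = {}   # stand-in for pg_xact, the transaction status log
    heap = []          # tuple versions: dicts with xmin/xmax/payload

    def begin(xid):
        xact_status[xid] = IN_PROGRESS

    def insert(xid, payload):
        heap.append({"xmin": xid, "xmax": None, "payload": payload})

    def update(xid, idx, payload):
        # UPDATE never touches the old version in place: it stamps the old
        # version's xmax and appends a brand-new version to the heap.
        heap[idx]["xmax"] = xid
        heap.append({"xmin": xid, "xmax": None, "payload": payload})

    def commit(xid):
        xact_status[xid] = COMMITTED   # one status flip: O(1)

    def rollback(xid):
        xact_status[xid] = ABORTED     # also one status flip: O(1); the dead
                                       # versions just sit in the heap until
                                       # VACUUM reclaims them

    def visible(tup):
        # Grossly simplified visibility rule: creator committed, and no
        # committed deleter.
        if xact_status.get(tup["xmin"]) != COMMITTED:
            return False
        return tup["xmax"] is None or xact_status.get(tup["xmax"]) != COMMITTED

    begin(1); insert(1, "v1"); commit(1)
    begin(2); update(2, 0, "v2"); rollback(2)   # instantaneous, no undo pass
    print([t["payload"] for t in heap if visible(t)])   # ['v1']

The proposal would trade that O(1) abort for cheaper commits.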
> This system is very much symmetric. A transaction that commits generates pretty much the same amount of work as a transaction that aborts. This is very elegant, but it’s not best in practice, because very few people run workloads where anywhere close to 50% of transactions abort. (Anyone who has such a workload will, I suspect, find that PostgreSQL handily outperforms the competition.) It would be better to have a system where we try to make commits cheaper, and aborts more expensive.
I agree - I know that my organization would see an increase in performance, as 90%+ of our transactions are committed. I am excited to see this in action.
That is more or less how Oracle does commit and rollback: changed items create a version in a "rollback segment". Queries begun during a transaction's life will get the original version of the changed items there. One can run out of rollback space, and so cause a transaction to be aborted before it creates potential inconsistency, though I haven't seen that happen in years.
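Very roughly, and with every name here invented for illustration (this is not Oracle's actual design), the rollback-segment mechanics look something like this toy sketch:

    # Toy sketch of an undo ("rollback") segment.
    table = {"row1": 100}   # rows are updated in place
    undo_segment = []       # before-images; bounded in a real system

    def update_in_place(xid, key, new_value):
        undo_segment.append((xid, key, table[key]))  # save before-image first
        table[key] = new_value                       # then overwrite in place

    def read_as_of(visible_xids):
        # A query begun before xid committed reconstructs the original
        # version by walking the undo segment backwards.
        view = dict(table)
        for xid, key, before_image in reversed(undo_segment):
            if xid not in visible_xids:
                view[key] = before_image
        return view

    def rollback(xid):
        # Abort is where the work is: physically re-apply before-images.
        for uxid, key, before_image in reversed(undo_segment):
            if uxid == xid:
                table[key] = before_image
        undo_segment[:] = [u for u in undo_segment if u[0] != xid]

    update_in_place(42, "row1", 50)
    print(read_as_of(visible_xids=set()))  # {'row1': 100}: old snapshot
    rollback(42)
    print(table)                           # {'row1': 100} again

Commit is cheap here (the undo entries simply become garbage to recycle); abort and old-snapshot reads are what pay for walking the segment, and the segment being bounded is why one can run out of rollback space.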
I can't comment on the legal stuff. But I don't think that Oracle was the only one to manage "read committed" consistency this way.
Almost every RDBMS uses undo logs to implement rollback (they physically roll back/undo the changes). It's the traditional implementation technique for a whole generation of databases -- see https://en.wikipedia.org/wiki/Algorithms_for_Recovery_and_Is... . AFAIK only PostgreSQL, and possibly also Interbase/Firebird and Rdb (though I'm not 100% sure about those), work a different way without undo logs: they leave old data in the heap, step over it as required, and eventually garbage collect it later. Ancestral POSTGRES did that because its goal was to keep all versions forever and do time travel, and VACUUM just moved them off to write-once media for archival.

A subset of traditional undo-based RDBMSs can also use undo logs to implement MVCC. Only a subset, primarily because the other RDBMSs chose not to provide snapshot isolation: instead they made writers block readers, so readers never needed to fish old row versions out of the undo log except when actually rolling transactions back. The set of databases that can do undo-based MVCC currently includes at least Oracle, DB2 (since 9.7), and MySQL (InnoDB, and maybe other storage engines).

Note that the original implementer of PostgreSQL's WAL (REDO) logging intended to add undo log support as well (possibly under the influence of the super well-known ARIES design): https://www.postgresql.org/docs/7.2/static/wal.html . Somehow it's taken another 15 years for someone to try that.
I'm pretty sure they looked at this before they started the work. There are a number of different approaches out there, and all have their advantages and disadvantages. But I, too, have found Postgres's reliance on VACUUM more problematic than Oracle's or SQL Server's approach.
The first version programmed in C (rather than assembler) was version 3 (1983). Row-level locking was introduced in version 6 (1988), and constraints, stored procedures, and triggers in version 7 (1992).
Lots of people mentioning Oracle, but only one mentioning MySQL (which is famous for doing this; I wasn't even sure Oracle did ;P). This article hasn't shown this is a massive win... and frankly I doubt this is even that difficult to prototype. Before writing this article I would have loved to see them do a simple benchmark to show this was an interesting direction. As it stands I don't even see why this was published: they didn't even do a simple comparison to MySQL.
Like, this is how I read this article: "PostgreSQL does something one way, which everyone is used to, and it is clearly pretty stupid; we have an entirely new way of thinking about this that we are pretty sure--but not 100% sure, as we aren't that arrogant--will be better than the way it is happening now: we will just do this, this, and this, which certainly seems better. More to come!".
The article I would have wanted to see: "it is well known in the world of database software that there are two ways of doing this one thing, and PostgreSQL is pretty unique in doing it this way which we know has some downsides, such as this, this, and this; of course, the other way also has known downsides, such as this, this, and this; we started to ask ourselves if it would still make sense to try that model for some workloads against PostgreSQL".
Essentially, a good version of this article would have at least mentioned why MVCC is interesting at all (as this article skirts dangerously close to "clearly we should just update everything in place"). It would have talked about the advantages of delaying write amplification vs. doing two writes immediately. It would have talked about Heap-Only Tuples as a form of garbage collection on that write amplification. It would have talked about reasons why people would opt for a lower than 100% fillfactor on a table even if they weren't storing multiple tuple copies. It would have talked about snapshot serializability and predicate locking and how they are designed around MVCC, and asked if those still made sense if implemented using an undo log. It would have admitted that most other databases do it the other way, and asked "why didn't PostgreSQL?".
This article barely even went into the primary advantage of the scheme they are describing, which is that if you have a million indexes on a table you don't have to update all of them. Seriously: the reason to care about this has nothing to do with the VACUUM cost, as if you sit down with a pencil for a moment it should be obvious that if you have to do two writes now vs. one write now and one write later, you have only lost flexibility in scheduling your maintenance. People get burned by VACUUM due to bad planning for later capacity needs when they least expect it, not due to the raw amplification. But the index issue? That is serious. They didn't talk about the index issue as the article was supposedly already too long, but... the article didn't really say anything.
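To make the index point concrete, here's the back-of-envelope arithmetic (made-up numbers, just to illustrate the shape of the win, not a measurement):

    # Write counts for one UPDATE of a row in a table with N secondary
    # indexes (illustrative arithmetic only).
    N = 5              # secondary indexes on the table
    changed_keys = 1   # indexes whose key columns the UPDATE actually changed

    # Today's PostgreSQL (non-HOT case): the new tuple version lives at a
    # new heap location, so every index needs a new entry pointing at it,
    # even for columns that didn't change.
    pg_writes = 1 + N                   # new heap tuple + N index entries

    # Undo-based update-in-place: the row keeps its location, so only
    # indexes on columns that actually changed are touched, plus one
    # undo record.
    undo_writes = 1 + 1 + changed_keys  # heap write + undo record + index

    print(pg_writes, undo_writes)       # 6 vs 3 in this example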
"""
We at EnterpriseDB are intending to try implementing the MySQL InnoDB undo log technique in PostgreSQL to see if it offers meaningful benefits vs. having to do a later VACUUM as we currently have to do; wish us luck! For more information on undo logging, please read the following articles.
"""
^ This entire article is just saying exactly this.
In the end, this article is frankly leading to more confusion than clarity :/. It has people here asking if PostgreSQL did it this way due to patents, and assuming that the simplistic idea that "if 50% of your statements commit then this is a win" is correct. The article even starts with a totally misleading concept for how much table bloat you are going to end up with: it isn't 50% after the first VACUUM (which you should be doing pretty quickly). I guess I don't even understand why this article was published today.
I wrote a previous post on why MVCC is interesting, and how it relates to the topic at hand. It's the first link in the article.
I don't think that it's accurate to say that there are only two ways of doing this. There are more than two, and there's another post by my colleague Amit Kapila which talks about that. That's the second link in the article.
We have in fact done benchmarks. We plan to publish them.
But you can't put everything into one article. Several people mentioned thinking this one was quite long, and it barely scratches the surface of the topic. If I'd included an in-depth discussion of all the topics you raise here, it would have been four or five times longer. To try to avoid that, yet give the context you want, I linked to previous posts which cover this topic, some of which were written explicitly to provide context for this article.
But I'm sorry you didn't like the article. I tried my best.
Does this affect `SERIALIZABLE` isolation level? It seems like it might not change anything because roughly the same number of pages get marked for predicate locks.