Implementing system-versioned tables in Postgres (hypirion.com)
161 points by ben_s on Feb 7, 2024 | 92 comments


> Now, there are extensions out there that implement versioning – temporal_tables being the most popular I think – but none are supported for managed Postgres instances on e.g. Azure or AWS. This means that if we want system-versioned tables, we’re forced to make it ourselves.

This really provokes a wild geek rage inside me. Not because it's untrue, but because this is the state of things. It's that cloud providers are now often seen as the only responsible way to operate infrastructure, and in the process we hand over so much control of our tooling. We let ourselves be convinced that simplicity was worth a 10x premium, and admitted we were not grown up enough to handle the tools we had made for ourselves.

Sure, I know it is more complex than that. But I also feel sad about it.


>>> It's that cloud providers are now often seen as the only responsible way to operate infrastructure

$23 billion.

That's AWS's profit in 2023.

$23 billion that AWS users pissed away.

AWS is NOT cheap. I have quite a few clients who could save lots of money by dumping their cloud provider, setting up their own racks, and hiring SAs. I have others who, if they optimized any part of their stack, could cut their AWS bills by half.

Having survived the 2000 bubble, I know these lessons are going to have to be re-learned. Everything has cost and value and managing that is going to become an engineering task in the near future.


> $23 billion that AWS users pissed away.

While I agree that running your own hardware can be _seriously_ cheaper, this is like saying that renting is a waste of money and everyone should purchase a home.

Capacity planning isn't easy for a lot of small companies. Neither is the engineering time to set up, manage, and troubleshoot the hardware. At a certain scale, these are no-brainers to take into your own hands, but for a 15-person engineering team that's completely focused on the product, it would be pretty expensive in human cost to set up and manage your own hardware. Sometimes, it can be set-and-forget, but rarely will it be as set-and-forget as a cloud provider.

Is it more expensive overall? Yes. Is it "pissed away" money? In some cases, maybe. In many, no. The ease of running and reduced upfront capex is absolutely valuable.


I used to set up a rack for a two-person engineering team. It does take focus away sometimes, particularly if the colo lost A/C… but thankfully that's not really an issue these days.


> this is like saying that renting is a waste of money and everyone should purchase a home

Renting is a waste of money and everyone (or at least a lot more people than at present) should own their own home, whether on their own or via e.g. a housing cooperative.


While renting does typically cost more than owning from a purely financial perspective, in practice you can argue that the difference is simply what you are paying the landlord for the services they are providing. The landlord is responsible for maintenance, taxes, insurance, and more. Not everybody wants to deal with those things. The landlord might be more efficient at those things too, especially in a rental apartment.

AWS might cost more but it provides a lot of benefits that are hard to assign a dollar value to.


> The landlord is responsible for maintenance, taxes, insurance, and more.

Those things are typically factored into estimates for ownership v. rental costs, and even with those factored in, renting costs more than owning. That really shouldn't be terribly surprising, either; if landlords only broke even on their rentals, then there would be no reason to be in the rental business instead of the property management business.


As some of the sibling comments have said, owning a home can be great if it's over a certain period of time and you know what you want, kind of like capacity planning. If you don't know what you need or you're not sure you're going to use it long term, renting can make sense (like the situation where AWS is a valuable use case).

If I know I'm going to be living in a city for two years, buying a home would likely be a bad idea. Sure, we could talk about scenarios of renting it out and yada yada, but we're getting into corner cases in the comment section of a thread about AWS pricing :)


Right, but how many people know they're only going to be in a city for two years? Most people intend to be in a place long-term; moving is in and of itself relatively expensive (even more so the further you move) and tends to be avoided unless there's good reason to do so or unless there's no other option.

And bringing things back to AWS pricing: yeah, at first you might not know exactly what you need capacity-wise, but at some point (assuming you're actually measuring resource usage) you start to get a feel for what you need day-to-day and where things are trending. Much like with living arrangements, infrastructure hosting tends to have at least some pieces that you know you're gonna need long-term.

The more compelling reason to go with IaaS/PaaS is the convenience of not needing to run your own servers in your own rack - and it's that convenience that you're paying for, at a premium compared to hiring sysadmins. For a lot of businesses that's cost-effective, because they don't have/need enough infra to keep a sysadmin busy, but at some point that flips.


Not everyone has the same goals as you, and putting things in italics doesn't make you right.


It ain't the italics that make me right, but the things I write :)


There was a good rent-vs-buy calculator I found years ago, not sure which one it was, but buying doesn't even make financial sense in some scenarios, particularly over shorter terms.


> AWS is NOT cheap.

I've realised that cloud costs are a "wave". ("App" here means app/service/business.)

- cheap to run as a small app

- costly to run as a bigger app, cheaper to self-host

- cheaper than running all infra for a much bigger app

- extremely costly compared to running all infra for a much-much bigger app

- probably on par with running your own infra (but then you're getting into tens/hundreds of millions of customers)


I mean, AWS optimization is a valid consultancy/self-employment career; it's like tech debt in coding: you make it work first, prove that it works, get value from it, and only then look into optimizing.

Would these clients have been able to make the upfront investment to set up their own racks and hire/train the relevant staff? That's the sales pitch for AWS and other cloud providers: you can get started a lot faster, make money, only then look into optimizing.

Running your own servers is easy to underestimate. Running them with the reliability, backups, and fallbacks that cloud providers offer is also easily overlooked. How much does it cost to set up a multi-region cluster yourself?

That said, there are some real life examples of companies that moved away from AWS in favor of their own servers, like Dropbox which bootstrapped on AWS, then built their own eight years later (https://techcrunch.com/2017/09/15/why-dropbox-decided-to-dro...).


> you can get started a lot faster, make money, only then look into optimizing.

IME, companies often will first fall into the trap of minimum spend with AWS for larger discounts, such that even when you _do_ try to optimize things, it doesn’t matter. There is no greater disincentive than saving 5 digits per month, only to be told “thanks, but it doesn’t matter.”

Tangentially, I think the rise of cloud and SaaS is to blame more than anything for the I-shaped engineer. Why bother learning anything about optimization when K8s will silently add another node to the cluster as needed? Why bother learning SQL when ORMs abound, and your serverless DB will happily upsize to deal with your terrible schema? After all, you're being told to ship, ship, ship, so there's not really any time to learn those things anyway.

I simultaneously love and hate cloud providers, and am eagerly awaiting the slow but steady turn towards on-prem/colo. 37Signals has the right idea.


> $23 billion that AWS users pissed away.

How do you know it would have cost them less to hire their own teams for self-hosting?


Business transactions aren't zero sum.


How much would be pissed away on duplicating DBMS administration work?

Are we so far removed from reality that we forget one small team of DB admins can handle literally thousands of instances?

We use AWS because collectively paying AWS $23 billion is stupendously cheaper than a million AWS users each having a whole infrastructure team.


> Everything has cost and value and managing that is going to become an engineering task in the near future.

FYI: even 8 years ago I knew people who made money by optimizing people's AWS bills. "AWS expert", "AWS consultant", etc. are valid contractor titles.


You can sit here calling “profit” money that's pissed away until you’re on the receiving end of it. Then you’ll be going on about the value you’re providing. Stop being dramatic.


For a small greenfield project, rolling your own Postgres is a very viable option with attractive trade-offs for some. If it is now considered rarefied knowledge in your circles, you should be glad, as this might be a competitive advantage.

Let the other startups spend 10-100x on their resume-driven development and blaze through their runway. Common story, actually.

But suppose you don't need it and RDS as-is covers all your needs, which it often does: why rage? This is a continuum.

If an early decision to punt and outsource it to the cloud was made, it can always be reversed at an opportune moment. I don't think competent people see it as "the only way", just the most common and pragmatic option.


For a small project (assuming the goal is profitability), rolling your own Postgres seems mighty distracting.


It's especially tragic knowing that Postgres did originally have system-time-like versioning built-in. Instead we get to enjoy being upsold on proprietary ETL-to-Redshift, AlloyDB, etc.


Citation needed? There have been one-off, incomplete iterations in extensions, but core and contrib have never had this.


I have never researched the timeline properly to understand at which point the concept/code got ditched, but it was part of Stonebraker's 1985 vision: https://dsf.berkeley.edu/papers/ERL-M85-95.pdf

> POSTQUEL allows users to save and query historical data and versions. By default, data in a relation is never deleted or updated. Conventional retrievals always access the current tuples in the relation. Historical data can be accessed by indicating the desired time when defining a tuple variable.

I looked this up during another thread which also has some other sources if you want to dig further: https://news.ycombinator.com/item?id=37956687


So pre-PostgreSQL then.


I would guess so, yes.



Thanks! I had used 6.0. I was totally unaware this was an option at the time. For what it was worth, Postgres was dog slow back then.


I don't feel sad at all; I feel my competition is slacking, and that is to my benefit.


I am sorry to say this but most developers either don’t have the ability to think for themselves or can’t be bothered.

You might disagree with this statement of course but that is truly my experience (30+ years of working professionally in software development).

Try this: Next time a developer confidently suggests that you should use tech X, ask them what the pros/cons are compared with using tech Y/Z. Ask about performance/failover/latency/team skills required, etc. Most developers in my experience completely fail to give an intelligent answer. They are just loudspeakers confidently repeating the opinions of somebody else. Usually somebody who benefits financially from developers using tech X.


This is actually what we're building @ Bemi https://bemi.io/

We're hoping to make it so that this becomes automatic with Postgres vs moving to specialized databases.

We recently open sourced the core tech that implements system versioned tables https://github.com/BemiHQ/bemi, check it out if interested :)


RDS is 50 cents or less a day for PostgreSQL on a three-year term for a small instance.

I use AWS/RDS - I’ve got to believe RDS is a money-spinner for AWS, because of all the parts I want to be reliable, this is way up there.


Exactly! What's worse is that we seem to have developed this all-in-or-none-at-all approach towards cloud services. Whatever happened to a hybrid approach of using the cloud only for the things you wanted?


A database is near the top of most people’s list for things that they don’t want to manage themselves. This rant is verging on irrelevant.


I used to have some MySQL servers running for so many years that I was afraid to restart the machines.


> manage themselves.

Be careful with the language there....

I assume you meant that no-one wants to have to run and administer an RDBMS like Postgres, MSSQL, Oracle, etc - especially not things like backups, restoration, disaster recovery, figuring out storage, ensuring high availability, and maybe even things like user administration and index management: i.e. a DBA's grunt-work - and yes, I agree with you entirely.

...but using the other interpretation of "manage", people might think you meant that people don't want to know how to properly design a normalized schema and apply the business domain, or how to clean up data appropriately (i.e. the other half of a DBA's job) - because we can shove it all into a cloud vendor's proprietary (and horrendously expensive) non-relational database or data-lake solution (CosmosDB, etc), or outsource database modelling and implementation of DDL to an LLM. The latter, I'll admit, is quite capable as-is for a first draft of DB scaffolding, and getting better all the time, but it ultimately isn't able to take a proactive or autonomous approach to managing an org's databases. I don't want to seem like I'm advocating economic protectionism of DBAs, but I get the impression LLM-based DBA-bots are being aggressively pitched to smaller orgs that don't know any better, at a time when the technology is still very new, with potentially serious repercussions w.r.t. data security and compliance (and ChatGPT still gives me T-SQL that uses Oracle functions...).


Most people I interact with are targeting CQRS, as the CRUD model is what most people use. The problems with the DB-central approach are many-faceted: most people really just need a repository, and the inherent problems with testing, vertical scaling, availability, the typical default of procedural cohesion, etc. are why people don't care.

Self-managed DB servers aren't cheap once you have to scale, and they typically have availability issues, with ACID rules typically limiting servers to only two AZs per region, etc.

It is actually the adoption of best practices for software maintainability and scalability that is getting rid of most DBA needs... because most database uses don't involve batch processing, report generation, backups, job scheduling, or the other needs that tend to be the right use cases for relational databases.

Concepts like encapsulation, persistence ignorance, and other methods of ensuring that the persistence layer is as loosely coupled as possible have proven to be cheaper, easier to test, and avoid the lock in you mention above.

As I have had a professional relationship with AWS/GCS, here are a few anti-patterns from Microsoft that demonstrate this.

https://learn.microsoft.com/en-us/azure/architecture/antipat... https://learn.microsoft.com/en-us/azure/architecture/antipat... https://learn.microsoft.com/en-us/azure/architecture/antipat...

I made a good living as a consultant in the .com era helping companies throw money at their foreign-key-loving, branching-stored-procedure systems that they sometimes couldn't even get to scale past the angel round.

And entire companies like AT&T Wireless went under in large part due to failed upgrades caused by issues related to having logic in the DB.

DBs can be a good persistence layer when a system is simple and small. But really to work well within a cloud or distributed context they need to be treated as repositories unless there is a compelling technical reason to treat them differently.

The 'when all you have is a hammer' approach of a monolithic persistence layer was the primary blocker when I worked with cloud providers to help customers migrate. And the customers were always paranoid about moving things out of the database. The funny thing is that most of them were trying to move to the cloud because vertically scaling on-prem DB servers had become too expensive and didn't deliver the performance vendors promised.

Had companies followed, say, the design patterns of microservices, where each service context has its own DB, they probably could have stayed on-prem at lower cost.

But for the clients I worked with who had failed to migrate on their own, complex fragile schemas, triggers, SPs, etc. were almost always the main issue.

And we knew this even before the commercial internet existed.


Eh, in my experience system-managed temporal tables are crap anyway: they have a specific set of circumstances built in, and anytime you break out of those you basically have to design your own trigger-based solution anyway.


Can you give some examples of the problems you've had with them?

We use SQL Server's temporal tables - and while they do introduce some restrictions (e.g. you can't mark a column as "ignored" for temporal changes, and only system-versioning (not application-versioning) is currently supported), the feature has overall been an absolute lifesaver for allowing us to do data recovery from user error on a per-record basis - all from within our application or directly in T-SQL; whereas previously we'd need to do a full mount of a PITR backup, which requires a lot of manual steps and can take at least an hour in most cases - and then hope that we got the exact point-in-time correct.

Even for its use as an auto-auditing system, it's great: we often get customer complaints about us "losing their data" or that we corrupted their records, when in fact we have the historical rows and can see that it was one of their own employees (or family members...) mucking around. Previously we'd have cheerfully accepted the user's complaint and given them a refund or service credit, because it was too much hassle to dig up the weeks-old HTTP logs, or we'd have anxieties about having some non-transaction-safe OLTP code running and causing corruption we didn't know about :S


Schema changes basically cause heterogeneous representations of data. If you only add columns forever and never change types, it's fine. Oh, and being able to specify the time a thing happened is sometimes useful (though you don't want everyone doing that, of course...).

Also, double-checking the docs because it's been 5 years since I last seriously touched SQL Server: no indexes on historical tables, so it gets slow fast; FILETABLE doesn't work*; and nvarchar(max) records bloat you hardcore.

I find that a log record works much better as a non-relational/document-type record that is isomorphic to the source record, so I can do schema-on-read if I need to; and if you want to JSON-query that stuff, it works fine too.

* but nobody uses it anymore


I just implemented versioning in a CMS, and I read just about every article about it I could find. My conclusion is that while all the articles had their points and strengths, what a version is can vary widely between applications.

For example, in my app, we decided that versions would represent "committed" modifications: you can modify a chapter title 60 times, then publish, and get only one version. For "undo" we use browser storage. We came to this solution because it is a shared collaborative editor. In the end it is very git-inspired.

The other big "problem" is structure. There are things that should and should not be versioned.

My point is that a useful version system is very tightly coupled with the business logic; it is not just a snapshot.


I recently spent multiple months building out something like this blog post describes, for a Notion-like app (to replace a different versioning scheme that could do draft pages but not point-in-time viewing).

We ended up doing it all in TypeScript, or with some Postgraphile shenanigans.

One of the things we missed when gathering requirements was the ability to view a table of records, each at its "latest published" version.

I think this can probably be done by including a pair of tstzrange columns in the history table (one for "when was this the latest valid draft?" and one for "when was this the latest valid published version?"), and doing a bunch of date shuffling whenever you publish a version - sketched below.

We've not actually done this yet, because there are some edge cases around dependency graphs and computed columns, but for a simpler app I reckon it would work just fine.
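
A rough sketch of that pair-of-ranges idea, with made-up table and column names:

    ALTER TABLE pages_history
      ADD COLUMN latest_draft     tstzrange,  -- when this row was the latest valid draft
      ADD COLUMN latest_published tstzrange;  -- when it was the latest published version

    -- "every record at its latest published version", as of a given moment:
    SELECT *
      FROM pages_history
     WHERE latest_published @> '2024-02-01 00:00+00'::timestamptz;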


Specifically, the change log is information that is almost entirely unrelated to the business-logic versioning. They don't even have to agree on the timestamps; the only property relating them is that the number of changes in the change log is larger than or equal to the number in the business versioning.

(Besides, the business versioning doesn't even have to be a single one, or have a single active version at each time.)

Those extensions are often sufficient for the change log (if you add extra information on your tables). But then, they are way overkill for that.


How do you handle the 60 modifications? Do you store each in the audit log? That's what we do, and we use a time rollup to get the last change within a certain time range. But I feel like there are potentially better solutions.


We use an undo stack on the browser side. We realized that it was better not to share this change log between users. Also, we do not persist it. But your mileage may vary depending on your needs. In an older app the undo stack was zipped and serialized in the DB. With zip you avoid storing too much with every character added to 5 pages of text, and you do not have to work with diffs.


The application I develop for work has its own horrible, home-brew update-log/audit-log solution - you cannot really call it system-versioned - so I read each of these posts with interest.

One thing that nearly all of them gloss over is that logging the "who did it" isn't as simple as they make it out to be.

In most cases, there's a web application or an API gateway or so that connects to the database and does most of the modifications, and only logging the database user of that application makes this piece of information really useless.

What we (and I imagine many others, though clearly not OP) would need is a "do-business-as" user that maps not to the database users, but to in-application users, and then a mechanism to pass this user to the DB, and log it in addition to the database user.

The web frameworks and ORMs I've worked with in the past don't seem to have any built-in support for something like this, which somehow annoys me much more than it probably should. Sorry for the rant :-)


> What we (and I imagine many others, though clearly not OP) would need is a "do-business-as" user that maps not to the database users, but to in-application users, and then a mechanism to pass this user to the DB, and log it in addition to the database user.

This is kind of what Supabase does, for use in tandem with row-level security. It stores the user's JWT in a custom configuration parameter via set_config() when the database connection is established on behalf of the user, so you can read the token back by calling current_setting() with the parameter name. Supabase also provides convenience functions to destructure the token and get the things you need (like auth.uid()), so it would be easy to fetch the user ID or user name in a trigger to log it.
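
For illustration, a minimal sketch of that pattern (the `app.user_id` parameter and the `audit_log` table are hypothetical stand-ins, not Supabase's actual names):

    -- When a connection/transaction is established on behalf of a user:
    SELECT set_config('app.user_id', '42', true);  -- true = local to the transaction

    -- An INSERT/UPDATE trigger function can then read it back for the audit row:
    CREATE FUNCTION audit_row() RETURNS trigger AS $$
    BEGIN
      INSERT INTO audit_log (table_name, row_data, app_user_id, logged_at)
      VALUES (TG_TABLE_NAME, to_jsonb(NEW),
              current_setting('app.user_id', true),  -- true = NULL if unset
              now());
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;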


I actually "solved" this problem, though not for logging, and other restrictions made me move away from it. In our case it was for Row Level Security, but essentially we get our userId from a JWT in our API (you should already have something like that), then before each DB command run a SET LOCAL auth.claims.userId = 'user_id'; Most ORMs let you set a variable like this automatically when opening a connection, so you don't have to think about it too much.

Then you can get as fancy as you want with RLS policies automatically attached to each table, validating the user based on a user table with capabilities. I have a feeling you could have the RLS rules automatically insert data instead, but I never tried it. It would have to be multiple updates though, perhaps a trigger/function.
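
For what it's worth, a minimal sketch of what such a policy can look like on top of that setting (the `documents` table and its `user_id` column are made up):

    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

    -- Rows are only visible when they belong to the user declared via
    -- SET LOCAL; the second argument makes current_setting() return
    -- NULL (matching no rows) when the setting is absent.
    CREATE POLICY documents_owner ON documents
      USING (user_id::text = current_setting('auth.claims.userId', true));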

The problem is SET applies to a session, and pgbouncer doesn't make guarantees about the session for a particular statement. Your statement might receive an existing session with another user's ID, or it might receive no ID at all.

Through AWS's RDS Proxy (a beefed-up pgbouncer, AFAIK), the situation is slightly better, in that AWS will detect the modification to the session and 'pin' it to the current connection. Your data will be consistent, but pinned sessions can no longer be multiplexed, so you've basically removed the whole point of connection pooling to begin with.

Ultimately, it's probably safer to just manually add the user into your queries. It's not that much overhead.


Agree. Sounds like INSERT/UPDATE/DELETE should be extended with a LOG SET webuser='abc', app=42, ... clause that provides an "UPDATE history_table SET webuser..." construct.

I.e. the history tables that temporal_tables provides, but with added columns that can include more information - and if the columns are NOT NULL, then INSERT/UPDATE/DELETE should fail when those values aren't supplied.
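
Until something like that exists in SQL proper, a rough approximation is possible with today's tools. A sketch - `mytable_history`, its columns, and the `app.webuser` setting are all assumptions:

    ALTER TABLE mytable_history ADD COLUMN webuser text NOT NULL;

    -- Versioning trigger for UPDATE/DELETE on mytable. With no second
    -- argument, current_setting() raises an error when app.webuser is
    -- unset, so the write fails unless the application declared who it
    -- is acting for:
    CREATE FUNCTION mytable_versioning() RETURNS trigger AS $$
    BEGIN
      INSERT INTO mytable_history (id, value, systime, webuser)
      VALUES (OLD.id, OLD.value,
              tstzrange(OLD.modified_at, now()),
              current_setting('app.webuser'));
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;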


I know this probably doesn't help you in your environment, but there's not really any reason that the application needs to have only a single role that it uses to connect to the database. I implemented logging all the way down to Postgres roles - in that each user of the application had their own database user. We did a SET ROLE at the beginning of each transaction, to switch the transaction to the application user's role. I don't remember the details, but there is a way to do this very safely. I'm about to do it again on a new project I'm working on.

It's actually not that difficult to implement if you have a homogenous or flexible application environment; but it's certainly not common, although I don't know why (other than inertia).
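
A minimal sketch of that setup (role and table names invented for the example):

    -- One database role per application user; none of them can log in:
    CREATE ROLE alice NOLOGIN;
    GRANT alice TO app_login;  -- lets the application's login role switch to it

    BEGIN;
    SET LOCAL ROLE alice;  -- reverts automatically at COMMIT/ROLLBACK
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;
    COMMIT;  -- audit triggers in the transaction see current_user = 'alice'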


You could be right, but you have to keep in mind that this just adds several hours -- days if we're talking database non-experts, which indeed most of us programmers are -- and it is just one item in another huge bucket-list of TODO items.

Having a single DB user reduces time expense while also adding much-needed simplicity. Application-level logging of "who did action X and when" is pretty easy to implement in every major web framework anyway.


Having implemented it, I don’t really agree. It adds a layer of security and auditability at the database level; logging works even if someone logs into the database from a cli and modifies it directly; and you can implement roles in such a way that prevent changes to the audit logs. None of this is possible at the application level.

To me this is one of those things that looks hard until you look deeply. Yes, it does take more time to do things right. But not that much more.

Honestly, if your application is bound to a database but you’re not a database expert, it’s a gap you should probably consider filling. So much effort is put into making the application layer do things that the database layer can already do, but the database is generally much more efficient and correct.


> To me this is one of those things that looks hard until you look deeply. Yes, it does take more time to do things right. But not that much more.

Nope, it's much, MUCH more. Application frameworks are not made to switch DB users; they expect a single DB config to work with.

Can it be changed? Sure. Will any stakeholder pay for it? Nope.


You’re really overstating your case. There is nothing intrinsic to frameworks that makes this difficult.

It’s basically a few lines of SQL code emitted as part of database transaction setup. The frameworks I’ve used have had the ability to register a transaction initialiser during pool setup. Other than that the framework doesn’t even have to know about it. Most of the “complexity” is on the database side, but it’s really not that hard.

“Will any stakeholder pay for it?”. If your stakeholders are serious about application security, then I would expect so.

(I’ll just add that I know of at least one product that does this - Postgraphile sets the database role up front for requests from the web. I seem to recall that Supabase does something similar.)


> “Will any stakeholder pay for it?”. If your stakeholders are serious about application security, then I would expect so.

I want to live in your world where they apparently care. In mine they never did. 20+ years career.

> You’re really overstating your case. There is nothing intrinsic to frameworks that makes this difficult.

You could be right that I am overstating it, though you also shouldn't underestimate the "blessed" ways that web frameworks do stuff. Practically all that I worked on just want you to chuck in either a database URL or break it apart into host/port/user/pass/db-name, and they make no effort to dynamically create/drop database users as the application creates and deletes its own.

> The frameworks I’ve used have had the ability to register a transaction initialiser during pool setup.

As said above, I know it's doable. What I am saying is that nobody ever gave me the time to do it, and with time I stopped caring. Turned out that putting the currently logged-in user in global logger and telemetry contexts is good enough. ¯\_(ツ)_/¯


I see this "system user" all the time in large integration projects. It is indeed simpler and many times works just fine, but it sure is nice to have something like a "run-as" capability so the far-side system can apply user-specific security policies based on who is accessing the system, vs. only the system user profile/role.


The run-as thingy is done at application level. I didn't find it hard to plug into my maintained apps' logic to insert audit log entries like that (you have access to the currently signed-in user anyway).


I don’t do this kind of stuff anymore but if I did I would do something similar. Maybe add a before trigger that caused an error if no role was set. That way all connecting applications must set a user to use the db. Then put the database in charge of audit logs (who did what and when) and data access control with standard permissions and row level permissions. I’m sure you can balance that against an identity provider too.


Yes, exactly this. You can even do the user creation with default privileges as part of the transaction setup, if absolutely necessary (ie, create role if not exists…, set role …). Typically the audit is a table trigger so you can fail at that point, or even add a created_by column that looks up the role and raises an error if it’s not set (technically the role would always be set to the application’s db access role, so the check would be that the current role is a valid end user role).

I mean there are more and less efficient ways to do it, but why have a separate context setting for “user” when the (Postgres, at least) database has such great support for roles. I do find it bewildering that our tools ignore this by default.


I handle that with an "event_source" enum column ["database", "web app", "services_app", etc.] paired with user_id (though an FK won't work for a lot of other folks' cases). But I like the idea of doing that at the database-user level, as another commenter posted.


Yes, usually «who did it» is at least as important as the history itself.


Might it be easier to just add a `last_updated_by` column? Then passing the data to the db is as simple as adding it to the insert and it would be automatically stored in the history table as well without needing any special handling.


This extra-table approach is how django_simple_history[1] implements model history, although it hooks into the Django ORM to save history rather than using triggers. It's simple and brilliant, and I hook it into any model that needs the ability to revert.

[1] https://django-simple-history.readthedocs.io/en/latest/


The problem with django-simple-history and similar is that they use Django signals and aren't transactional. So if you have any bulk updates or things that don't trigger signals, you're SOL, and your history starts missing things.

A more robust solution would use triggers or an ORM with more transactional guarantees (aka not using signals)


There's also django-reversion, but I can't say I loved using it.


This approach is also available as an extension here:

https://github.com/supabase/supa_audit

It uses a slightly more generic implementation so that you can easily turn auditing on/off, i.e.

    select audit.enable_tracking('public.account'::regclass);
    select audit.disable_tracking('public.account'::regclass);
The reason for the 'public.account'::regclass is to store the table's OID rather than the name - this is cheaper than text and allows you to rename the table without losing the history.


Thanks, not sure how I missed this - just added to my “Postgres Is Enough” list:

https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

I do a similar thing but add foreign key columns for faster lookups (traversing the JSONB is slow on large sets when querying). The FKs get set by the function itself during insert. This allows me to create per-resource audit logs in the UI.

I also wrote some deep-diff functions to get the changes between the old and new jsonb, if anyone is interested. This makes it easy to display what exactly changed.


> deep-diff functions to get the changes between the old and new jsonb, if anyone is interested

I'd love to see this if it's public


Sure. It's somewhat out of date but gets the general idea across:

https://gist.github.com/cpursley/a2ae79423bee9274e3639839967...


I implemented this using drizzle and PG not long ago

Check the repo here https://github.com/grmkris/drizzle-pg-notify-audit-table


Limited to JSONB types though, by the looks of it.


Yeah not really the same then. The approach in the post allows you to easily reuse existing queries by just adding global filters and pointing to the history table.


The data is stored as JSONB so that it can be used generically across all tables and adapt to schema changes.


It's definitely a reasonable trade-off given the circumstances. Do you know whether Supabase teams who use this extension simply avoid using any non-JSON Postgres types? Or do they lean on workarounds (e.g. JSON-LD encoding)?


> avoid using any non-JSON Postgres types

I'm not sure I follow - why would they avoid using non-JSON types?

More generally, however: I don't have a lot of insight into the usage/feedback for this extension, but I can ask the team.


The issue is the complexity of figuring out appropriate JSON-compatible serializations and getting the implementation correct for every single column in use. A simple example would be round-tripping the Postgres money type using salary::numeric and later (user_data->>'salary')::money

A much more complex example would be representing a numrange where you have to preserve two bounds and their respective inclusivity/exclusivity, in JSON.


I imagine you could modify it to use an array type. But jsonb makes it easy to normalize back and forth to a row.


I have implemented this for our tool NF Compose, which allows us to build REST APIs without writing a single line of code [0]. I didn't go the route of triggers, because we generate database tables automatically and we used to have a crazy versioning scheme - inspired by Data Vault and anchor modelling - where we stored every change to every attribute as a new record. This allowed for simple point-in-time queries.

Sounded cool, but in practice it was really slow. The techniques that are usually employed by Data Vault to fix this issue seemed too complex. Over time we moved to an implementation that handles the historization dynamically at runtime by generating historizing SQL queries ourselves [1]. We now use transaction time to determine winners and use an autoincrementing column to determine who wins on ties. A lot of brainpower went into ensuring this design is concurrency-safe. On a sidenote: generating SQL in Python sounds dangerous, but we spent a lot of time on making it secure. We even have a linter that checks that everything is escaped properly whenever we are in dev mode [2].

[0] https://github.com/neuroforgede/nfcompose/

[1] https://github.com/neuroforgede/nfcompose/blob/main/skipper/...

[2] https://github.com/neuroforgede/nfcompose/blob/main/skipper/...


For Python at least, sqlalchemy-history might be a good option. I used its predecessor, sqlalchemy-continuum, and that was decent as well. It just uses the same definitions as the ORM, so it's very usable. Doesn't work with Django because Django ties you to their ORM, but for the rest of the Python world it might be worth a look.


django-pghistory is a good option with Django.

https://django-pghistory.readthedocs.io/en/3.0.0/


I am not sure why ranges are used. They seem redundant, because the beginning of the validity of a value is identical to the end of the validity of its previous value. A column with an invalidation timestamp seems sufficient to me. The rows in which this column is null are currently valid.


There might not be a previous value.

If you insert a record and then later delete it, you need to store a range or two timestamps to know when the insertion and the deletion happened.


The way I deal with this is my history tables have 3 fixed columns:

    * OperationType: An ENUM of "INSERT", "UPDATE", "DELETE"
    * OperationTimestamp: TIMESTAMPTZ
    * OperationId: BIGSERIAL (needed to handle cases where a row is modified multiple times in a transaction)
This structure is super intuitive to understand, and very efficient since the history table is append-only. But if you are trying to use it to reconstruct the state of items from a specific time in the past, it's a bit awkward, and the use of a RANGE column instead would be easier to query.
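
For example, reconstructing all rows as they existed at some past moment with this layout takes something like the following (a sketch, assuming the history table mirrors an `items` table keyed by `id`):

    SELECT latest.*
      FROM (SELECT DISTINCT ON (id) *
              FROM items_history
             WHERE OperationTimestamp <= '2024-01-01 00:00+00'
             ORDER BY id, OperationId DESC) latest
     WHERE latest.OperationType <> 'DELETE';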


Ah, I did not think of that!


Foreign keys / references: you query one history table at timestamp X, then the joined history table with a "where start <= X and end > X" to get the foreign key's data at timestamp X.


With a single timestamp how would you write a queries such as "what was the state of the record at this specific moment in time" or "what changes were made to the record between start_time and end_time"? TFA is also using an index on the range table to ensure that you can't enter overlapping entries in the history.
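
For comparison, both of those are one-liners with a range column (a sketch in the style of the article's tstzrange `systime`, with invented names and timestamps):

    -- State of record 42 at a specific moment:
    SELECT * FROM mytable_history
     WHERE id = 42
       AND systime @> '2024-02-01 12:00+00'::timestamptz;

    -- All changes to record 42 between start_time and end_time:
    SELECT * FROM mytable_history
     WHERE id = 42
       AND systime && tstzrange('2024-01-01 00:00+00', '2024-02-01 00:00+00');

    -- The no-overlap guarantee mentioned above (btree_gist is needed
    -- for the = on id):
    ALTER TABLE mytable_history
      ADD CONSTRAINT no_overlap
      EXCLUDE USING gist (id WITH =, systime WITH &&);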


This method uses timestamps as version IDs. That's not very reliable when tying rows together, such as primary/foreign-key tables or multiple rows in a single transaction. (There's also a "history_id" column for each history record, but it's not used to tie rows.)

Suppose you've got an "order" table, with a "lineitem" detail table, and a "part" (stock item) master table. What happens if you have, say, two rapid-fire transactions updating the order, with another transaction updating one of the parts sandwiched between the two order updates? Even with the three transactions being processed sequentially, it's possible for all three to wind up being handled in the same clock tick and have the same timestamp. As a result, it's not possible to accurately match the history records for the three tables. An opposite case is when a transaction straddles a clock tick and various updated rows end up with different timestamps.

The author even alludes to the problem. He tackles the transaction-with-multiple-timestamps issue by using NOW() instead of CLOCK_TIMESTAMP(), and punts on the primary/foreign-key issue in his use-cases with "... I don’t think it makes sense to have multiple states at a single point in time anyway." Well, experience says it will happen, surprisingly often. If your use-case can't handle the ambiguity then you'll need to do something different.

A more rigorous approach is to include a "version_id" column in each table. If you make it one of the new-fangled UUID7 types then it will be sortable by timestamp. Every table updated in a single transaction should have the same version_id, and every foreign-key reference should also include the foreign version_id. So the "lineitem" table would have "id", "version_id", "order_id", "order_version_id", "part_id", "part_version_id". It is, indeed, a bit more complicated.

EDIT: Normally only the history tables need the foreign-key-version columns; you can code the history-creation triggers to automatically include them. This eliminates most of the additional complexity of using them.

EDIT: More issues with using timestamps: Consider this timeline:

12:34:56.000 - Transaction A begins.

12:34:56.001 - Transaction B begins, updates a foreign-key table that's needed by trans A, and commits with a timestamp of 12:34:56.001.

12:34:56.002 - Transaction A gets around to reading those foreign-key tables and gets the version created at 56.001. Trans A then commits with a timestamp of 12:34:56.000.

So if we look at trans A's history, timestamped at 56.000, we'll wind up matching against the old, stale foreign-key history rows.


> What happens if you have, say, two rapid-fire transactions updating the order, with another transaction updating one of the parts sandwiched between the two order updates?

You have a race condition here, regardless of whether you use history tables or not, no? It'll probably still be wrong for the user if you update the part first, then do the two transactions, as you now refer to the wrong (a more recent) part version.

I mean, that's the entire reason why Stripe have `products` with multiple `prices` that are "immutable": The price, tax details and so on can't be changed, but they can be archived or set as default. A line item will refer to the price itself, not the product.

If you somehow need to do multiple transactions and your parts aren't immutable, I think you must refer to some immutable history table -- regardless of whether you use the triggers, make an append-only table, or do something else. If you use the triggers mentioned in the post, you could save the time you looked at the part and do an insert like so:

  INSERT INTO line_items (order_id, part_id, part_history_id)
  SELECT ${order_id}, ${part_id}, ph.history_id
  FROM parts_history ph
  WHERE ph.part_id = ${part_id}
    AND ph.systime @> ${as_of_time};
But to be fair, it seems unnecessarily risky. I'd just query the history as of now and use the history ID in the subsequent transaction, or make an append-only part version table and refer to that.


No race condition. As long as the transaction is properly constructed (let the database retrieve the unit prices and calculate the extended cost), it will abort and roll back if there's a conflict, including if a source row (in this case, product) changes. So your transaction could look like:

    BEGIN;
    WITH
      x_lineitems AS (
        SELECT li.id, li.part_id, li.qty, (li.qty * pt.cost) AS ext_cost
          FROM (VALUES (${lineitem1}, ${part1}, ${qty1}),
                       (${lineitem2}, ${part2}, ${qty2}))
               AS li (id, part_id, qty)
          JOIN parts pt ON pt.id = li.part_id
      ),
      x_order AS (
        INSERT INTO orders (id, version_id, total_cost)
          SELECT ${order_id},
                 uuid_generate_v7(),
                 (SELECT sum(ext_cost) FROM x_lineitems)
          RETURNING id, version_id
      )
    -- Line items get the same version ID as the order.
    INSERT INTO line_items (id, version_id, order_id, part_id, qty)
      SELECT xli.id, xord.version_id, xord.id, xli.part_id, xli.qty
        FROM x_lineitems xli, x_order xord;

    -- Triggers on orders and line_items fire, still as part of
    -- the transaction.

    INSERT INTO order_history (id, version_id, total_cost)
      SELECT o.id, o.version_id, o.total_cost
        FROM orders o WHERE o.id = ${order_id};

    INSERT INTO line_items_history
       (id, version_id,
        order_id, order_version_id,
        part_id, part_version_id,
        qty)
      SELECT l.id, l.version_id,
             o.id, o.version_id,
             p.id, p.version_id,
             l.qty
        FROM line_items l
        JOIN orders o ON o.id = l.order_id
        JOIN parts  p ON p.id = l.part_id
       WHERE l.order_id = ${order_id};

    -- And commit everything. If any of the parts rows we referenced
    -- change mid-transaction then everything gets rolled back.

    COMMIT;


I think you have to really think hard about what you want out of a history tracking system, and there isn't a one-size-fits-all answer. We did something like the article but much more elaborate. At the core though, we addressed the concern you raise in two ways:

1. We forced fully serialized write transactions, because we determined that the users needed simplified semantics they could understand. They need a linear history model for the whole DB, not leaking out a weird graph of MVCC states on different rows. This obviously has some significant performance/scalability implications. Our worldview is more for small, long-tail use cases, so we can scale by having multiple independent databases for different communities and not worry so much about having one large database with optimized write concurrency.

2. We tracked the version timestamps generated by all write transactions in another system table with a uniqueness constraint. The theoretical collision here would abort the second write transaction with a conflict, requiring the application to try again (with a later timestamp). I am not sure it has happened in practical operation given the high precision timestamps used in a modern Postgres deployment.

But, we went wild in other ways: we store model information (much like what is in pg_catalog) and have history tracking of that too. This allows different snapshots to add/remove schemas, tables, columns, key constraints, and foreign key constraints, as well as renaming schemas, tables, and columns. The history storage uses JSONB, and the model snapshot is needed to properly project this content back into the specific table definition at a given time.

Our systems application is a relatively low-level web service that provides access to a shared database. So we're not really targeting hand-written SQL use cases (though it is feasible to do some custom SQL against the "live" database as a privileged system operator). Instead, our regular use case is that all queries are expressed in our web service API and it generates the SQL queries. Then, an extra parameter can specify a snapshot ID to have the web service API operate read-only over an earlier snapshot. It generates the different SQL necessary to perform the same kind of query over historical data.

And, since our user-visible concept is a web service rather than a raw database, we inject our own web authentication and authorization scheme. So authenticated web users can be tracked for the "who did it" metadata, and we also built our own fine-grained access control scheme that is understood by the web access layer. This policy model is written in terms of the database model and so also gets tracked as snapshots of policy appropriate to one version of the database model. Querying an older snapshot means reconstructing the older model and the older policy that is written in terms of that model.

A final kink is the possibility of needing to amend history. You might need to go back and REVISE the policies applied to older timestamps, because you realize you had a flaw in your policy or because organizational policies have changed and you want to enforce these on any subsequent queries of older data. You might also need to REDACT older data entirely for some kind of confidentiality or compliance reason. This could mean scrubbing historical tuples to make certain fields "null" or even making some historical tuples disappear entirely.


Good write-up. As I've written my own temporal-table library for Postgres that also runs on managed databases like AWS Aurora, here are my observations:

If your main table inherits from a history table, you can see a complete history of every entry. It also never requires triggers on insert, only update and delete. A major drawback, however, is that Postgres table inheritance uses the same mechanism as partitioning, so you can use one or the other but not both.

Instead of inheritance, you could keep them completely separate, which is what I ended up doing: I created set-returning functions with the same name as the main table, taking a timestamp range as parameters. The function performs a UNION ALL to merge active and historical results. So instead of

    FROM mytable
you use

    FROM mytable('2023-03-01 00:00:00Z', '2023-04-01 00:00:00Z')
This has made temporal queries much easier to understand and maintain.
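
A sketch of such a function, assuming a `mytable(id, val, last_modified)` twinned with a `mytable_history` that has the same columns plus a `systime tstzrange`:

    CREATE FUNCTION mytable(from_ts timestamptz, to_ts timestamptz)
    RETURNS SETOF mytable
    LANGUAGE sql STABLE AS $$
      -- Active rows, valid from last_modified onwards:
      SELECT id, val, last_modified FROM mytable
       WHERE tstzrange(last_modified, NULL) && tstzrange(from_ts, to_ts)
      UNION ALL
      -- Historical rows whose validity overlaps the requested window:
      SELECT id, val, lower(systime) FROM mytable_history
       WHERE systime && tstzrange(from_ts, to_ts);
    $$;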

Another option put forth was to save the timestamp range and tableoid in the history but have historical column data stored as jsonb. This allows history to be shared with all tables and is more forgiving of table column changes. It's also obviously slower and less storage efficient. The temporal access functions end up calling jsonb_populate_record(...), which is extremely handy but again not as efficient as a twinned history structure. However unless you are making a lot of temporal queries with strict SLAs, the jsonb record solution is often good enough, especially with judicious use of partial indexes. Watch out for dependencies on tableoid though; the same table name created by anything other than a dump/restore will not have the same tableoid, so a surrogate id and mapping table may be preferred.
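
A sketch of that jsonb variant (the shared `history` table layout and `mytable` are assumptions):

    -- One history table shared by every versioned table:
    CREATE TABLE history (
      table_oid oid       NOT NULL,  -- see the dump/restore caveat above
      systime   tstzrange NOT NULL,
      row_data  jsonb     NOT NULL
    );

    -- Rehydrate typed rows for one table at a point in time:
    SELECT r.*
      FROM history h,
           jsonb_populate_record(NULL::mytable, h.row_data) r
     WHERE h.table_oid = 'mytable'::regclass
       AND h.systime @> now() - interval '1 day';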

One major difference between my implementation and the author's is that my history table is never updated. Ever. It is always append-only. This is a side effect of the initial insert not being recorded in the history. It also means the active table requires at least a last_modified column that is updated by trigger.

As was mentioned, adding and removing columns from the active table is a difficult and complex proposition. In fairness, though, neither MariaDB nor MS SQL Server handles this situation either. Both just tell you to detach the history, make your table changes, and start a new history. This is where the jsonb history option really shines when rolling your own in Postgres.

Not covered in the spec or in most temporal implementations in Postgres: it can be VERY useful to record the user who made the changes. Rather than record the role—though this may be all most systems have—I've tended to create application users that are passed in from JWT claims as local config variables. I got the idea from Postgraphile (https://www.graphile.org/postgraphile/jwt-guide/). At first I added this to every history table along with the start and end timestamps, but later realized the user never changes mid-transaction. I switched to a model where the transaction id and user are stored in their own table. Rows in Postgres already store the transaction id. Unfortunately this suffers from the same dump/restore problem that tableoid suffers from, so you may want a transaction id surrogate instead, depending on your needs. You will also need to track the delete event separately from the last state of your active table record. The user who deleted it may very well be different from the last person to insert/update it. This was handled with a start and end timestamp that were the same, aka a point in time.

Finally, it's worth exploring event triggers. Managing all of this manually is a huge PITA. Better to watch for CREATE TABLE events and do the history setup automatically - better for consistency and for dev sanity. I'm usually not a big fan of dynamic SQL, but this was an area where I never regretted utilizing that particular Postgres sledgehammer.
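
A sketch of such an event trigger (the history-creation internals are abbreviated and the names invented; note the guard so it doesn't recurse on the history tables it creates itself):

    CREATE FUNCTION add_history_on_create() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
      obj record;
    BEGIN
      FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
                 WHERE command_tag = 'CREATE TABLE'
                   AND object_identity NOT LIKE '%\_history'  -- don't recurse
      LOOP
        -- Dynamic SQL: create the twin history table; the systime
        -- column, triggers, etc. would be added the same way.
        EXECUTE format('CREATE TABLE %s_history (LIKE %s)',
                       obj.object_identity, obj.object_identity);
      END LOOP;
    END;
    $$;

    CREATE EVENT TRIGGER history_setup
      ON ddl_command_end
      WHEN TAG IN ('CREATE TABLE')
      EXECUTE FUNCTION add_history_on_create();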

Temporal support is a massive undertaking if you want to go beyond the basics. It also imparted even more respect for the Postgres developers for such a powerful and expressive platform to work on.



