Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

My question when people implement a generic feature like this, is why doesn’t the database do it? And many tunes, as is the case with soft delete, it does. For example, Redshift tombstones records and you can choose when to “vacuum” them up (actually delete them).

Usually if you’re changing the way a primary function of the database works, like delete, it’s probably not a good move.



To take a stab at it:

This is one of those things where business requirements trump the technical implementation details. Prevailing theory is that actual deletes are bad because you can’t do historical analysis, recovery etc on the data. Say a customer stops using a service for a year but comes back: it’s a big win if you can (at least optionally) restore their data, so the theory goes.

That’s why tricks like this exist.

The alternative is to write the data to a separate database and/or table that’s meant for archive purposes. I think it’s better than these soft delete tricks but it’s got complexity too. More resilient IMO and let’s production systems run leaner over time IMO


I think there's still a spot for deleted_at or deleted_at'like functionality.

It's around historical data, especially in a work scenario.

For example a worker might create a thread and then 38 other workers reply to it. There could be a lot of great information in this thread. It could also be referenced in 5 other threads and external sources (docs, etc.).

If the worker leaves the company, should you really delete them in such a way where all of the threads they've created get deleted in a cascading fashion? I'm all for privacy and I would want to see that happen in most public communities but for private work, I don't know. That changes everything.

I've seen a number of tools keep the user around and visibly label them as "Jane Smith (Deactivated)". I think that's a lot cleaner than having a special reserved "Deactivated" user and then you change all of the user_id FKs to that user before deleting the person who left. If you do it that way you lose the context of who posted the original thing which has a lot of value in a long running project.

But all of the above hints at using deleted_at most likely. It keeps everything in tact at the DB level and then the app layer chooses what to do for each resource type. What other options do you really have if you want to keep things working exactly the same after deleting someone except for maybe showing an indicator that they're not around anymore?


If you’re separating your deletions from everything else then I imagine archiving a user would not be the same as deleting one, in which case why not just mark the user as archived?

It prevents the overload of what deletion means as to preserve proper separation of concepts too


Wouldn't this be better handled by marking the user entry as inactive and then reading that value from the join rather than setting every conversation thread as is_deleted/is_deactivated?

I feel like you are solving a different problem than the one presented in the article.


It was more about the comment I was replying to around a use case where historical data is important to keep around.

Marking it as inactive with a boolean makes you lose the context on when the inactive toggle happened.

Chances are you want to use deleted_at on the user in my use case because that provides you the timestamp of when it happened and also lets you have application logic which prevents that user from being able to login and do things, but still lets you show their previous information to everyone else as if they didn't leave and all you have to do is have a tiny bit of template logic that appends " (Deactivated)" to their name if `deleted_at` is not NULL.


Another is the support case “Help! I accidentally deleted the wrong thing!” where it saves a huge amount of time compared to loading up a full DB backup.


This is precisely the situation the article solves but without repeating

  AND NOT is_deleted
after every DB query in every app accessing the database. No full DB backup/restore needed.

  INSERT INTO mytable
       SELECT recovered.\*
         FROM myaudittable audit
            , jsonb_populate_record(
                audit.jsoncolumn
              , NULL::mytable
              ) recovered
        WHERE audit.id = 8675309
  -- optionally merge if new data added
  ON CONFLICT DO UPDATE
          SET field1 = EXCLUDED.field1
            , field2 = …etc…
Postgres has a lot of great functionality making jsonb manipulation relatively simple and easy. Is it more complicated than a simple UPDATE? Yes, but you only have that complexity once rather than in every query on the table from every app and ORM and that recovery can be more nuanced since not every restore strategy is equally valid in every situation.


> Usually if you’re changing the way a primary function of the database works, like delete, it’s probably not a good move.

This isn't a soft delete -- you can't use the data for undeletion and it's not visible to an application that can see that table (as it would be in the case of a `deleted_at` column). It's only for analytical purposes, as the author says.


re: why doesn't the database do it

They do - almost all databases will offer change data capture integration either via making the write ahead log readable or by suggesting triggers, like the article.

If you wanted to read the deletion journal in another system (maybe because of transaction rate), you might prefer CDC to Triggers. For example: https://www.postgresql.org/docs/current/logical-replication-...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: