
The (consumer) company I used to work for also allowed their customers to "delete" their data. Deletion was implemented as a boolean field in the database: "deleted - true/false". We called it "soft deletion". And why was it implemented like this? It's because actually deleting data is hard. There is no single database and the data is distributed across many servers. It's also backed up in different places. Running the delete operation can be extremely costly and can also create service interruptions and data integrity issues. I think there was a script that was supposed to actually delete the entries, but it was not run very often and was there for legal and compliance issues.

Just remember that when you request to delete some data on the internet, it doesn't actually get deleted (right away anyway). The best way to deal with this is not to give random sites your real information in the first place. However, that can be difficult or impossible when dealing with government, financial institutions or shopping sites.

Edit: And just to address questions below, the actual delete script was not run daily. I don't know how often it was run (I was not an SRE) but I presume it was run at least once a month. I have no idea how other companies do this.
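For anyone who hasn't seen the pattern, here is a minimal sketch of a soft-delete flag plus a separate purge step. It uses an in-memory SQLite database purely for illustration; the table, the column names and the helper functions are all made up, and a real system spread across many servers and backups would be far messier than this.

```python
import sqlite3

def make_db():
    # Hypothetical schema: one "users" table with a boolean-style "deleted" flag.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@example.com",), ("b@example.com",)],
    )
    return conn

def soft_delete(conn, user_id):
    # "Deleting" only flips the flag; the row stays in the table.
    conn.execute("UPDATE users SET deleted = 1 WHERE id = ?", (user_id,))

def visible_users(conn):
    # Every normal read path has to remember to filter on the flag.
    rows = conn.execute("SELECT email FROM users WHERE deleted = 0 ORDER BY id")
    return [row[0] for row in rows]

def purge_soft_deleted(conn):
    # The separate "actual delete" script: removes flagged rows for real
    # and reports how many rows were removed.
    cur = conn.execute("DELETE FROM users WHERE deleted = 1")
    return cur.rowcount

if __name__ == "__main__":
    conn = make_db()
    soft_delete(conn, 1)
    print(visible_users(conn))       # user 1 is hidden but still stored
    print(purge_soft_deleted(conn))  # only now is the row actually gone
```

Until something like purge_soft_deleted runs, the "deleted" row is still sitting in the table, which is the gap the comment above is describing.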



> there was a script that was supposed to actually delete the entries ... was there for legal and compliance issues.

Sounds like the laws worked in this case. They required data to be actually deleted, and it was deleted because of those laws, and only because of those laws.


No, you don't understand: the script exists for plausible deniability. It even runs sometimes! And if you find out we didn't delete your data, we might even go out of our way to run it for you. Unless the script doesn't run anymore because it's been broken. Or because 5 microservices were added since the last time we "actually had to run it", so even running it gives no assurance that it actually deletes everything about you.

But if an internal lawyer really puts their foot down, we might put an intern on it for a couple of days.

I'd bet a finger this is how it works in most companies, and I know I've seen worse versions.


Many businesses would still use soft-deletion even if distributed data wasn't an issue. The company I work for has soft-deletion enabled because they want to be able to help customers who accidentally delete something. I wish we would just tell them "better luck next time", but obviously management will never say that.

What annoys me more is how many companies give next to no insight into or control over data retention. It should be unambiguous how soon or often our data gets hard-deleted, if ever.


Heh, I once worked for a company that had an "is_deleted2" field. It indicated that the record was "hard" deleted and not accessible anymore via the usual means!!


It's 2024. If you can't delete data without corruption or downtime, you're an absolute buffoon of an engineer.

If anything, GDPR made it painfully obvious how sloppy some devs/companies are.


Let’s be clear that what you describe is absolutely not GDPR compliant, so it would be illegal if you do business in Europe.


Did you read the whole comment? They say there was a batch script to comply with legal requirements.


They said they thought there was a script, but it wasn't run very often.


Didn’t seem sufficient to me at all, but I’m happy to be proven wrong.


I work for a company managing a team that has built this for GDPR compliance.

Customer submits a deletion request. We have a fan-out process that takes the deletion request and submits it to a bunch of different data locations. All of these must respond within 2 days (though the required time is 72h). Each of those data locations will queue up a job to remove access to (soft delete) the data, and schedule a hard delete for 28 days in the future. If the customer says they don't actually want the data to be deleted, we cancel the hard deletion and revert the soft delete. If nothing happens, the hard deletion goes through.
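A rough sketch of the workflow described above, in case it helps to see the moving parts. This is not the actual system: DataLocation, fan_out_deletion and the other names are hypothetical, everything lives in memory, and the 2-day response deadline, queues and retries are left out. Only the 28-day grace period is taken from the description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=28)  # grace period from the description above

@dataclass
class DataLocation:
    # Hypothetical stand-in for one place where customer data lives.
    name: str
    records: dict = field(default_factory=dict)         # customer_id -> data
    hidden: set = field(default_factory=set)            # soft-deleted customer ids
    hard_delete_at: dict = field(default_factory=dict)  # customer_id -> due time

    def request_deletion(self, customer_id, now):
        # Soft delete: remove access now, schedule the hard delete for later.
        self.hidden.add(customer_id)
        self.hard_delete_at[customer_id] = now + GRACE_PERIOD

    def cancel_deletion(self, customer_id):
        # Customer changed their mind: restore access and drop the schedule.
        self.hidden.discard(customer_id)
        self.hard_delete_at.pop(customer_id, None)

    def run_due_hard_deletes(self, now):
        # Periodic job: permanently remove everything whose grace period is over.
        due = [cid for cid, when in self.hard_delete_at.items() if when <= now]
        for cid in due:
            self.records.pop(cid, None)
            self.hidden.discard(cid)
            del self.hard_delete_at[cid]
        return due

    def read(self, customer_id):
        # Normal read path: soft-deleted data is not accessible.
        if customer_id in self.hidden:
            return None
        return self.records.get(customer_id)

def fan_out_deletion(locations, customer_id, now):
    # Submit the same deletion request to every data location.
    for location in locations:
        location.request_deletion(customer_id, now)

def fan_out_cancel(locations, customer_id):
    # Revert the soft delete and cancel the hard delete everywhere.
    for location in locations:
        location.cancel_deletion(customer_id)

if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    locations = [DataLocation("orders", {"c1": "order history"}),
                 DataLocation("profiles", {"c1": "profile"})]
    fan_out_deletion(locations, "c1", now)
    for location in locations:
        location.run_due_hard_deletes(now + GRACE_PERIOD)
    print([location.records for location in locations])
```

The weak point in a design like this is the list of locations: a data store that never gets registered never receives the request.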


Thanks, that’s insightful. In this case, it seems sensible to me at least.


> but that was not run very often

GDPR has strict rules about how long data can persist after the deletion request is made.


Who knows what "not very often" means. It could mean once a day or once a year. The point is that this could be made to be compliant with little extra effort, so pointing out "um actually it's not compliant" is not saying much.



