
> They say it can take up to 30 days to process.

What the heck...

My current day job is a multi-tenant SaaS system: every single entity in any of our heterogeneous databases is tagged with a tenant identifier (it's also used in composite PKs/FKs in RDBMS databases to prevent incorrect inter-tenant references). Every database system we use (MSSQL, MySQL, Cassandra, etc.) has some way to query its own schema (INFORMATION_SCHEMA, system_schema, etc.), which lets us generate queries that will reliably dump all data we have associated with any tenant (and by extension, any user) within seconds, even if there have been ad-hoc or unplanned database design changes (this is the only time I use "SELECT *" in production code!).

This came from a weekend project where I wrote a simple web application that generates and runs those queries, dumps the data to CSV files for tables and JSON for document stores, shoves it all into a zip, and uploads it to Azure Blob storage. I didn't build it for GDPR compliance, but to let me undo any unintentional data deletion without needing a full database restore.
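The query-generation idea could be sketched roughly like this. The column name `tenant_id` and the schema/table names are assumptions for illustration, not anyone's real schema; in practice the table list would come from running the discovery query against INFORMATION_SCHEMA (which MySQL and MSSQL both expose).

```python
# Sketch: discover every table carrying the tenant column via
# INFORMATION_SCHEMA, then emit one per-tenant dump query per table.
# "tenant_id" and the example table names are illustrative assumptions.

TENANT_COLUMN = "tenant_id"

# Discovery query you'd run first to find candidate tables (not executed here).
DISCOVERY_SQL = """
    SELECT TABLE_SCHEMA, TABLE_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME = %s
"""

def build_dump_queries(tables, tenant_column=TENANT_COLUMN):
    """Given (schema, table) pairs that contain the tenant column,
    generate one parameterised SELECT * per table."""
    for schema, table in tables:
        # SELECT * is deliberate: we want every column, including ones
        # added by ad-hoc schema changes since the tool was written.
        yield f"SELECT * FROM `{schema}`.`{table}` WHERE `{tenant_column}` = %s"

queries = list(build_dump_queries([("app", "orders"), ("app", "invoices")]))
```

Because the table list is re-derived from the schema on every run, new tables are picked up automatically as long as they carry the tenant column.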

My operation's scale is nothing compared to Spotify - but if a tool I built in a couple of days can reliably dump all data about a subject across multiple independent cloud databases within seconds - what's Spotify's excuse?




It's obviously not a technical problem. The 30 days comes from the GDPR legislation. Spotify is just behaving like an obnoxious child that needs a good smackdown from the EU. However, the GDPR also stipulates that this needs to happen without "undue delay".

I'm not a lawyer specializing in EU law, but I think this wording was put there for the precise reason of being able to punish companies that behave like that.

They'll get to them eventually.


I don't think that's fair. If the law sets the limit at 30 days, there is no reason to do it sooner than 30 days. It is obviously a money-losing proposition to do it quickly, and the only reason it's an option at all is because of the law. I guess that makes it a reasonable compromise -- everyone is equally unhappy, but the world is a slightly better place.

The reality is that "30 days" translates to "someone will run a script manually once a month" whereas something like "100 milliseconds" would force you to have automation. It's a reasonable tradeoff for rare events. There is a lot less engineering effort required, and that makes it cheaper.


> there is no reason to do it sooner than 30 days

There is. Any tiny issue puts you past 30 days and invites a slap from the regulator. There's a lot of space between 100 ms and 30 days. If you can't run your monthly process twice a day instead, for example, you're most likely delaying on purpose - that's the "undue delay" part.

There's a company I know which decided not to pay any invoices until the last possible moment - which one day resulted in the electricity being cut to an office building with hundreds of people. Do you think that was a good decision in the end?


You are reacting to just one part of the directive, even though I just told you about another part. Seriously? What's the point?


This seems like an extraordinarily hostile inference to draw from Spotify's statement about the maximum amount of time it'll take. Do you have some specific reason to think Spotify is engaging in undue delay?


As someone else mentioned, there's a legal maximum, and I think a company is also held to their quoted timeframe (and they may be required to quote a timeframe as well).

Additionally, having a delay can be useful to make it harder for someone to take over your account and grab all your data, especially if there's sensitive data that might not otherwise be accessible or enumerable through the UI or API.


The 30 day figure is likely derived from the legal limit under GDPR for a Subject Access Request (one calendar month).

Why commit to a tighter SLA than you legally have to?
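As a side note on the "one calendar month" figure mentioned above: adding a calendar month to a date has an edge case when the target month is shorter (e.g. a request on 31 January). A hedged sketch of computing such a deadline, clamping to the end of the month; this is an illustration of the date arithmetic only, not a statement of how GDPR deadlines are legally interpreted:

```python
# Sketch: add one calendar month to a date, clamping the day to the
# target month's length (31 Jan -> 28/29 Feb). Illustrative only.
import calendar
from datetime import date

def one_month_later(d: date) -> date:
    year = d.year + d.month // 12   # roll over into January if needed
    month = d.month % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```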


Exactly. I’d be surprised if it actually takes 30 days, but there’s no point to them quoting something faster.


This is a very naive view: "If I can do <X> in Y time, why can't <insert some company>?".

Any large company has tens, hundreds, or even thousands of different teams, each of which may own a system with some data that identifies you as a customer. Presumably, they're all on the hook for serving you (the customer) this data when you ask for it. Honestly, I'd hate to be legal counsel at a company having to sift through every attribute/data column trying to figure out what we "have" to return vs. what we can probably keep hidden as a trade secret, but I digress.

Anyway, there's no guarantee that even half of the systems storing this data were designed with GDPR (or whatever privacy-related) compliance in mind.

Consider a system that's storing nested JSON blobs with customer-identifying data several layers deep. You happen to be on a team that owns this mission critical system. Your legal department gets you to prioritize some dev work to build a system to quickly extract this data.

You'll (probably) do it in the most cost-effective manner - it might mean rearchitecting your system if the cost of extracting that data is very high. It could be that you already have the tools to extract this data quickly, and you just need to plug them in. Or there's at least one more scenario where such an operation is so expensive (and so disruptive to your main business function) that you accumulate a batch of requests (i.e. a 30-day window) and run some ETL job to get all the customer data and respond to the compliance requests. With that approach it's definitely not a real-time or near-real-time response.

And my example scenarios here are actually pretty simplistic for a large company. Imagine the scenarios where you have N customer records with some loose notion of a schema that has evolved over the years. You're not even sure how to query or transform that data... or your automation works 98% of the time to pull the customer data but fails the other 2%, and so you have time (legally) for an engineer to fix that edge case (depending on how expensive that work is), or for the engineer to pull your data manually, accounting for whatever edge case was discovered.
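One way to cope with nested blobs under a loose schema is to walk the structure generically rather than rely on fixed paths. A minimal sketch; the set of "customer-identifying" key names here is an illustrative assumption, and real systems would need a much more careful classification:

```python
# Sketch: recursively collect customer-identifying fields from nested
# JSON-like data with an evolving schema. PII_KEYS is an assumption.

PII_KEYS = {"email", "name", "customer_id"}

def extract_pii(node, path=""):
    """Walk arbitrarily nested dicts/lists and return (path, value)
    pairs for keys that look customer-identifying."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in PII_KEYS:
                found.append((child, value))
            found.extend(extract_pii(value, child))  # keep descending
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(extract_pii(item, f"{path}[{i}]"))
    return found

blob = {"order": {"customer": {"email": "a@example.com", "name": "Ann"}}}
results = extract_pii(blob)
```

The path strings show where each match was found, which helps with exactly the 2%-failure case above: when a blob has an unexpected shape, an engineer can see which paths matched and which didn't.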

The more data you store about a customer, the "harder" this would get.

I would probably use data retention policies to drop as much data as I can, although I'm sure at a large company, your business customers push back: "Oh we've never had to use that data set, but we want you to keep it because we might use it to build features for some new ML model in the future or to solve other problem <Y>".


You’re not wrong (and I’ve upvoted you too) - but given the importance of complying with the spirit of the GDPR and other consumer-first regulations - and especially given that Spotify is headquartered in the EU - they should be doing more. You do set out a reasonable explanation for non-immediate data dumps, but from the evidence I believe companies like Spotify and many others intentionally drag their feet and do the bare minimum, at least in part, to discourage requests. Of course that also makes it bloody obvious. More telling is that they aren’t being transparent about it either. If they posted a simple statement explaining, without too much technical detail, why it takes so long, that would be a start.



