I Downloaded the Information That Facebook Has on Me (nytimes.com)
123 points by tim333 on April 12, 2018 | 103 comments


I just downloaded my Facebook data archive. It is incomplete. They are leaving out one of the creepiest aspects: Your search history.

Click in the search bar on Facebook. On the right is an Edit button. This shows the search history they keep on you -- all the people you've looked up. Says a lot about you. It goes back years, unless you clear it. (And who knows what "clear" really means.)

This would be a rather unsettling thing to see in your archive. They totally left it out.


More missing:

- Likes on everything I liked on FB and external

- Shares via Share Buttons on external websites

- Websites I visited and where I was tracked because of Social Media buttons

- Time spent on FB

- Status updates I changed or dismissed

- Mouse hovering behavior to train the feed

- Data like my phone number that wasn’t put there by me but correlated through my friends' address books

- Metadata through WhatsApp acquisition

- ...


From Mark's testimony yesterday, it seems like the above attributes are not your data, but FB's data about you - to which you are not entitled. "You own your data" means any messages, photos, and posts you write you keep your copyright to, and FB will make those data easily exportable. But any metadata, or any usage data is not actually your data, Facebook currently contends. Nor is any data about you that Facebook has, acquired, or purchased.


Status updates I composed but redacted but were captured from their key logger is my data.

My likes are my data, because that is what the whole Cambridge Analytica thing is about (deriving a personality model from Likes).

My search history is my data, because it helps to identify me as a person.

My phone number they obtained through my friends uploading their contact list is my data.

Clearly, I’m not entitled to all data, but to a lot more than what’s in that archive.

Edit: And of course, if they have a shadow profile on you without you having an account (like your phone number and your name through the address book your friend uploaded), that is all your data, processed without your consent and without a way for you to get hold of it.


>"Status updates I composed but redacted but were captured from their key logger is my data."

This seems like an important one. If they log all those messages, presumably they should all be there with some indication of status (deleted, published, etc.).

I can only imagine the sensitive things contained in that dataset... Things like mistakenly pasted plaintext passwords, PII, etc.


Except the archive contains ads topics, ads engagement history, advertisers who uploaded a contact list with your info, and account login activity, none of which qualify as copyrighted content that you shared. That's private usage data and data about you that Facebook acquired.

Facebook is being selective in what they export here. The result is a misleading impression about what data they have on you.


Would be a neat little project to try to scrape your own data out of Facebook.


Do you really think they're obligated to provide you absolutely everything they know about you? I don't.

I would say if they provide everything you've directly provided, they've fulfilled their "obligation."

Obligation is in quotes, because I don't actually believe they should even be required to do this.


> I would say if they provide everything you've directly provided, they've fulfilled their "obligation."

Post likes, shares and search history are all examples of data you've directly provided that they're omitting from the export.

As for your principles, when you ask the FBI for their file on you their obligation is not limited to data you've directly provided.


According to https://www.facebook.com/help/405183566203254?helpref=faq_co... likes, shares, and searches are included.

The FBI is subject to FOIA. FB is not, obviously.


No, that actually says it's not included in the Downloaded Info.

It's in the Activity Log, which is omitted from the download. This is not obvious, and it is misleading. In fact someone asked whether they could download their activity and Facebook directed them to the Downloaded Info, which omits it.[1]

As for FOIA, you're missing the point. Of course FB is not subject to such a law. The whole debate right now is whether it should be. If you request an organization's "file" on you, it should be comprehensive and not selectively omit the creepiest stuff.

[1] https://www.facebook.com/help/community/question/?id=6345563...


>"Do you really think they're obligated to provide you absolutely everything they know about you?"

I would say no, if before you signed up they were willing to tell you everything they "intend" to collect about you once you agree.

And when I say "tell you" I mean an itemized list in plain, simple English, not a thousand pages of obfuscated legalese.


If you've directly provided Facebook with some information, you could have recorded this at the time, so their telling you they know that is something you could have figured out for yourself as it's no more than a reminder of what you've already told them. If that's all the law requires them to do then it either needs to be repealed as it's practically toothless, or amended so that they're obliged to tell you more.


If I did not directly provide them with my data, they would have stolen it. So, yes, I think they have to provide and delete upon my request.


They might have deduced it from other data they have obtained legitimately, but as it's still data about you they have, they should tell you it and if it's incorrect you should be able to ask them to delete it.


I believe data about you missing from the export would be a violation of the EU law that requires companies to provide this feature.


True but how do you prove it and how do you claim an honest full download?

Because one thing is clear: the data they have on me in my archive is by far not enough to train the feed in the way it is shown to me. How do I know? Because my download archive doesn’t have that much info on me and there must be more.


I mentioned search history because it's one example where we can prove the archive doesn't export all your information. Because it's visible in the UI if you know where to look.

Presumably the same is true for likes on posts/shares, a major feed training signal. (Find the old post, it will still show that you liked it, but post likes are not in your archive.)

But yeah, then there's the darknet of data they've gathered on you that isn't visible in any UI.
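
The comparison described above (items you can still see in the UI versus what the archive contains) can be roughly automated. Below is a minimal Python sketch, not a description of Facebook's actual archive format: it assumes the download is a zip file whose text members end in `.html`, `.htm`, `.json`, `.txt` or `.csv`, and `missing_from_archive` is a hypothetical helper name. You supply strings you noted by hand from the UI (a name you searched for, the text of a post you liked), and it reports which of them appear nowhere in the archive.

```python
import zipfile

# Assumption: these are the kinds of text files a data export might contain.
TEXT_SUFFIXES = (".html", ".htm", ".json", ".txt", ".csv")


def missing_from_archive(archive_path, known_items):
    """Return the (lowercased) items that appear in no text member of the zip."""
    remaining = {item.lower() for item in known_items}
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue  # skip photos, videos and other binary members
            text = zf.read(name).decode("utf-8", errors="replace").lower()
            # Drop every item that was found in this member.
            remaining = {item for item in remaining if item not in text}
            if not remaining:
                break  # everything was found, no need to read further
    return sorted(remaining)
```

A hit shows the item is in the export. A miss is only suggestive, since the export might escape or encode the text differently than you typed it.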


I have to say: I feel pretty helpless that I can’t force them to give me that data although the EU-laws require them to give it to me (as a German).

My search history and Likes are clearly data that can be associated with me as a person and thus, fall under former EU data protection laws and for sure under the GDPR.


Couldn't you take this evidence to the bfdi[1]?

[1] https://www.bfdi.bund.de/


GDPR (assuming that's what you're talking about) doesn't take effect until next month


That is not correct. GDPR enforcement doesn't start until late May, but the law is already in effect.


Why should Facebook remember the people I’ve cut off from my life?

For one reason, so they aren’t recommended to you as a possible friend in the future. Seems like a minor thing to be upset about in a possibility space of major things.


Seems reasonable. I can see the <insert perpetually outraged millennial news site> headline now: "An open letter to Facebook: How dare you recommend I be friends with my rapist"


To be fair, that situation sounds like one that would be better served by using the Block option, rather than just inference from the defriending action.


But that's still "data" that needs to be maintained for you.


A "block" list is different from a "people you unfriended" list. I see the point of keeping track in order to not show those "on this day" posts of someone you want out of your life, but the block feature was made for that purpose and should be used for these situations.


I am overall critical of FB (see my top comment on the omission of search history) but in this case I can see this as a good faith design decision. Yes, ideally users should distinguish between:

- "block" (keep track of them so they don't appear in your feed)

- "unfriend" (erase the connection; they might reappear)

But this also adds complexity which can lead to UX failures. Probably a lot of users were using "unfriend" when they really want to block someone. So then their abuser shows up in their memories and Facebook suggests they become friends or whatever and then there's a scandal because look how insensitive FB is.

So the UX designer's choices then are, try to educate all your users about the distinction here and get victims to use a block -- which is not going to be 100% successful -- or err on the safe side and just make "unfriend" work more like a block. I can see them making the latter design decision in good faith, and maybe I would too.

Note: It is very common in more technical circles like HN to just implicitly blame the user here for not understanding the options. I.e. if less than 100% of users understand "unfriend" is not a "block" then, well, they should pay more attention. (This is very common in security discussions.) But good UX design means owning the outcome and how users interact with your interface in the real world. And so simplicity often is the better route, even at the cost of some tradeoffs.


By contrast, I found that my Facebook dump was extremely devoid of information. Sure, a few advertisers have my data (but much fewer than in the article, < 200!). Beyond that, the dump contains very little information that I didn’t explicitly provide to Facebook (such as Timeline posts, images, etc). At most it assigned a very broad category to me (“Established Adult Life” — OK, duh).

Of note, it did not contain any record of phone calls or SMS conversations I’ve had. This is probably because I didn’t give Facebook permission to collect this data, when asked.

Of course I’m under no illusion that this is the only data that Facebook has on me. Based on my (even sporadic) activity they must be able to infer lots more (and then there’s all that tracker data). They just don’t make the interesting bits downloadable.


Agreed; missing from the list is which entities have me in their list of contacts. This is of course available to Facebook. Of course they can't give me this data because they don't have consent from the other party, but still at least a nod to the missing data would be appropriate.


For a while I had enabled Messenger to handle texts on my phone, because it kept nagging me to do so, and also the one built in to the phone was terrible. So that means for a time they saw all my github/gmail authenticator login codes, delivery notifications, etc. I'm sure that's all somewhere.


I can't even imagine how terrible that phone app was that Facebook messenger was a viable alternative. The floating head thing was truly awful.


On a brighter note, I downloaded an archive of my LinkedIn data. The data set was less than half a megabyte and contained exactly what I had expected: spreadsheets of my LinkedIn contacts and information I had added to my profile.

Somehow I'm unwilling to believe that Linkedin is really an exception here. More likely they don't include all the data they have on you.

I downloaded my facebook data, and the only thing that I was surprised at is that they actually give you so much of the data they have on you. I expected them to redact most of it. Is it really reasonable to think that they don't have even more data than they let you download? They're not required by law to give it all to you, are they?


Linkedin is worse than Facebook in terms of privacy. So many dark patterns on their site - someday their reckoning will come just as Facebook's did. Easier target too since it could hurt people’s jobs and careers.


My old boss loved to play big brother on us with LinkedIn. She'd get notified when we were updating our stuff, and I think she could see when we were messaging with recruiters/etc. I'm not entirely sure what the limits were with her upgraded account, but she wasn't good at hiding that she knew when her employees were looking for new jobs.


I used to always update my LinkedIn a few weeks before performance evaluation meetings. Was hoping the HR would notice and recognize retention risk. They were still surprised when I eventually quit, ha.


Interesting how someone like that most likely was the biggest cause of people looking for jobs in the first place. I know I would.


Definitely. The things LinkedIn does are so much worse and creepier. Just by looking at the people they recommend I connect to, the level of offline tracking they must be doing is creepy beyond imagination.


Facebook at times appears to be doing something similar. After spending a day in a conference room with a colleague that normally works from a different location, all of a sudden he was at the top of my "suggested friends" list.


I had a great one like this with LinkedIn recently - a client called me to discuss something and mentioned a third-party; I hadn't heard of them so asked if the client could send me some information; he sent me an email later that day with a link to the other company's web site, which I duly clicked and browsed around. The very next day, in my LinkedIn "People you may know" in the top row was a lead developer at that very same (tiny, otherwise totally unconnected to me) company.


We used the same temporal colocation analysis to find ships smuggling things at sea. ;)


Who is "we?" :)


It was one of many DARPA projects to see what can be done with the table scraps of open/unclassified data available. The ship beaconing data was in that category. As was uber sample data, backpage data, darkweb ecommerce, blockchain ledgers. We looked at ALL kinds of data, and tried fusing where we could. One target was human trafficking. https://www.google.com/search?q=memex+human+trafficking


I couldn't delete multiple connections at a time. So I deleted my LinkedIn account instead. LinkedIn really serves no purpose if you are not in HR.


When Microsoft acquired LinkedIn I suspected they would go for the top talent with this. Think about it:

Based on users' data and behavior, one can anticipate whether or not he/she is going to quit/leave his/her current job. Microsoft recruiters can then swoop in and offer him/her a job.

Or, for example, Microsoft can essentially map the network of all the top IT talent and use this to its advantage.

The list goes on...


I had forgotten that MS acquired it! Honestly it’s probably less bad than it was. Maybe in a few years we’ll hear about all the scrubbing they had to do to keep MS safe.


We joke at the office about going to update our linkedin after terrible meetings. In fact the act of interacting with that site at all, either from updates or clicking emails, is probably a sign I'm looking for greener pastures.


I'm not sure it would be really helpful. Not all jobs (on LinkedIn) are IT jobs, far from that.

The real advantage would be to map everyone and every company to get intelligence on the users and potential users of their products (Office 365)


>Linkedin is worse than Facebook in terms of privacy.

And that sucks since it's hard to have a professional life w/o one. I've known women who were stalked/harassed because despite otherwise being pretty locked down their harassers found their linkedin and used that to harangue them.


It's honestly not that difficult. I dropped my account, it served no value. Getting regular messages "to pick my brain" is a waste


Facebook's reckoning never came, judging by the laughable "grand inquisition" that happened this past week.


> Is it really reasonable to think that they don't have even more data than they let you download?

No. They have more data. They have lots more data. Browsing history for example via trackers.

And no, we (Americans) do not have any meaningful laws. Nothing. Nor do we have any rights to privacy. We have no right to know what personal private information is being collected, bought, or sold.


As a European, the situation is only slightly better for me. Formally I have some rights, and some things that are legal in the US may be illegal here, but that has not kept any company from doing it anyway. I have yet to receive better privacy in real terms, and hope GDPR will finally make the mass intrusions unprofitable, maybe even criminal.

There was the Bye Bye Facebook event here yesterday, 10k people from the Netherlands deleted their Facebook accounts.

Personally I'm waiting until May 25 to do this: I hope that after that date, Facebook will be legally required to really delete my data, and be liable if they don't. I'm afraid that before that date, they'll just hide that data from me, and continue adding to it by tracking, and continue selling it to advertisers outside the Facebook site.


Facebook and others will be legally required to put in a reasonable effort to delete data they don't have any overriding obligation (public health, legal proceedings, other public interest, etc.) or business need to keep.

This might turn out to be different from what a random person on the street might expect.


Can we just change our info to say we're residing in some protected part of Europe? I've already gone in and falsified all of the good stuff, it wouldn't hurt to 'move' again if it means they have to apply an extra layer of privacy/right to forget.


Under GDPR, entities are allowed to ask you to prove your identity and by implication that you actually qualify for rights under GDPR.


Thanks for that tidbit. It would be nice to be grandfathered in, I can see it being a huge PITA to ask millions of people for proof--which probably gives them even more sensitive data like a photo of an ID. I assume your privacy laws travel with you, so if you travel to/login from the US your privacy rights stay the same?


What kind of grandfather clause do you imagine? The law basically says that companies are allowed to make you prove it's your data you're trying to delete to deter people using it to delete that of others. They're also allowed to refuse or charge a reasonable fee if they can justify that the request is spurious, unduly burdensome, onerous, etc.

Yes, but obviously you've got nothing if you use a US-based service from the US. It's based around companies subject to EU laws, rather than any real notion of citizenship.


My guess with all of these sites is that they are willing to give me the data I gave them, but not the data they inferred about me or pieced together from other sources. My concern is that this data may be more sensitive and hold more potential for harm than what I gave them (since I was willing to share it with them in the first place).


Exactly. The number of "This is all of the data Facebook/Google/etc have on you" articles I've seen recently make me want to bang my head on the wall. They absolutely are not letting you download everything they have.


Not just articles, but forum posts as well. I suspect the worst. But hope these authors are just "uninformed".


There was also a rapid and noticeable shift in "public opinion" on Reddit in regards to Facebook/Zuckerberg from overwhelmingly negative to slightly positive just in time for Zuckerberg's hearings. I also suspect the worst here.

I've also personally been receiving downvotes here on HN for posting undeniable facts about Facebook (and the data collection practices of other ad companies), but that could just be a result of the numerous FB employees on this site trying to pull the wool over their and our eyes on behalf of their employer.


I fear the worst here. We hear stories about the armies of people in other countries. We can feel the presence of mainland influences in these threads.


When it comes to Facebook, assume the worst and in 6 months find out it was 10x worse


Yep, if there's any takeaway here, it's that companies will gladly take multiple avenues and channels to lie to you.

1. Not providing tools to see or download your data

2. Simply not providing options to turn off data collection or pretending that the options available are complete.

3. Providing tools to see and download a portion of your data, pretending that it is complete

4. Pretending to delete your data on account deletion but retaining the data, or at least a portion of the data

5. Collecting more data than they say they do

6. Collecting data on people who do not directly interact with the company services


Anecdotally, I had a very strange experience with LinkedIn. I went to stay with my friend in another country for a few days and upon returning home, LinkedIn recommended his father as a connection - despite me never electronically contacting him. The friend I stayed with doesn't have LinkedIn nor have I contacted him over email or similar. My guess is that it must have been location based or because I connected to their WiFi - very weird.


Yes, my guess is either public IP or GPS colocation from phone apps, possibly even using the wifi AP name if you both had phones connected. If civilian GPS is good to a few meters, it's easy to look up who lives in that radius.
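
To make the GPS colocation guess concrete, here is a minimal Python sketch of how two accounts could be linked from location pings alone. It is purely illustrative and not a claim about what LinkedIn or Facebook actually run: the `Ping` record, the `colocated_pairs` helper, and the 25 meter / 5 minute thresholds are all assumptions made up for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass(frozen=True)
class Ping:
    user: str
    t: float    # timestamp in seconds
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def colocated_pairs(pings, max_dist_m=25.0, max_dt_s=300.0):
    """Count, per unordered user pair, pings that were close in space and time."""
    counts = {}
    ordered = sorted(pings, key=lambda p: p.t)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.t - a.t > max_dt_s:
                break  # later pings are even further away in time
            if a.user == b.user:
                continue
            if haversine_m(a.lat, a.lon, b.lat, b.lon) <= max_dist_m:
                key = tuple(sorted((a.user, b.user)))
                counts[key] = counts.get(key, 0) + 1
    return counts
```

Pairs with a high count over several days would be obvious candidates for a "people you may know" suggestion, even if the two people never contacted each other electronically.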


Exactly... do not take at face value what these services deliver to you. I chuckled at the guy who assumed he deleted all of his FB data via a data scraping client.


> They're not required by law to give it all to you, are they?

They are in Europe: https://en.wikipedia.org/wiki/Max_Schrems


"They're not required by law to give it all to you, are they?"

Not sure about the law (which also changes from country to country) but even if they'd be required to give all users their data, it is possible that they keep a relevant slice of the results obtained by matching that data. Releasing all that data public could reveal the inner workings of their algorithms, so I'd understand if they would rather release what people gave them while keeping most of what they inferred from that information, which of course is where the real knowledge is.


In the UK, and possibly throughout the EU, they legally have to provide everything they have on you under the Data Protection Act for no more than a £10 administration fee.


I also downloaded my data and deleted my account.

I have known for a long time, even before the scandal, that my time on the so-called social media site was coming to an end.

Being a software engineer I believe I am aware of a good deal of what is going on behind the scenes at Facebook, in terms of collecting data etc.

Talking to people who have no technical insight into privacy, and into what data is collected, frustrates me on a whole new level.

It amazes me that people don't sit down for a couple of hours and try to understand how the company works and what its business model is based on. And what data is collected from them, with or without their knowledge.

I am happy to be off and I will try to convince my peers to understand why.


If it were only social media companies that might be significant, but every company we interact with is doing the same things these days. For example, did you know that employers are selling weekly paycheck info to credit bureaus without consent? Unbelievable it's not illegal.

For some strange reason Americans only care about privacy regarding medical info, but everything else is totally up for grabs. Never understood that dichotomy.


>More important, the pieces of data that I found objectionable, like the record of people I had unfriended, could not be removed from Facebook, either.

The article somewhat alludes to this practice as nefarious so that Facebook can serve better ads. Through a different view, one where you don't assume the worst, you could argue they keep a record of who you unfriend, so they don't recommend them as a friend again.

We are currently in a place where everything Facebook does is bad. Let's be a bit more rational at times.


Sorry but I fear the horse has already bolted.

Even if you were able to delete yourself entirely from Facebook. You have been "consumed" already.

Your data was valuable in at least 2 ways:

- Monetizing it with companies such as Cambridge Analytica.

- Training Facebook's AI models with it.

What was probably happening, was that Facebook was selling your data to other companies so that they can train AI models/make conclusions etc. In addition, Facebook itself is using this data to train AI, make conclusions etc.

Facebook will no longer be able to monetize it with companies as much after this media attention.

However, and this is the big one, if they use your data to train their models to target individuals (and therefore "take advantage" of them), then they can claim that your data was not used. And then they sell the models, or the results of the models. It's like indirectly selling your data.

This could be a big win for Facebook, because if Facebook does not dish out our information to other companies, it will be *uniquely* placed to provide such services to the rest of the world.

And this will be almost impossible to regulate.


> then they sell the models, or the results of the models

Meh, the models will be worthless in a few months, and that is if they aren't already worthless right now.

All those black boxes with magic coefficients from inputs to outputs are inherently unmaintainable. Toss them a black swan, and it all goes pear-shaped.


Facebook doesn't and has never sold data. It has a free developer platform that has been reduced over time to almost nothing and it has ad products where it uses the data it has to deliver ads to people that match criteria.


Too bad this is only a small portion of what Facebook collects. For example, it doesn't include information they have purchased from third parties.


The terms of buying that data may prevent them from distributing it.


do they do that?


It appears they've been doing it since 2012: https://www.engadget.com/2016/12/30/facebook-buys-data-on-us...

Assuming this is up-to-date this tells you who they buy from: https://www.facebook.com/help/494750870625830


It's interesting to note that the data Google stored was more alarming, though was taken up at the bottom of the article and with less detail. It also didn't make the headline. This is because bashing Facebook gets more views currently. This is a good example of how media bias can distort opinion, while maintaining that all data stated is accurate.

In other words, it's not enough that media is accurate. Bias is just as important.


An additional concern is an attacker phishing a user’s email/Facebook password and downloading this dump to aid with identity theft or blackmail.


No, they downloaded the portion Facebook is willing to allow them to download. This is absolutely not everything Facebook has.

Note the wording here, which is similar to the misleading wording Zuckerberg used to avoid honestly answering certain questions in recent days:

> “allows people to see and take out all the information they’ve put into Facebook.”

This doesn't include data such as:

- Data your "friends" have "put into Facebook" about you, which could include SMS/call records of your communications with those friends and various other details.

- Data Facebook purchases or collects from data brokers or public records

- Data Facebook collects on the broader web via "like" buttons, etc


Exactly. I downloaded mine and there were entire chat histories missing, and some of the ones I explored were incomplete. I could see more data in the actual browser chat interface than what was in the download. It didn't include posts I made on other people's pages, all the likes, shares, and the thousands of gifs I've sh!tposted. It also did not include data from the pages I manage/own.


These dumps of all the stuff that is in the service about you are super dangerous. If an attacker gets into your account, they can then download everything you have ever posted almost effortlessly. I can imagine these will be on sale on the dark web.


Missing from the list is any way to find out what some of these listed entities mean. For example, under "Advertisers who uploaded a contact list with your information" is claimed "Nebula Mars". What exactly is that? I can't google it and get any useful result. It is both meaningless in its terminally symbolic state and deeply meaningful as an indicator of where my data is truly living in the world.


I left Facebook years ago for this kind of privacy scraping antisocial greed (and LinkedIn et al), but one thing I'm curious about with this latest 'revelation' is whether or not you can get information about your shadow profile.

Presumably that doesn't mean an email address, they actually collate information about you the individual - what's the process of having a look at that?


You downloaded the information Facebook shared with you when you asked for the information they have on you.

Tracking cookies, personal profiling, and several other trails would probably never be exposed.


I wish that I could download all the information that Google has on me, but I was stupid and I asked them to hide it from me...



I did this last night and the only surprising thing about the data they had were contact details from ~8 years ago.


Can the US government please make such a tool for foreign citizens?


If you believe Conspiracy Theories™ , and have a comfy tinfoil hat, this is it. There was something called Lifelog. https://en.wikipedia.org/wiki/DARPA_LifeLog


Can anyone explain to me why FB and Google surveillance of me does not violate the 4th amendment? "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated..."


There's a form of doublethink going around that says that, for instance, censorship is fine as long as it isn't the state doing it directly.

(Usually only when the sauce is for the goose. If for the gander, indignation ensues.)


Because Facebook is not a governmental agency for starters...


The amendment doesn't say anything about government agencies. It talks about unreasonable searches.


>The social network had even kept a permanent record of the roughly 100 people I had deleted from my friends list over the last 14 years, including my exes.

How useful is the info one can even get from this?

I mean even if you interpret unfriending as enemying (which is a leap), possibly revealing interests you don't have, associations you don't have.... it's a lot weaker (to advertisers) than positive information.


As mentioned elsewhere in this thread, it's probably so they don't recommend those connections in the future. I think the average user would be much more upset at facebook recommending they add their ex as a friend (because of course you have a lot of mutual connections, tagged pictures, and other flags that might indicate a strong bond) than just keeping a black list of people you've chosen to remove.


My bad; that's social media 101.


Which is odd because Facebook routinely has recommended exes.


I had nearly 4500 'friends' I un-friended from back when I did Zynga MafiaWars botting. I wonder what I look like on their stats hahah



