Hacker News
1 Database Containing 35,000,000 Google Profiles. Implications? (cyberwar.nl)
104 points by sathyabhat on May 25, 2011 | 38 comments


A few months ago I discovered a curious book on my doorstep. To my shock, it contained the names, addresses, and phone numbers of thousands of people in my area. I suspect that many other people, including criminals and certain types of marketeers, are in possession of similar books. Implications?


>Implications?

Few, if any. The database delivered to you relies on security through obscurity, in that few people are even aware of its existence these days.


Yeah, I also think they're a bit paranoid, but it's kinda different: the phone directory doesn't contain that kind of info (emails, conversations, etc.), and I doubt anyone would take the time to convert it to an SQL database (unless there's an API for the white pages online :D). Bottom line, we are digital merchandise, pf sounds very futuristic.


better yet, the people in this book don't act surprised that their public information is available for others to read.


Couldn't I tell the phone company NOT to include my profile in the phone book / directory they publish? I remember doing something like that with SBC - sons of you-know-what, they definitely sell that info, judging from the number of crap calls I got when I was with them.


You can choose whether your profile is indexable or not.


Not sure if this is still the case, but at one point you had to pay $1 a month for the privilege of having an unlisted number. Crazy!


I'm usually very critical of and sensitive to any privacy issue. But the Google profile is a public profile, which is made abundantly clear on every occasion. This is what you see when creating the profile:

"Decide what the world sees when it searches for you. Create a public profile to display the information you care about and make it easy for visitors to get to know you. [...] Your profile will be visible to anyone on the web, and anyone with your email address can discover it."

I fear that this kind of completely spurious criticism discredits anyone who has real privacy concerns.


If I read this correctly, Google lets you mark some of your profile information as public. And as a result of this, a member of the public was able to download it.

So, uh... what exactly is the story?

I think the key piece of advice for people not wanting their personal information to be downloadable from the internet is to not publish their personal information on the internet.


> I did NOT publish the database and did NOT violate any Google policy.

But he might have broken some EU and NL laws about privacy. You can't create a database with personal information without consent even if it's possible.


The database was created by Google and the users who typed in their information. He just made a copy.


Does not make it legal. Even if the original provider got consent from the user, it doesn't mean you have the right to copy the database (and that you shouldn't declare the data collection to the relevant privacy agencies).


So operating any web index in Europe is illegal?


A web index is not the same thing as a database about people. The original blog post explicitly says he built such a database; he didn't just mirror Google's data.

It's not illegal if you get the necessary permissions from the privacy agencies (which will ask things like: how is the data stored, do you join it with other databases, can users ask to have their information removed, etc.).

(IANAL, I just happen to have dealt with that kind of thing when building a lobbyist database out of public documents for an advocacy group)

Edit: removed part about database rights, let's not complicate the subject.


It's an interesting question actually; I guess at some point, if there is sufficiently advanced AI (I should use the term 'information retrieval' so as not to instigate a tangential discussion on whether AI is possible) in a search engine to identify and link personal information, does a search index constitute a 'database' per Directive 95/46/EC?

The author of the original article could write a paper on it :)


Every search engine does this, I'm not sure what the implications are if it's public?

Data mining is quite possible but there's an expectation that the profiles are public so no one privacy-conscious will be putting sensitive information in it.

I'm curious to know if anyone has ways of restricting 'mass-downloading'. I don't know of any website that does, short of rate-limiting requests from a single source.


There is rate limiting and also some regularity testing; e.g., one site would only let me download if I scheduled requests at random intervals and below a certain frequency.
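
To make the two ideas above concrete, here is a minimal sketch of what such a check might look like on the server side. It is only an illustration, not how any particular site does it: the class name `RequestMonitor`, its parameters, and the thresholds are all made up for this example. It combines a sliding-window rate limit with a crude regularity test that flags clients whose gaps between requests are suspiciously uniform.

```python
import statistics
from collections import defaultdict, deque


class RequestMonitor:
    """Hypothetical per-client check: sliding-window rate limit plus regularity test."""

    def __init__(self, max_requests=10, window_seconds=60.0, min_cv=0.1, min_samples=5):
        self.max_requests = max_requests      # assumed cap on requests per window
        self.window_seconds = window_seconds  # assumed window length in seconds
        self.min_cv = min_cv                  # assumed minimum "jitter" between requests
        self.min_samples = min_samples        # requests needed before judging regularity
        self.history = defaultdict(deque)     # client_id -> recent request timestamps

    def allow(self, client_id, now):
        """Return True if a request from client_id at time `now` should be served."""
        times = self.history[client_id]
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_requests:
            return False  # rate limit hit; the rejected request is not recorded
        times.append(now)
        return not self._too_regular(times)

    def _too_regular(self, times):
        """Flag clients whose inter-request gaps barely vary (clockwork-like traffic)."""
        if len(times) < self.min_samples:
            return False
        ts = list(times)
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        mean_gap = statistics.mean(gaps)
        if mean_gap == 0:
            return True
        # Coefficient of variation: standard deviation of the gaps relative to their mean.
        return statistics.pstdev(gaps) / mean_gap < self.min_cv
```

With these assumed settings, a client requesting exactly once per second is refused from its fifth request on, while a client with irregular gaps that stays under the cap keeps being served, which matches the "random schedule, below a certain frequency" behaviour described above.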


It is meant to be public and available to anyone who tries to access it. It has nothing to do with privacy.

Don't do your paper just to do it. Go and find more real/serious stuff.


If the information is marked public then crawling it is how the web works, or at least how searching and indexing works.

I ran into that a bit with our startup (http://infostripe.com): when doing demos, it was sometimes shocking to people that with a bit of searching I was able to make a complete profile of their public online activities.

I think that even when people know a particular site is public on its own, they sometimes don't make the connection that software and search engines aggregate all of that together without their involvement. Usually this is not a problem for most people, but I have seen instances where a user would use the same username on very different services and get burned for it.


Implications? Maybe he'll make a Google Profile social network before Google does.


OMG!!! My public data is public.


I contacted Google about this issue in November of 2008 - I only received an automated response. (Matthijs mentioned that was why he posted the previous post[1] on the topic prematurely)

Perhaps with the increasing awareness of this issue, Google will be forced to act.

[1] http://blog.cyberwar.nl/2011/05/google-profiles-exposes-mill...


Why act? They make a great effort to explain that the data is public. You can even choose whether you want to be indexable (search visibility).


I have one Google username that has a public profile and that I use for account registration etc. I have another that I use for personal email that is private.

I assume more people will start doing the same if they are privacy conscious.

Searching this database is no different to searching on Google itself. The only concern would be having a mass email list, but spammers have had those for years and filters sort that out.


Here's one implication: a scammer decides to send a "Your Gmail account is being canceled" phishing email to every address there. It clicks through to a fake but convincing Gmail login page that captures the user's real login info.

I've already had a few friends call for help with this since apparently it's pretty common.



It's not as helpful as you would think. The people who would activate the second step sign in and the people who fall for the phishing scheme don't overlap that much.


If these are public profiles then maybe this isn't a problem, but if the data contains non-public profiles then it's a security breach for Google. The robots.txt settings would lead me to believe that these are public profiles and that Google intends people to view/download them.
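
For anyone who wants to check what a robots.txt actually permits, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name, and the URLs in the usage note are invented for illustration; they are NOT Google's actual settings, which are not quoted in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only, not Google's real file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /profiles/
Disallow: /private/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these made-up rules, `is_crawl_allowed(EXAMPLE_ROBOTS_TXT, "example-bot", "https://example.com/profiles/alice")` is allowed, while the same call for a path under `/private/` is not. Note that robots.txt only expresses the site's wishes to well-behaved crawlers; it is not an access control.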


Public profiles can be automatically harvested? Curl and wget should be classified as munitions and access to those tools restricted in at least 45 states. Shut. Down. Everything.



I would love it if it were made publicly searchable so I can see what data is available on me personally.


I tried to find out with your name and came up with this: 'Google launches Google Xistence to manage social media life' [0]

Your information is pretty easy to find out: https://groups.google.com/groups/profile?enc_user=V5aPoREAAA...

[0] http://www.webmarketinggroup.co.uk/News/google-launches-goog...


Never seen [0] before, or heard about it.

The second is just one of my email addresses, and isn't my Google Profile.


Old but still working: http://www.facesaerch.com/google-profile/ (unofficial Google profile search start page)


What? Your Google Profile is available (Note, yes, this is actually what the article is talking about. Big whoop):

https://profiles.google.com/u/0/<username>/

All of your Google Data is laid out on the dashboard for your viewing and deleting pleasure:

http://www.google.com/dashboard


Yes I know my Google Profile is publicly available, however that doesn't sound like it is the interesting part. The interesting part is the data that he was able to get surrounding that. What does his data show regarding the 6 degrees of separation? Stuff like that.


Isn't it up to the user whether or not to publish their profile?


My profile is marked as public. I expect that it would be available were someone to try to access it, whether it was a friend, whether it was someone scoping out my class(mates), or whether it was someone downloading by the thousands. What's the difference to the user? The whole point is that if my data is public, other people will see it. How is it important if my profile is visible locally alongside other profiles?

I also don't know what people expect people to do. If you ignore the easily available privacy policy, there is no excuse. Period.



