Ask HN: Who really gives your personal info to Intelius, Instant Checkmate, etc?

monodeldiablo · on March 7, 2017

I've had the misfortune to be present for an in-person demo of a verification service nearly a decade ago. It involved those relationship questions ("Which one of the following people have you not lived with in the last 5 years?") that are incredibly creepy.

I was shocked that they had so much data on me -- I have no debt, no credit cards, no house, no car, no bills, and I had always entered informal rent agreements (I was poor) up to that point -- yet the rep was easily able to list all the places I had resided from college to present, along with a host of off-the-books housemates.

"Where the fsck did you get all this?!" I demanded.

"Have you ever ordered a pizza?"

Turns out, some fast food chains do a brisk business in reselling customer data. I had ordered from Domino's once, but that was enough to link my name to a specific location.

This experience has made me extremely sensitive about the information I give while making a purchase. When I lived in the US, I stopped having food delivered, paid in cash, and never signed up for branded credit cards. I rented informally and, whenever possible, tried to just pay the landlord my share of utilities in cash. Anything to keep a lower profile.

Far more companies than you realize are collecting as much data as possible about you, your habits, and your relations. So I'm afraid your search for a canonical list of data sources is ultimately fruitless. In this new economy, you are always the product.

ams6110 · on March 7, 2017

Do we forget that not long ago everyone's name, address, and phone number was published in the phone book (unless you paid extra for an unlisted number), and hospital admissions were published in the daily newspaper? It used to be unremarkable, and now people shriek "privacy!!" when it's discovered that some mundane detail about their life is not a closely held secret.

intended · on March 7, 2017

No we don't forget.

Those systems were analog and could only be scaled to a certain extent, after which you ran into management overhead.

So in the example given by the OP, Domino's would not be updating your name in the directory.

That would be data that evaporated and never made it to record.

To increase the contrast - whole new types of clustering and analysis are now possible, at the moment a new data point is received.

tankenmate · on March 7, 2017

Indeed, the amount and types of published information haven't broadly changed; what has changed is the information horizon: the ability to search for information and analyse it. The world hasn't become smaller; our ability to see is much, much wider. The information horizon is much further away from zero than it used to be (thanks to the computer's / big data's ability to communicate and analyse), and its impact on society is much more than just privacy.

revelation · on March 7, 2017

This reminds me of the decision on GPS trackers on cars: https://en.wikipedia.org/wiki/United_States_v._Jones_(2012)

Dreeben cited United States v. Knotts as an example where police were allowed to use a device known as a "beeper" that allows the tracking of a car from a short distance away. Chief Justice Roberts distinguished the current case from Knotts, saying that using a beeper still took "a lot of work" whereas a GPS device allows the police to "sit back in the station ... and push a button whenever they want to find out where the car is."

amichal · on March 7, 2017

I once worked on a project (about 15 years ago) where we had access to a reverse phone book listing that was generated in part by having a copy of every distinct paper phonebook in the US (also things like school directories) sent overseas to be hand entered (2-3 times for QC). A "nice" feature of it was that the address info was also geocoded so we could answer questions like "Who are my closest 100 neighbors and what are their home phone numbers?". Between that and Google buying the archives of usenet[1] in the same basic timeframe I realized that if it's written down, it's not likely to stay private for long.

Edit: add citation [1] https://en.wikipedia.org/wiki/Google_Groups#Deja_News
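
To make the "closest 100 neighbors" query concrete, here is a minimal sketch of how a geocoded listing can answer it. This is an illustration only, not the system described above: the record layout, the names `haversine_km` and `closest_neighbors`, and the sample rows are all made up, and a real dataset of that size would use a spatial index rather than a full sort.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_neighbors(listings, lat, lon, k=100):
    """Return the k listings nearest to (lat, lon), nearest first."""
    return sorted(
        listings,
        key=lambda rec: haversine_km(lat, lon, rec["lat"], rec["lon"]),
    )[:k]

# Hypothetical geocoded phone book rows (all values invented).
LISTINGS = [
    {"name": "A. Example", "phone": "555-0101", "lat": 40.0000, "lon": -75.0000},
    {"name": "B. Sample", "phone": "555-0102", "lat": 40.0010, "lon": -75.0000},
    {"name": "C. Placeholder", "phone": "555-0103", "lat": 40.0500, "lon": -75.0000},
]

nearest = closest_neighbors(LISTINGS, 40.0001, -75.0000, k=2)
for rec in nearest:
    print(rec["name"], rec["phone"])
```

The point of the sketch is how little is needed: once every listing carries coordinates, the "neighbors" question is a sort by distance.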

dredmorbius · on March 7, 2017

The phonebook was a paper-based system, with significant costs to look up and act on information at scale. You needed the books, the researchers, and the time to find and act on specific numbers.

The numbers weren't widely cross-referenced across multiple other identifier databases: shopping, driving, voting, location to 1m accuracy and 30s time precision.

Every household, small business, mafioso, or political operative didn't have a full copy of the archive, and the ability to deploy it at a moment's notice.

Or in short: scale and costs matter. In fact they dominate all other effects.

Your objection is both meaningless and betrays a profound failure of understanding and sympathy.

monodeldiablo · on March 7, 2017

"Do we forget that not long ago everyone's name, address, and phone number was published in the phone book [...]"

The phone book is an opt-in system, with the ability to remain "unlisted". There's absolutely no such choice, though, from these modern, private data collectors. And they're everywhere.

Consider the following hypothetical: A battered wife flees her husband and takes shelter in a shared house run by a women's organization.

In your phone book world, she didn't need to dramatically alter her existence to stay safe. She could continue to maintain a reasonable subset of her professional and social relationships. She could get a new job in a new city. She could buy groceries. She could pay her own electricity bill. And she definitely didn't need to take her life in her hands to order a cheese pizza.

In this brave new world, though, her ex can track her with the utmost ease. She has to shed nearly every piece of her modern identity -- reconfigure apps, wipe her phone/laptop/tablet, stop using sites that geolocate via IP, and halt nearly all interactions with the modern economy -- at significant social cost.

The sheer number and diversity of data collection points -- and their increasing necessity to participate in the digital economy -- makes opting out orders-of-magnitude more difficult. Yours is a false comparison.

maldeh · on March 7, 2017

It's likely the locality / ephemerality / repudiability of those identifiers made them less of a big deal in an era when people couldn't instantly look these details up or build a detailed profile of everybody given the right data sources.

dredmorbius · on March 7, 2017

Brad de Long addressed this a while back.

https://news.ycombinator.com/item?id=13734768

mifreewil · on March 7, 2017

Wait, hospital admissions? Really, why?

lucb1e · on March 7, 2017

I imagine your friends didn't tweet or message a Telegram group about having had an accident back in the day where that was common (whenever that was, I've never seen it). When the telegraph is the fastest method of communication and landline phones are expensive, you might not learn about a friend being in the hospital until word of mouth reaches you.

Not saying this is the one true answer, but I don't find it hard to imagine why.

dredmorbius · on March 7, 2017

Temporary postal mail forwards.

USPS NCOA (Change of Address) file is another major data source.

https://www.forbes.com/sites/adamtanner/2013/07/08/how-the-p...

stretchwithme · on March 7, 2017

What makes me crazy about having your mail forwarded is that they often don't even forward it to your new address. I know because I once lived with a crazy person and had to move out and they just kept delivering all my mail there despite duly submitting my forwarding address and stressing the importance of it.

I still get mail for the last folks who lived here and they moved out 13 years ago.

But anybody willing to pay the post office for the data can find out your new address.

And then there's how we pay higher rates for first class mail so that junk mail can be cheaper.

It's almost like the government doesn't work for the people.

bogomipz · on March 7, 2017

Thanks for the tip, from the link:

"There is, however, a loophole that keeps data brokers from accessing your updated address. When you fill out the online form to change an address, you can indicate a temporary change that provides six months of forwarding that can then be extended for another six months. That information, unlike the changes marked as permanent, is not included in the master list sold to data brokers."

And the online link for online change of address with the loophole:

https://moversguide.usps.com/icoa/home/icoa-main-flow.do?exe...

stretchwithme · on March 7, 2017

Yes, but will they fail to forward the mail as reliably as they fail to forward permanently forwarded mail?

dredmorbius · on March 7, 2017

Correct.

(I've mentioned the temporary forwards tip elsewhere in the thread myself.)

asperous · on March 7, 2017

Their privacy policy says they don't do this:

https://www.dominos.com/en/#/content/privacy/

mysticmarvel · on March 7, 2017

That link at the bottom of the email states that they unsubscribe you from the mailing list.

Odd how I always seem to get more spam, but from different companies, after I click one. Almost as though all I've done is confirm that the email address actually exists, and is therefore more valuable to sell on to others.

monodeldiablo · on March 7, 2017

Perhaps the policy is new. This was about a decade ago.

Regardless, that data is out there now, and no change in privacy policy is going to put that cat back into the bag. Besides, someone else is surely acting as a new source for current data.

It wasn't just my location information, though. They were able to ask detailed questions about my personal social network, before Facebook was ubiquitous and LinkedIn was a thing. They were clearly joining several disparate datasets together to discover relationships.
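
A rough sketch of what "joining disparate datasets" could look like, purely as an illustration: I have no knowledge of what that service actually ran. The sources, field names, and the functions `normalize_address` and `infer_cohabitants` are all hypothetical, and the sketch ignores dates, which a real system would need in order to tell overlapping residents from successive ones.

```python
from collections import defaultdict
from itertools import combinations

def normalize_address(addr):
    """Crude normalization so the same address matches across sources."""
    return " ".join(addr.lower().replace(".", "").replace(",", "").split())

def infer_cohabitants(*datasets):
    """Link any two names that appear at the same normalized address."""
    residents = defaultdict(set)
    for dataset in datasets:
        for record in dataset:
            residents[normalize_address(record["address"])].add(record["name"])
    links = set()
    for names in residents.values():
        for a, b in combinations(sorted(names), 2):
            links.add((a, b))
    return links

# Two unrelated, invented sources that happen to share an address.
PIZZA_ORDERS = [{"name": "Alice", "address": "12 Elm St., Apt 3"}]
MAGAZINE_SUBS = [
    {"name": "Bob", "address": "12 elm st apt 3"},
    {"name": "Carol", "address": "99 Oak Ave"},
]

links = infer_cohabitants(PIZZA_ORDERS, MAGAZINE_SUBS)
print(links)
```

Neither source says anything about housemates on its own; the relationship only appears once the two are keyed on the same address.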

dkersten · on March 16, 2017

You might enjoy the book "how to be invisible" by J. J. Luna

stevedekorte · on March 7, 2017

Good to know. Let's all start ordering pizzas, etc. under random other names. Create a haystack to hide the needle.

smhost · on March 7, 2017

A better strat would be if we all agreed on a single name for everyone to use.

goombastic · on March 7, 2017

I propose Null or Drop Table.

spoiler · on March 7, 2017

Why not both? :)

https://xkcd.com/327/

gcb0 · on March 7, 2017

easy to filter out

cema · on March 7, 2017

This would not work, the data is heavily cross-referenced. You would simply have more names (as possible neighbors, or rather co-habitators) in your account.

pandabear187 · on March 6, 2017

I have insider knowledge as I used to work for one of these companies.

Depending on the product you purchase the data comes from multiple sources. Also these companies have sophisticated machine learning capabilities to build a profile based on various attributes found in seemingly unrelated pieces of data.

So the list consists of credit reporting agencies, public records, your online profiles with public access, court records, aggregators like LexisNexis and dozens like them.

This heavy lifting you speak of is done differently by each company and consists of drawing on multiple sources to enrich your profile. These companies spend millions on data and engineering and make even more. Whatever preconceived notion you have about courts ordering your records sealed, it doesn't happen in a centralized fashion; you would need to contact each data vendor individually to be removed. But it would be like playing whack-a-mole.

dataflow · on March 6, 2017

Thank you for the reply. Can you actually give some kind of a list though? The entire problem here is everyone explains the how but no one is willing to explain the who. I certainly understand it's "multiple sources", I'm asking who are these sources I keep hearing about. There can't be nearly as many sources as there are sites who buy from them. If you'd like to not name the company you worked for yourself then could you at least please list as many other ones as possible? That would be far, far more helpful than just saying they use machine learning and that they use multiple sources, etc.

VLM · on March 6, 2017

I can give you one very specific answer in that my late father decades ago worked in IT for a collections agency and one way that biz works is they pay a fixed amount to buy some company's 120+ day accounts receivable file (the feds and every state highly regulate this and it's highly variable and complicated, but this is the simplified version...) and this gave them a vast pile of records and legal ability to collect the debt. Now obviously they got most of their revenue by annoying the heck out of debtors right up to the legal limit. But they're leaving money on the table if they don't resell all the data they can legally sell. So sure, even 20+ years ago collections agencies were uploading records to various data aggregators in exchange for a check. Not just the original debt but followup activity and intel WRT the collections process. I would imagine that's only increased over the last couple decades. Most of these data transfers were two-way streets, not simple data-for-money trades, and obviously they maximized profits by leaning the hardest on the people most likely to actually pay up; they did not waste time on debtors known to other collections agencies as hopeless deadbeats.

blackbagboys · on March 6, 2017

There is no such public list. Most of these companies are privately held, their methods are trade secrets, and there is no real form of legal recourse. Your best bet would be to buy a book like this: https://www.amazon.com/Hiding-Internet-Eliminating-Personal-... and follow the recommendations, but even that is only likely to be half-effective.

Much of the public information is mined from sources like credit headers, your court records, utility bills, property and tax assessment records, voter registration lists, motor vehicle registrations, etc.

Unfortunately, the legal and technological landscape is such that 'hiding' from these kinds of services is effectively impossible.

sbov · on March 6, 2017

I have a friend who is super paranoid about privacy stuff. He was even able to get his utility bills to use a PO box rather than his home address, which he said was extremely difficult.

protomyth · on March 7, 2017

Depends where you are in the country. Many rural places don't have home delivery and only have P.O. Boxes. Home delivery from Amazon, etc. requires a bit of work.

dataflow · on March 6, 2017

> There is no such public list. Most of these companies are privately held, their methods are trade secrets, and there is no real form of legal recourse.

How can there not be a public list of these data miners? When e.g. a court needs to control someone's information surely they know who these people are and they can let them know? Is there a secret list in every courthouse or something?

Or when someone wants to start another one of the higher-level companies -- how do they know which core aggregators to buy from? If that's a secret then how would they find out? Surely someone's gotta be willing to tell?

dsp1234 · on March 6, 2017

> a court needs to control someone's information surely they know who these people are and they can let them know?

I think you have a misunderstanding of what a US court can do. A court can only tell a specific party to take some action, and generally only if that party is somehow related to the legal action (such as being a defendant). Generally, there is no judgement that a court can make that can affect unnamed parties (unless they are John Does, which later have to be named).

Theoretically, you could sue each company with your data, and a court could tell each of those companies to remove your information. But it would have to be for each one, and the judgement is only binding on those companies.

> Surely someone's gotta be willing to tell?

The techniques used are generally trade secrets, and amount to competitive advantage. There is little incentive for a company to reveal this information (or for an employee to do so, and thus open themselves up to legal liability).

dataflow · on March 6, 2017

What about this though:

>> Or when someone wants to start another one of the higher-level companies -- how do they know which core aggregators to buy from? If that's a secret then how would they find out?

cosmie · on March 6, 2017

> If that's a secret then how would they find out?

You don't. I've worked in data acquisition in the past, both buying data and selling it. Sometimes as the original source of truth and sometimes as a middleman that does data cleaning, standardization, appending (from other sources), then selling the derived product downstream.

Companies in that space guard their upstream sources quite heavily, because they don't want to be cut out of the process. You won't find a centralized list of independent data feeds and providers specifically because of that. In one scenario, we were dealing with a substantial rate increase from one supplier. We spent time attempting to source an alternate supplier of that particular type of data, and could only find sources that were several months more stale than we were currently getting (i.e. these people were getting the feed several hops after we were). In the end we paid the rate increase because we couldn't find an alternate source that was as close to the original data provider as our current source. And without knowing who the original data provider was, we couldn't go around our supplier.

The lack of a centralized directory isn't just done to make things opaque for end users, it's done to make things opaque for business competitors as well. It's an industry that's very, very reliant on networking and introductions.

Edited to add: You're also asking a lot of people in here to name specific companies even if they can't give you huge lists. This space is super heavy on NDAs (and trigger happy on enforcing them). If you've actually worked in it, there's simply no way you're able to name drop legally.
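
For readers unfamiliar with the middleman role described above (cleaning, standardization, appending from other sources), here is a toy sketch of those three steps. Everything in it is an assumption made for illustration: the field names, the phone-number key, the sample rows, and the functions `standardize`, `clean`, and `append_fields` do not come from any real vendor's pipeline.

```python
import re

def standardize(record):
    """Normalize name casing/spacing and reduce the phone to its last 10 digits."""
    digits = re.sub(r"\D", "", record.get("phone", ""))
    name = " ".join(record.get("name", "").split()).title()
    return {"name": name, "phone": digits[-10:]}

def clean(records):
    """Drop rows without a usable phone and de-duplicate on the phone key."""
    seen = {}
    for rec in map(standardize, records):
        if len(rec["phone"]) == 10:
            seen.setdefault(rec["phone"], rec)  # keep the first row per phone
    return list(seen.values())

def append_fields(records, other_source):
    """Enrich each row with extra fields from a second source keyed by phone."""
    return [{**rec, **other_source.get(rec["phone"], {})} for rec in records]

# Invented upstream feed and invented enrichment source.
RAW = [
    {"name": "  jane   doe ", "phone": "(555) 010-0199"},
    {"name": "JANE DOE", "phone": "1-555-010-0199"},
    {"name": "no phone", "phone": "n/a"},
]
EXTRA = {"5550100199": {"zip": "00000"}}

derived = append_fields(clean(RAW), EXTRA)
print(derived)
```

The derived rows carry no trace of which upstream feed each field came from, which is part of why a buyer several hops downstream cannot easily work out the original source.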

dataflow · on March 6, 2017

+1 thanks for the explanation!

And regarding this:

> Edited to add: You're also asking a lot of people in here to name specific companies even if they can't give you huge lists. This space is super heavy on NDAs (and trigger happy on enforcing them). If you've actually worked in it, there's simply no way you're able to name drop legally.

I understand that an NDA would prevent you from naming your own company or your suppliers and clients, but surely it doesn't prevent you from listing some other companies in this space that you know of (including but not limited to your competitors)? I don't understand why you shouldn't be able to name any company just because you've worked at one of them.

cosmie · on March 6, 2017

Every company I know in the space is a company that we've had at least preliminary conversations with (seeing if there's any potential relationship to either purchase from or sell to that company).

Just having that conversation required getting a mutual NDA in place, since the conversation involves revealing your capabilities (even if not your sources). And that's assuming you're even aware of all the NDAs your company has signed with other companies, which isn't always the case. Speculation or name dropping in public could violate an NDA you're not even aware of, then you find yourself having to defend your speculation as just that, rather than as revealing proprietary knowledge (that you didn't actually have but your company did).

At the end of the day, it's easier to default to speaking in generalizations rather than risk the potential repercussions of not doing that. :-/

justin66 · on March 7, 2017

> Just having that conversation required getting a mutual NDA in place, since the conversation involves revealing your capabilities (even if not your sources). And that's assuming you're even aware of all the NDAs your company has signed with other companies, which isn't always the case.

Interesting. Eight years ago I worked as a buyer for the aggregator with the largest criminal database and I don't remember having to sign an NDA during those sorts of talks. It's possible I've forgotten, but I think it was more a matter of people wanting to know our coverage as much as we wanted to know theirs.

As an aside, the value of those talks wasn't usually in acquiring the data, except maybe in the short term. We always preferred to go directly to the source. The value was simply in learning that the data from a specific source was even available. In a few instances, that got pretty frustrating, since I knew that the seller of the data was scraping a given court's website against that court's wishes (and the TOS on the website), with all associated problems with accuracy and ethics that entails.

dataflow · on March 6, 2017

Okay, I see. But somehow someone in your company found out about them before they could sign mutual NDAs, right? How does that happen? Do you just need a higher-up who's friends with the right people?

lithos · on March 7, 2017

Imagine the legal liability you'd incur if the layperson could track the source of inaccurate data, then proved in court it kept them from getting a loan.

Those are provable damages, and maybe slander if they can convince the court they're a publication.

Another reason to be NDAed up.

eonw · on March 7, 2017

I imagine it's similar to how drugs and weapons dealers meet. When you know seedy people, you get connected. Birds of a feather and all that.

justin66 · on March 7, 2017

Go to an industry site like www.napbs.com and look around.

fragmede · on March 6, 2017

> Theoretically, you could sue each company with your data, and a court could tell each of those companies to remove your information. But it would have to be for each one, and the judgement is only binding on those companies.

As an individual, that sounds like a lot of work for little gain, but if there were pre-filled out forms, and all I had to do was put in my name, I'd be willing to file lawsuits to get my name removed.

literallycancer · on March 7, 2017

>Theoretically, you could sue each company with your data, and a court could tell each of those companies to remove your information. But it would have to be for each one, and the judgement is only binding on those companies.

I'd expect going after such companies to be the state prosecutor's job.

kazagistar · on March 7, 2017

If there are many companies, surely it would be in their interests to leak one another for competitive advantage?

TrinaryWorksToo · on March 6, 2017

What about class action? Could you make a class action against those who have sold your information?

dsp1234 · on March 6, 2017

A class action is an action where there are a large, possibly unknown, number of plaintiffs suing a single group of named defendants. With the thought process being that if a single case against a named defendant is won, then that same outcome would happen for every other plaintiff, and thus doing it once saves the court time.

What you're suggesting is more like a John Doe case where you sue a number of unknown entities, and that can be won, but at some point, the plaintiff has to name the John Does, so that they can defend themselves.

logfromblammo · on March 7, 2017

Couldn't you start a class action against one named defendant, and use the discovery process to uncover all the other unknown-at-the-time-of-filing defendants?

I'm pretty sure that if the named defendant coughs up an NDA that prevents them from disclosing the names of their business associates to a court, the judge is not just going to say, "I'll allow it."

justin66 · on March 7, 2017

It has successfully been done in the past, so yes, you could do that.

pc86 · on March 6, 2017

You grossly overestimate the scope and power of the United States judicial system, which does not ever have a need or desire to "control someone's information."

justin66 · on March 7, 2017

> When e.g. a court needs to control someone's information surely they know who these people are and they can let them know?

The companies which furnish personal data aggregated from courts are legally required to stay on top of out-of-date records and purge records which are inaccurate. (it would be completely untenable for things to work the other way, with every court and agency which made records available being required to reach out to every recipient of the data. For one thing, in some cases the data can legally be resold.) They can be held civilly liable for distributing false information and might also be in breach of their agreements with the agencies which give them access to the data.

The urgency with which this is required under the law depends on the use to which the data is being put. It's really important to keep data used in pre-employment reports up to date. Marketing data can generally be full of garbage.

This is all generally governed in the United States under the FCRA. It's not an area of the law that you're going to get comfortably familiar with in just an afternoon of reading.

_skel · on March 6, 2017

> When e.g. a court needs to control someone's information surely they know who these people are and they can let them know? Is there a secret list in every courthouse or something?

The courts don't control information in that way.

dataflow · on March 6, 2017

That's surprising. So what happens when there is a high profile case and the court fears for the jurors' lives? Or the case itself involves someone whose life will be in danger afterwards? If anybody can find out where these people live then they're toast. The courts have to suppress the information legally somehow, right? If not the old information, at least they need some protection against mining of new information after the subjects move or change their names, no?

Are you literally saying they have no way to order all the first-level companies to stop sharing data on someone?

blackbagboys · on March 6, 2017

> Are you literally saying they have no way to order all the first-level companies to stop sharing data on someone?

This is exactly what he's saying. There is no central node of control for this kind of information. You are operating from entirely unjustified assumptions.

pc86 · on March 6, 2017

> So what happens when there is a high profile case and the court fears for the jurors' lives?

Jurors' identities are not generally a secret.[0] There are exceptions, but those exceptions do not extend to wiping that person's data from things like pharmacy and gas station reward card databases.

Honestly your entire premise shows a lack of understanding of how the criminal justice system works.

> Are you literally saying they have no way to order all the first-level companies to stop sharing data on someone?

Sue all of them individually, win each case, and have them ordered to stop collecting data on you. Something tells me this is tantamount to "don't use a computer or a credit card. Ever."

[0] http://www.legalmatch.com/law-library/article/public-access-...

dataflow · on March 6, 2017

> Jurors' identities are not generally a secret.[0] There are exceptions

I was talking about the exceptions. If there are exceptions, a way to handle them must exist, is all I was saying.

> but those exceptions do not extend to wiping that person's data from things like pharmacy and gas station reward card databases.

I was asking about the companies who obtain this original information, not pharmacies' or gas stations' databases themselves. I feel like you're not understanding my question?

> Honestly your entire premise shows a lack of understanding of how the criminal justice system works.

Quite likely (I'm not a lawyer and I haven't exactly been involved in legal proceedings), and I didn't claim otherwise. That also hardly undermines my point. Like I've said in some 3-4 other comments, someone who e.g. starts a new company like InstantCheckmate has to know whom to buy the data from -- like I said, there's no way all of these companies contact all grocery stores and all doctors. That's insane. Someone's gotta be doing the heavy lifting and making money off it. I'm asking who this is. I asked this in the original post. If the court example is wrong or otherwise bothers you just ignore it. If someone not already involved in the business knows to contact these companies to obtain information, someone must know who they are, is all I'm saying. Otherwise they would not exist.

ceejayoz · on March 7, 2017

> I was talking about the exceptions. If there are exceptions, a way to handle them must exist, is all I was saying.

There is. They put them in a hotel, under police guard, for the duration of the trial. https://en.wikipedia.org/wiki/Jury_sequestration

literallycancer · on March 7, 2017

What if I told you there are countries where you can use a credit card, and still have reasonable expectations about privacy, where the information gatherer must erase personal information on demand and bears the burden of erasing it and notifying everyone they sold the information to erase it too?

eonw · on March 7, 2017

I'd like to know which ones. I'd consider relocating.

pc86 · on March 7, 2017

1. They don't know, but assume some Western European country fits that description.

2. They misunderstand German privacy laws.

cortesoft · on March 6, 2017

For very high profile cases, a jury would be sequestered, meaning they are housed somewhere and not allowed to interact with the public. If they were in danger, there would be guards protecting them.

After the trial is over, though, they are on their own. They will not continue to be protected, and could certainly suffer retaliation. It sucks, yes.

http://criminal.lawyers.com/criminal-law-basics/sequester-is...

dqv · on March 7, 2017

If I was a company, I would be writing scraping engines that scraped

* police 2 citizen (a platform many counties and municipalities use to report crime and accidents to the public)

* any public facing dataworks plus web application (or whatever various other municipalities/counties are running): the one for the county I live in lists the arrestee's employer

* district level and state level court dockets

* real estate records, which also link up to tax bills

* public voter records

Not surprisingly, only the information I've listed on my voter ID has ever shown up in Intelius/LexisNexis databases.

pandabear187 · on March 6, 2017

I can provide all the sources used for the company I used to work for, however that's irrelevant if you as you said wanted to learn how to fish.

Any and every company selling data, and there are thousands, is in fact a source you would need to deal with to be removed from the sites that sell it. If you think this answer isn't specific enough, you're not going to achieve anything by me spoonfeeding you info for one such company.

samstave · on March 6, 2017

>I can provide all the sources used for the company I used to work for, however that's irrelevant if you as you said wanted to learn how to fish.

Personally, I think this should be a requirement of the government: "HERE IS A LIST OF ALL PEOPLE THAT ARE COLLECTING AND SELLING YOUR PERSONAL DATA, CLICK THIS BUTTON TO DELETE YOUR RECORDS" sort of thing... it should be a mandated public service of regulation.

Then they have a list of every company you have opted out of, and you can simply send them any further contact you receive from those companies; the company gets a fine, and you get a compensation fee...

kalleboo · on March 7, 2017

It used to be like this in many European states back in the 80's. If you wanted to keep a computerized database of personal information, you had to apply to the state for a permit, so there was central control of all the lists. Of course this was extremely difficult to enforce... Eventually that type of legislation was replaced by the EU data directive.

e.g. In Sweden: http://www.datainspektionen.se/om-oss/historik/

pandabear187 · on March 6, 2017

2 things: 1.) If you make government enforce this, then we are all going to be paying more and things will take longer. This will push society toward a fascist or communist state where the government is the all-powerful and mighty beast. I do not like this.

2.) In western countries there are already laws that allow you as the consumer to stop any personally identifiable information from being shared. The onus is on YOU to go do that; it's not hard, but it will require you to track and follow up every year, forever. However, you also have a choice of which companies you provide information to, and what you provide. Any credit agency, when you apply for credit, asks if you would allow them to resell this. Opt out!

The fact that in most cases people are too lazy to read and understand what they are providing and for what purpose is the real issue here. Government is not going to make things better or more secure or even be able to enforce this type of governance. Only you and the lawyers can do this.

bradleyankrom · on March 6, 2017

I don't think it's fair to call people lazy for not reading every line of an exhaustingly-long TOS. Also, it isn't fair to expect every consumer to fully understand the legalese that they are reading if they do decide to read it end to end. "Don't click 'I Agree' if you don't know what you're agreeing to!" is unrealistic.

pjc50 · on March 6, 2017

This is sort of how European data protection is supposed to work, although you have to maintain your own list as there is no central one. And the enforcement is intermittent.

(Data subject requests are a very powerful tool)

tracker1 · on March 7, 2017

If you ever buy anything with a CC, use a "membership" card anywhere, or have anything delivered, or for that matter purchased online, then there's a greater chance than not this data was shared by the merchant, transaction provider and/or the credit card company themselves.

If you have an online profile, with any friends that aren't paranoid, and allow your friends to see any private information, then this can be collected/correlated by the various bot farms.

justin66 · on March 7, 2017

> There can't be nearly as many sources as there are sites who buy from them.

Sure there can. The sources include state, county, and local governments. There are a lot of those.

deegles · on March 6, 2017

As a person who used to be involved in that industry, do you try to protect your data more? Or is it just a losing battle?

pandabear187 · on March 6, 2017

Yes, I protect the shit out of my personal info. I have introduced a seeding mechanism to figure out who is sharing my data whenever I sign up for things. Public records are different and you have to create a Trust or a company to hide behind to not be personally listed.

I would suggest that anyone who doesn't want to be exposed never use their real name or address online unless they're confident the site won't share it. Introduce typos or initials, or use a nickname, for online orders that allow you to list a billing address separate from the shipping address. Billing data typically has a better chance of not being resold, but there are no guarantees; read their TOS if not sure.
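A minimal sketch of what such a seeding mechanism could look like, in Python. The comment above does not say how the seeding is actually done, so everything here is an assumption: the `SeedLedger` class, the `seed_tag` helper, and the choice to encode the seed as two middle initials are all hypothetical. The idea is only that each signup gets a slightly different name, and the variant that later shows up on mail points back to whoever shared it.

```python
import hashlib
import string


def seed_tag(service: str, secret: str) -> str:
    """Derive a stable two-letter tag for a service from a private secret (hypothetical scheme)."""
    digest = hashlib.sha256(f"{secret}:{service}".encode("utf-8")).digest()
    letters = string.ascii_uppercase
    # Two bytes of the hash become two uppercase letters (676 possible tags).
    return letters[digest[0] % 26] + letters[digest[1] % 26]


class SeedLedger:
    """Remembers which name variant was handed to which service."""

    def __init__(self, first: str, last: str, secret: str) -> None:
        self.first = first
        self.last = last
        self.secret = secret
        self._by_tag = {}  # tag -> service name

    def name_for(self, service: str) -> str:
        """Return the seeded name to use when signing up for `service`."""
        tag = seed_tag(service, self.secret)
        owner = self._by_tag.setdefault(tag, service)
        if owner != service:
            # With only 676 tags, two services can collide; refuse rather than guess.
            raise ValueError(f"tag {tag} is already used for {owner}")
        return f"{self.first} {tag[0]}. {tag[1]}. {self.last}"

    def source_of(self, name_on_mail: str):
        """Look up which service a received name variant was given to, or None."""
        middle = name_on_mail.split()[1:-1]
        tag = "".join(part[0].upper() for part in middle)
        return self._by_tag.get(tag)


ledger = SeedLedger("Pat", "Doe", secret="keep-this-private")
seeded = ledger.name_for("PizzaCo")  # use this name on the PizzaCo order form
```

When a catalog later arrives addressed to that exact variant, `ledger.source_of(...)` returns the service it was given to. Two initials only distinguish a few hundred services, so a real setup would need a longer seed or a manually kept list.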

samstave · on March 6, 2017

I really think you should do a how-to / AMA

Financial ignorance is an exceedingly exploited issue in the US, and I do not think anyone is doing anything about this.

pushECX · on March 7, 2017

Would you be willing to write up a guide of some sort about the steps that someone could take to protect their information? Personally, I would be willing to pay for something like that.

literallycancer · on March 6, 2017

Are any of these companies operating in Europe and/or with mostly European data?

JoshTriplett · on March 6, 2017

If anyone reading this thread is interested: I would pay non-trivial amounts of money on a regular basis for a service that systematically worked to eliminate records like these (and the sources they draw from), as well as chasing down sources of junk mail and the lists they ultimately draw from.

The value would depend on effectiveness, and on the degree to which the service clearly reported exactly what they did. Calling and unsubscribing from sources of junk mail would be a moderate time-saver, but finding out where they got their names and addresses from and destroying those would be far more valuable.

It'd take some optimization and batching of the process to figure out how to avoid taking an excessive amount of time per person.

jameslk · on March 6, 2017

I've considered working on this problem from a business standpoint, but I couldn't figure out a good business model for it. I don't think too many people will pay a monthly fee to have their information removed from these services. My guess is that they would sign up for a month and after their information has been removed, immediately cancel their subscription until they needed to do it again. And a yearly fee seemed like it would cost too much for mass adoption.

There's also the problem that you'll often need to get the customer to "opt out" by providing their own information to verify they own it or they will need to click on a link from an email or receive an SMS text verification code. This gets really messy as an automated service.

web007 · on March 6, 2017

I used to work for https://www.reputationdefender.com/privacy - that is (was) exactly their business model. One of the big selling points was the time savings, where it would take you 80 hours a week or some crazy number if you did it yourself, filling out forms and keeping up with all of the new services that aggregate and sell this stuff. Doing it once is great, but won't help you after that month - all of the same places that initially sent all your data in to the aggregators are just going to do it again next month, so you'll get a new record all over again.

They have largely pivoted since then, into a service primarily for reviews and feedback management. I don't have any insight into the quality of the existing service on either side.

JoshTriplett · on March 6, 2017

It's possible that the business could be so successful that everyone uses it, the services selling this information all run out of customers and go out of business, none of them come up with newer and more evil ways to do this, and you run out of potential customers. In which case: mission accomplished, retire on your giant pile of money and bask in the knowledge that you made the world a far better place. (Avoid scenarios in which you have perverse incentives to allow the problem to continue.)

But in the meantime, tens of millions of potential customers times any reasonable fee seems more than enough to build a substantial business on.

You could tempt people in with a cheap fee to let them send in a few pictures of junk mail and stop those. (As you get more, find the biggest sources and automate or batch them so that they cost you almost nothing, which will pay for the higher-effort ones. Have an upper bound on effort expended, and tell people that they don't pay if you can't remove them.) You could then track down the underlying sources, and if you successfully identify them, contact the customer, and give them enough information to decide to pay you for a higher-end service to get them removed from those sources (and keep them removed).

The value that gets people to keep paying you would be a steady stream of reports of "we found this source leaking/selling your information, here's what we did about it". It'll take you years to track down all such sources and find paths to remove them; you will likely end up having to fund some legal work and possibly even a lawsuit or two, which will give you a giant pile of publicity.

(As one example of something much easier for a company optimized for the process to do than an individual: the USPS has a detailed process for formally putting a company on notice for mailing someone who has specifically unsubscribed, and that process ends in massive fines for continued mailing to that person. I read a report of someone doing that to stop receiving persistent Dell catalogs.)

If you're sufficiently creative, you could even pitch this as a service to marketing companies. You have a list of people who will not buy anything via direct mail, and who will despise any company that they receive such mail from. Convince the sources of postal spam that removing those people from their list makes the rest of their list more valuable. Convince the downstream customers of those sources that using your list directly is far more convenient for them than dealing with opt-outs from every individual on it.

That also gives people a continued incentive to pay to remain on that list.

TrinaryWorksToo · on March 6, 2017

>the USPS has a detailed process for formally putting a company on notice for mailing someone who has specifically unsubscribed, and that process ends in massive fines for continued mailing to that person. I read a report of someone doing that to stop receiving persistent Dell catalogs.)

I would love to learn this process! I've repeatedly asked for a certain mailing to stop and it hasn't ceased.

edwhitesell · on March 6, 2017

I'd rather see much larger bulk rates via the USPS for something like "environmental impact".

For example, I'm currently getting no less than 3 letters per week from Spectrum (formerly TWC) promoting their new triple-play plans. I drop all of them in the recycling bin.

I couldn't care less about their bottom line, but I do care about the environmental impact.

At 3x per week, it probably costs them around USD$0.80/week (postage, paper, printing, etc.) to send those 3 letters. Call it USD$1 to make the math easier. I currently pay about $9.03/week. If I upgraded, it would be at least 3x that amount.

I'm not sure how to calculate the profit they would gain after an upgrade, but I can't imagine it would take more than 2-3 months for the new rates to more than cover the mailing fees for an entire year of their letters.
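A rough sketch of that break-even arithmetic in Python, using only the figures quoted above ($1/week of mailings, a $9.03/week bill, a 3x upgrade). The profit margin is not given anywhere in the thread, so `margin` is a pure assumption to be varied, and `weeks_to_recoup` is a hypothetical helper name.

```python
MAIL_COST_PER_WEEK = 1.00      # rounded up from ~$0.80, as in the comment
CURRENT_BILL_PER_WEEK = 9.03   # what the customer pays today
UPGRADE_MULTIPLIER = 3         # "at least 3x that amount" after upgrading


def weeks_to_recoup(margin: float, weeks_of_mail: int = 52) -> float:
    """Weeks of upgraded billing needed for the extra profit to cover the mailing spend.

    `margin` is the assumed fraction of extra revenue that is profit (not from the thread).
    """
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    extra_revenue_per_week = CURRENT_BILL_PER_WEEK * (UPGRADE_MULTIPLIER - 1)
    extra_profit_per_week = extra_revenue_per_week * margin
    return (MAIL_COST_PER_WEEK * weeks_of_mail) / extra_profit_per_week


for assumed_margin in (1.0, 0.5, 0.25):
    print(f"margin {assumed_margin:.0%}: {weeks_to_recoup(assumed_margin):.1f} weeks")
```

If every extra dollar were profit, a year of letters would be paid back in about three weeks; at an assumed 25% margin it takes roughly eleven and a half weeks, which lines up with the 2-3 month guess.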

And yet, I'll never upgrade and I hate the waste caused by their practices.

jey · on March 7, 2017

Why does this happen? Is there some subcontractor who just bills by the piece so they don't care about de-duping the idiotic mailmerge that has 5 entries for each entity I'm affiliated with?

edwhitesell · on March 7, 2017

In the case I listed above, these are separate, distinct letters. Certainly from an automated process, but clearly all part of a larger marketing program.

In the ones like Dell (which I've also experienced in the past), I suspect there's some metric involved for getting the most contacts for "coverage". The reality is unless they were tracking my moves to different companies, there could be different people with the same name. In that sense, I'm very glad they can't correlate with employment data.

There's also the idea that sending to multiple people within a company means someone may see something they want and try to go through the procurement process because of the catalog, rather than because "IT" says it's time for a refresh.

Not that I agree with any of those practices, for a number of reasons, but I could understand the case for them.

jdeibele · on March 12, 2017

Cable and phone companies don't seem to care about efficiency in their marketing efforts. Almost everywhere they're allowed to count it as an expense and fold it into their regulated costs and make a regulated profit off it.

Putting on my cynic hat, buying lots of newspaper, television and radio ads is a good way to keep the media on the sidelines instead of criticizing the incumbents.

tjalfi · on March 7, 2017

You want a USPS form 1500 (https://about.usps.com/forms/ps1500.pdf). The intended use is for reporting obscene mail but the decision about whether something is objectionable is left up to the addressee (Rowan v. United States Post Office Department - https://www.law.cornell.edu/supremecourt/text/397/728).

JoshTriplett · on March 6, 2017

http://changelog.complete.org/archives/741-from-dell-a-uniqu...

https://web.archive.org/web/20090201192037/http://junkbuster...

bogomipz · on March 6, 2017

>"the USPS has a detailed process for formally putting a company on notice for mailing someone who has specifically unsubscribed, and that process ends in massive fines for continued mailing to that person."

I doubt this is enforced in any meaningful way. I don't doubt there is a well-detailed process; however, the US Post Office's bread and butter seems to be Dell catalogs and Southwest Airlines credit card offers. This fact is what killed Outbox:

http://www.insidesources.com/outbox-vs-usps-how-the-post-off...

compuguy · on March 6, 2017

Dell is pretty bad. Every month I get three catalogs (when one is more than enough) at my address, because of slight variations in my name, even though they are all for the same person.

notwhoyouthink · on March 6, 2017

We get four catalogs a month to three different people, one of which gets two catalogs; one addressed to their role as Vice President, and one as Boardmember.

We don't even want _one_ of them. It's not like we're going to see a new printer in a catalog and say "hey, sounds like a great buy!"

TheSpiceIsLife · on March 7, 2017

Maybe the solution is to sign up about 3000 people at every address you've ever lived at. If all of us sign up for hardcopy catalogues everywhere under every variant of our own names and fictitious names, maybe they'll eventually stop or go out of business.

cmdrfred · on March 7, 2017

Why bother yourself or the new residents of your home? Sign up any Dell related outfits (resellers, etc) for them instead.

ssully · on March 6, 2017

People, like myself, pay a monthly fee for credit monitoring services. A lot of these services will identify if personal information of yours is publicly available on the web.

I only know of Instant Checkmate because my fiancée uses Mint Credit Monitoring and they notified her that her info was available on that site. I promptly helped her opt out of the site. I would have loved it if Mint just had a 'Help me purge my info from this site' button, because I felt dirty just having to confirm my fiancée's info was on that site and then go through their process to remove it.

bogomipz · on March 6, 2017

>"I don't think too many people will pay a monthly fee to have their information removed from these services."

I would be interested in hearing what you are basing this on. Did you do some market research?

My feeling is that there is money in privacy, people seem to have no problem paying 5 or 10 dollars a month for a VPN provider for instance.

jameslk · on March 7, 2017

I can't remember the exact reasons since it was something I was researching a couple years ago. I saw a few competing services that offered annual plans in the $50-$100 range and I think I divided that by 12 and then imagined what would happen when people got most of the value out of the service on first use. It seems to be a very niche market and I'm not sure how much of a pain point it really is to that niche.

I had some other ideas to differentiate, which were a bit of a gray-area hack. This included pulling from the same data sources that Spokeo and similar services use to be able to search for that data. That way, the user wouldn't have to enter any personal information other than say an email address.

You could then offer a free service that let users enter their email address and they would find all the sites you've crawled where you found their information (similar to haveibeenpwned.com), with an upsell to automatically remove it.

Once a subscriber was signed up or previously had used the free service and hadn't opted out, I would keep crawling services for additional data and then give them a warning email when their data popped up again.

It seemed like a semi-reasonable business model at the time, but just a really small market, and I had other business ideas I wanted to pursue.

bogomipz · on March 7, 2017

Interesting points. I wonder if giving the customer visibility into events like "your data was successfully removed from X" would make a difference? Similar to peeking at your Spam folder now and again to see that it is working for you.

Also, about the market size: I imagine the market is just the US? It seems other places, at least Europe anyway, have better laws to protect against these kinds of invasive services.

shshhdhs · on March 6, 2017

What about pay-per-record? If you identify 100 records, you could charge $50 (50 cents per record, as an example). That shows the value versus a blanket annual fee. The price per record could be reduced the more records they have, or the total capped at $100 or something.
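A small Python sketch of that pricing idea, assuming the numbers in the comment (50 cents per record, a $100 cap). The volume discount is only described loosely, so the second tier here (25 cents per record beyond the first 100) is an invented placeholder, and `removal_fee_cents` is a hypothetical name. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def removal_fee_cents(records, tiers=((100, 50), (None, 25)), cap_cents=10_000):
    """Price a removal job per record, with volume tiers and an overall cap.

    tiers: sequence of (records_in_tier, cents_per_record); None means "all remaining".
    The default tiers and cap are illustrative assumptions, not a real price list.
    """
    if records < 0:
        raise ValueError("records must be non-negative")
    total = 0
    remaining = records
    for size, rate in tiers:
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining == 0:
            break
    return min(total, cap_cents)


fee = removal_fee_cents(100)
print(f"100 records: ${fee / 100:.2f}")
```

With these defaults, 100 records cost $50 as in the example, and very large jobs stop at the $100 cap.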

npezolano · on March 6, 2017

Why not structure the company as a public benefit corporation or a non-profit?

JoshTriplett · on March 6, 2017

> Why not structure the company as a public benefit corporation or a non-profit?

This seems like a good idea; I'd happily support such an endeavor.

EmielMols · on March 7, 2017

You could always "enhance" the business case by selling "whitelisting" to some of these data collection services. Just ask the guy from adblock plus.

samstave · on March 6, 2017

What about letting a user enter all their pertinent details, then having a "NUKE ME" button that evaluates how many sites and places there are to nuke from, with the cost to the user based on the effort needed to nuke the info you find? Then give the user a very straightforward list, let them select which site/item they want to nuke, and show them the cost upfront.

* You're found on sites 1, 2, 3

* Nuking site 1 is $10.00

* 2 is $1.50

* 3 is $22.55

---

Or something like that?
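Sketched in Python, using the three example prices from the list above. The site names and the `quote_total` helper are placeholders; `Decimal` is used so the dollar amounts add up exactly.

```python
from decimal import Decimal

# Example per-site removal quotes, taken from the list above.
QUOTES = {
    "site 1": Decimal("10.00"),
    "site 2": Decimal("1.50"),
    "site 3": Decimal("22.55"),
}


def quote_total(selected):
    """Upfront cost for the sites the user ticked."""
    return sum((QUOTES[site] for site in selected), Decimal("0"))


print(f"Nuke everything: ${quote_total(QUOTES)}")
print(f"Only sites 1 and 2: ${quote_total(['site 1', 'site 2'])}")
```

Nuking all three sites in the example comes to $34.05.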

Dotnaught · on March 7, 2017

Offering a reward that rises in proportion to the number of sites with your personal information might just provide an incentive to create more such sites.

The more you'd pay for privacy, the more it would be worth to violate it.

samstave · on March 7, 2017

Game theory on your comment, sure... but I was just referring to "price per record/line item" -- how to separate these???

ccvannorman · on March 6, 2017

>I don't think too many people will pay a monthly fee to have their information removed from these services.

$5-10 a month right here, and I am not rich (yet). Time is valuable. My offer goes up with assurances/contingencies if my name is still on the lists/I am affected (e.g. LifeLock's $1M backing guarantee).

SomeCollegeBro · on March 6, 2017

Well, there could be a possible solution to your first problem: offer both a high priced one-time removal attempt, or a recurring subscription with a minimum contractual obligation (6 months for example).

gcb0 · on March 7, 2017

why would people pay for the opposite of what they daily work on at Facebook for free: sharing their private life.

the new generation grew up with spam and giving out private info like there's no tomorrow.

I may sound like a pessimist, but try to talk to any 14-16 year old and see for yourself.

Sunset · on March 6, 2017

What about a better business idea. Accept cryptocurrency to shut down these aggregating and re-selling services. Physically, by any means necessary. ANY.

mirimir · on March 7, 2017

Obligatory reference to https://cryptome.org/ap.htm

mikehollinger · on March 6, 2017

I actually am running a (year long) experiment. I found a data broker who had my info. They offered a way to "correct" the record, so I added a car that I don't own (whose warranty should be ready for extended warranty offers in a few months) and added many many "off by one" errors like an extra zero on my income, or transposed digits on the size of my house.

We'll see where it lands.

I personally would also pay for this sort of service.

dataflow · on March 6, 2017

> I found a data broker who had my info.

How...? Or do you just mean sites like Instant Checkmate themselves?

> They offered a way to "correct" the record, so I added a car that I don't own

That's amazing! They didn't need proof? How did you convince them? Is this legal?

uiri · on March 7, 2017

> Is this legal?

IANAL

That said, fraud is usually defined as "An intentional misrepresentation of material existing fact made by one person to another with knowledge of its falsity and for the purpose of inducing the other person to act, and upon which the other person relies with resulting injury or damage."

So, in order for such misinformation to be illegal, the data broker needs only to demonstrate injury or damage. Unless the data broker makes assurances to their customer about the truth of the information and gets sued by one of their customers for providing false information, I find it hard to believe that the data broker will be able to demonstrate injury or damages.

I think the data broker knows better than to make such claims about their data but who really knows. And it isn't like the downstream customers are going to independently verify all the information.

mirimir · on March 7, 2017

That makes no sense, unless there's a contract, or at least a business relationship.

literallycancer · on March 6, 2017

Why wouldn't it be? It's not like you are lying on tax forms.

otto_ortega · on March 7, 2017

> I found a data broker who had my info.

What's the name of it?

dataflow · on March 6, 2017

I've come across sites like these (SafeShepherd, though it doesn't claim to wipe your information from the "sources", only the sites they see your information on); the problem is I'm not sure I would trust them with something like my driver's license photo. Would you? Or what if they decide to sell your information behind your back later?

Edit: Maybe it was a different site that needed your license (I thought it was SafeShepherd but I can't find it anymore). I know I've definitely come across ones that do.

JoshTriplett · on March 6, 2017

I wouldn't trust them with a driver's license (why would they even need that?), but I'd trust them with name/address/phone/email information, because that's the information that seems far too readily available already.

dataflow · on March 6, 2017

I know they need it because a lot of the info-searching sites that they "opt" you "out" from require it. I feel like I read that some need SSN too but I'm not sure about that one.

Also note that SafeShepherd has this fine clause right at the end [1]:

"You agree that Safe Shepherd isn't liable for any failure to comply with these Terms."

What is this supposed to mean? Would you accept it?

[1] https://www.safeshepherd.com/tos

LeoPanthera · on March 6, 2017

Well that seems to imply that if you violate the TOS, it's your fault, not theirs. Seems reasonable?

dataflow · on March 6, 2017

It also implies if THEY violate their TOS, it's your fault, not theirs. Seems reasonable?

fanpuns · on March 6, 2017

There was a time (not sure if it's still the case) that the sites that post your info require proof of identity to remove your info

devicenull · on March 6, 2017

SafeShepherd seems like it's dying. They were active shortly after launch, but it's basically been radio silence ever since. I used to have an account with them, and I didn't find much value from them after the initial cleanup wave (they didn't seem to be keeping up with the new sites that came online).

jquast · on March 6, 2017

This service exists, I've used it, can vouch for it. https://www.abine.com/deleteme/landing.php

LeoPanthera · on March 6, 2017

This Reddit comment suggests DeleteMe doesn't work so well.

https://www.reddit.com/r/privacy/comments/3q0cfz/thinking_ab...

dbg31415 · on March 6, 2017

Can confirm... manually went through and opted out of everything I could find... made me feel shitty because I had to provide photos and ID card photos to do it... less than a year later it was all back again. Hired one of the services to do it the second time around... they got maybe 50% of it... and less than a year again it was all back (note I was still paying the monthly subscription to have it all purged). Worse... I had to tell them what sites I found that they missed and waste time dealing with support issues... and let's be honest neither the companies who put the shit up, or the companies who you pay to take it down, are all that honorable. Extortion, pure and simple.

Wish Google would just kill their site rankings, that would largely make the problem go away. Google is allowed to de-list spam sites... why they haven't classified all this crap as spam yet is beyond me.

JoshTriplett · on March 6, 2017

Quoting that comment:

> Your information will show up on those sites sometimes - it'll pop back up after being removed.

That suggests that they're not actually tracking down the sources, just poking the downstream sites that get data from those sources. Much less useful.

justin66 · on March 7, 2017

The sources include government agencies at the state, local, county, and federal level. Those agencies are not going to hide public documents with your name and address on them just because you ask them to.

There's a legal process to, for example, expunge a criminal record. On the other hand, most counties aren't going to seal the records associated with your house or other property you've bought just because you would like them to.

jszymborski · on March 6, 2017

I've been recently:

1) Segregating automated email to some @customdomain.com address (Yandex and Zoho host vanity domains for free)

2) Forward all email @ that domain to a common inbox

3) Sign-up with servicename@customdomain.com (e.g: facebook@customdomain.com)

You no longer need to unsubscribe using dubious e-mail links, just automatically black-hole emails that come to spammer@customdomain.com

So far, so good.

EDIT: List formatting
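The routing rule in the steps above can be sketched in a few lines of Python. This is not how Yandex or Zoho implement filtering; it is only an illustration of the logic, and the `route` function, the `BLACKHOLED` set, and the returned labels are all made up for the example.

```python
from email.utils import parseaddr

DOMAIN = "customdomain.com"
BLACKHOLED = {"spammer"}  # per-service aliases that have started receiving spam


def route(to_header: str) -> str:
    """Decide what to do with a message based on the alias it was sent to."""
    _, address = parseaddr(to_header)
    local, _, host = address.rpartition("@")
    if host.lower() != DOMAIN:
        return "reject"   # not addressed to the catch-all domain
    if local.lower() in BLACKHOLED:
        return "drop"     # alias was leaked or sold; discard silently
    return "inbox"        # everything else lands in the common inbox


print(route("facebook@customdomain.com"))
print(route("Offers <spammer@customdomain.com>"))
```

Because each service gets its own alias, the address a spam message was sent to also reveals which service leaked or sold it.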

JoshTriplett · on March 7, 2017

Useful for email, but this is primarily about postal spam.

jszymborski · on March 7, 2017

somehow misread this entire post

o_____________o · on March 8, 2017

This also helps obscure the very widespread practice of selling data to third parties that identify you with a stable, cross-purpose ID. (Liveramp, et al)

vonklaus · on March 6, 2017

i think the best way to handle this is exactly the opposite. go on mechanical turk, and pay $500 to have them fill out every single free offer, product trial, social media account, etc. with slightly similar but incorrect information. Give each turk 10 different pictures that look very similar to you, or are you in bad lighting, and pay them to upload them to G+, FB, etc.

It is much easier to make your correct info difficult to ascertain, than it is to remove it all.

JoshTriplett · on March 6, 2017

Then they'll just spam all of them. And if you have other reasons to not want to be tracked down, then having 20 or 30 addresses will not help much.

mirimir · on March 7, 2017

> junk mail

Back in the day, junk mail often had "Return Service Requested", and senders had to pay return postage. So we would tape address labels to cardboard-wrapped bricks, and mark them as "moved with no forwarding address". But that doesn't work anymore.

fanpuns · on March 6, 2017

Here you go: www.safeshepherd.com

I've been using them since way back when they were on earlibird. It's not perfect, but it has reduced the amount of times I show up in these sites significantly. The only disheartening thing is that most of these operators seem to just dump in batches of data periodically, ignoring any prior requests to remove so it has to be done again (which is fine, that's what you pay their automated search and remove request feature for)

ChuckMcM · on March 7, 2017

https://en.wikipedia.org/wiki/Safe_Shepherd

These guys don't do it for you? Basically it is their business plan.

ams6110 · on March 7, 2017

I have a foolproof plan for junk mail: I throw it away.

executive · on March 6, 2017

You mean data brokers (https://en.wikipedia.org/wiki/Information_broker)

Top US brokers:

- Acxiom

- Experian

- Epsilon

- CoreLogic

- Datalogix

- eBureau

- ID Analytics

- inome

- PeekYou

- Rapleaf

- Recorded Future

Protip: loyalty/reward cards are a gold mine, especially drug store purchase receipt data

bks · on March 6, 2017

Opt out links I found -

Acxiom - https://isapps.acxiom.com/optout/optout.aspx

Experian - http://www.experian.com/blogs/ask-experian/credit-education/...

DataLogix Holdings, Inc. https://www.datalogix.com/privacy/#opt-out-landing

Epsilon Data Management, LLC http://www.epsilon.com/consumer-preference-center

Equifax, Inc - https://help.equifax.com/app/answers/detail/a_id/2/noInterce...

Fair Isaac Corporation http://www.myfico.com/policy/privacypolicy.aspx

Intelius, Inc. https://www.intelius.com/optout.php

LexisNexis Group http://www.lexisnexis.com/privacy/for-consumers/opt-out-of- lexisnexis.aspx

TransUnion Corp. http://www.transunion.com/corporate/business/datareporting/s...

jibberia · on March 7, 2017

Thanks, this is very useful.

The paranoid voice in my head is wondering if these forms don't actually opt me out of anything, and instead just confirm to these companies that the information they have on me is correct.

_archon_ · on March 7, 2017

The LexisNexis link 404s (added a space), I believe this is correct:

https://www.lexisnexis.com/privacy/for-consumers/opt-out-of-...

Also, the following can't hurt:

https://www.lexisnexis.com/privacy/directmarketingopt-out.as...

dataflow · on March 7, 2017

+1 Thanks for sharing! If anyone's actually found these to work please share so we can get more confidence in them :)

blacksmith_tb · on March 6, 2017

The Epsilon link 404s, but I think this page is where it should have gone: https://www.epsilon.com/en_US/consumer-information/consumer-...

e15ctr0n · on March 6, 2017

New York Times reporter Natasha Singer has extensively covered the data broker industry for the past several years.

Here are some of the key articles but you can find more at https://www.nytimes.com/by/natasha-singer

Jun 16, 2012 | Acxiom, the Quiet Giant of Consumer Database Marketing http://www.nytimes.com/2012/06/17/technology/acxiom-the-quie...

Jul 21, 2012 | Consumer Data, but Not for Consumers http://www.nytimes.com/2012/07/22/business/acxiom-consumer-d...

Jul 24, 2012 | Congress Opens Inquiry Into Data Brokers http://www.nytimes.com/2012/07/25/technology/congress-opens-...

Dec 08, 2012 | Company Envisions 'Vaults' for Personal Data http://www.nytimes.com/2012/12/09/business/company-envisions...

Aug 31, 2013 | A Data Broker Offers a Peek Behind the Curtain http://www.nytimes.com/2013/09/01/business/a-data-broker-off...

Sep 04, 2013 | Getting a Glimpse of Your Own Marketing Data Online http://bits.blogs.nytimes.com/2013/09/04/getting-a-glimpse-o...

Sep 04, 2013 | Acxiom Lets Consumers See Data It Collects http://www.nytimes.com/2013/09/05/technology/acxiom-lets-con...

Dec 23, 2014 | Data Broker Is Charged With Selling Consumers' Financial Details to Fraudsters https://bits.blogs.nytimes.com/2014/12/23/data-broker-is-cha...

Jun 28, 2015 | When a Company Is Put Up for Sale, in Many Cases, Your Personal Data Is, Too http://www.nytimes.com/2015/06/29/technology/when-a-company-...

Pro tip: If you want to avoid giving out your purchase data, pay cash wherever possible.

AndrewKemendo · on March 6, 2017

Right, but where do they get their data?

Are they rolling their own JS API that developers roll into each page? I certainly have never put any of that into any site I've made or seen.

executive · on March 6, 2017

They buy it direct from credit card companies, retail stores, 'free' online widgets (AddThis biz model: https://www.quora.com/Whats-the-business-model-for-AddThis-a...), etc

AndrewKemendo · on March 6, 2017

Right, that all makes sense but seems like it would be more of a grind than a simple JS API. You are effectively creating a marketplace.

executive · on March 6, 2017

Correct - see http://www.crosspixel.net for example:

"Cross Pixel's DMP is powered by our proprietary data relationships with more than 5,500 web sites and mobile apps where we identify and harvest the shopping and researching behaviors on over 650 million unique browsers. Our data partners are leading e-Commerce sites, search directories, comparison shopping engines, coupon sites and toolbars across North America and Latin America."

In general, the 'marketplace' is usually the DMP (Data Management Platform) where two parties can meet and share segments without data leakage (for example - Krux is a DMP used by a lot of Fortune 500 companies).

However, the lines between DMP and Data Provider have been blurring in recent years...

AndrewKemendo · on March 6, 2017

Great answer thanks! I hadn't heard the term DMP

x0x0 · on March 6, 2017

BlueKai is one of the biggest; it's a data marketplace for cookie-tagged data. They were bought by Oracle a few years ago.

GFischer · on March 6, 2017

And Krux's website says they have been acquired by Salesforce.

http://www.krux.com/blog/general/salesforce-krux/

xerxes777 · on March 6, 2017

Big thanks for the info, that was insightful.

dataflow · on March 6, 2017

> Right, but where do they get their data?

Yeah, I'm basically looking for the companies whose answer to this question is "by actually mining the data ourselves from your doctor, grocery store, Facebook, etc.".

dsp1234 · on March 6, 2017

Why do you think it works this way and not the other way around as well? Grocery stores shop their data around to see who will pay the most for it. A person who is out of work goes to the local courthouse and requests a bunch of records, compiles them into a spreadsheet and then cold (or warm with something like LinkedIn) calls to see if anyone is interested in the data. An online quiz company is going out of business, and as part of their bankruptcy settlement, they sell off their database of answers at auction. etc, etc, etc.

As pointed out elsewhere, it's a marketplace, and as such there are going to be buyers and sellers. Some of those sellers are going to be primary sources themselves.

dataflow · on March 6, 2017

Good question. I thought it works that way because it takes a lot of work to sanitize and cross-link people's data to other datasets accurately, so even if it's a "push" model, I still can't believe that every single website that does this does their own data cleaning & ML & whatnot. It's far too much repeated work and a good business to just do the work and sell it off to others. So I'd assume a few companies have to be making profits at the lower layer regardless of whether it's a pull or a push model.

dsp1234 · on March 6, 2017

As mentioned elsewhere, you seem to have a lot of unfounded assumptions, and misconceptions about this sector.

> It's far too much repeated work

Companies will repeat work over and over again if it's cheaper than buying it, they have custom needs that aren't filled with the data available, etc. Businesses repeat work all the time, and this is not any different. Additionally, for many businesses in the sector, they themselves are the primary source for data. For them it's not repeated work.

> a good business to just do the work and sell it off to others.

Yes, that's why some aggregators exist. They make money by brokering the data from multiple sources, some primary and some resold. But they are the tip of the iceberg.

You seem to be under the impression that there is some small list of companies who are all working from primary sources, and that everyone then gets feeds of data from those companies. This would make sense if gathering data was very difficult, or had natural-resource-like limitations. So that model works well for something like diamond mining (as compared to diamond growing), because the number of diamond mines is limited, and there is a natural entry barrier. However, that doesn't take into account the fact that gathering this data is generally easy. Sometimes it's very easy, such as an sftp feed of data from a government records database. Sometimes it's a bit harder, such as needing to physically be present to obtain the data.

That means there is very little barrier to entry, and thus generally there is going to be a lot of competition, and thus many companies vying to make money.

Personal data has value just like any other commodity. So a bit of economic theory goes a long way to understanding what the boundaries of a market might be. Low production cost, high profit goods generally have a large number of companies in the market.

VT_Drew · on March 6, 2017

> Right, but where do they get their data?

You give it to them when you sign up for that stupid rewards card.

clubm8 · on March 7, 2017

When I signed up for a major grocery store rewards card, I was able to put a bogus name on the form. How can they correlate my purchases with me specifically? Can they correlate the charges to my credit card with the times the rewards card was used?

(This grocery store did not have a pharmacy)

dredmorbius · on March 7, 2017

Credit card or check payments will link records.

Cellphone or MAC tracking.

In the near future if not already, facial recognition.

dataflow · on March 6, 2017

+1 Thanks for actually posting a list!

goshx · on March 6, 2017

It pisses me off too. US is so concerned about privacy, yet a LOT of your private information is made public once you start opening bank accounts, buying real estate, signing up for a gym, etc.

When I opened my first bank account they had a typo in my name, which I found out when I received my debit card. I asked them to fix it immediately, however, two to three weeks later I was already getting mail from stores addressed to the misspelled name.

When I was buying my first house I immediately started receiving mail from moving companies at my old address before I signed the closing. After I moved I got a lot of junk mail with other kinds of offers. I even started getting PHONE CALLS from a home monitoring/alarm company. When I asked them where they got my number they hung up.

It is like all the information is up for sale somewhere.

confounded · on March 7, 2017

Taking power away from capital to give to individuals is currently positioned as un-patriotic in the US (e.g. the belief of pandabear187 above that regulation of data brokers will lead to fascism or communism, even though he goes to great lengths to protect his own information).

diminoten · on March 6, 2017

> US is so concerned about privacy

This is not true, at all. HN may provide that appearance, but the vast majority of people who live in the US do not care about their privacy, based on their actions.

VLM · on March 6, 2017

"when a court needs to order that someone's information be purged (for whatever reason, e.g. for safety)"

I believe that is the source of your confusion; that is mostly a Hollywood fiction. If a collections agency is bugging you there is a way to resolve it via the legal system, but it's very much a case-by-case, company-by-company business. A judge can order one company whose officer or agent is present in the courtroom to do something to one record. A judge can purge his own legal system's record of an arrest if he wants to. Believing that this works in general is analogous to non-computer people believing in the CSI TV show or Hollywood hacking.

dataflow · on March 6, 2017

What about the new higher-level companies that pop up akin to Instant Checkmate? How do they know which lower-level companies to buy your information from? There's no way they ALL do the heavy lifting themselves. Someone's gotta be making money off doing the real work and others must be buying from them.

e0m · on March 6, 2017

If you have your own domain name with a wildcard, it's really helpful to enter: someservice@mydomain.com as your email. That way if it leaks you'll know who did it and can set up much more robust rules to block. I'll use the domain name as the main address so I remember which name goes to which site.

For physical address mailings, you can hyphenate (or use a middle name) as the service. So First Service-Last as the addressee name. While harder to set up "mail rules" for, at least you'll know who to never trust again.
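
A minimal sketch of the per-service address idea in Python, assuming a catch-all mailbox on the placeholder domain `mydomain.com` and that the local part is simply the service's name. The function names `address_for` and `leaked_by` are made up for illustration and are not part of any mail tool.

```python
from typing import Optional

# Hypothetical catch-all domain; substitute your own.
DOMAIN = "mydomain.com"


def address_for(service: str, domain: str = DOMAIN) -> str:
    """Build the address to hand to one service, e.g. someservice@mydomain.com."""
    # Keep only characters that are safe in an unquoted local part.
    local = "".join(ch for ch in service.lower() if ch.isalnum() or ch in "-_.")
    return f"{local}@{domain}"


def leaked_by(recipient: str, domain: str = DOMAIN) -> Optional[str]:
    """Given the To: address of unwanted mail, return the service it was issued to."""
    local, _, host = recipient.strip().lower().rpartition("@")
    if host != domain or not local:
        return None  # not one of our per-service addresses
    return local


print(address_for("SomeService"))
print(leaked_by("someservice@mydomain.com"))
```

The same lookup could drive a block rule: once `leaked_by` names a service, mail to that one address can be rejected without touching any other.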

Sephr · on March 6, 2017

I've been doing exactly this for a while. Here is my list of companies that have leaked the email address I gave them to spammers: https://gist.github.com/eligrey/5084991

lucb1e · on March 7, 2017

Awesome work!

I do the same with email addresses, but receive very little spam. Mostly I block addresses that start spamming me with newsletters. I've thought about keeping a list, but most companies actually stick to the Dutch anti-spam laws (which are quite good).

Only Dropbox and one personal contact ever actually sold/leaked my email address, and Paypal of course but they hand my email address out to all merchants so they're almost certainly not to blame themselves (not beyond the fact that they hand it out in the first place).

dataflow · on March 6, 2017

Note that (I think) Adobe was hacked, so that doesn't mean your email was "leaked" by them per se, not in the sense we mean anyway.

Also, out of curiosity, how long did it take these companies to leak your info, generally? Days, weeks, months, years...?

Sunset · on March 6, 2017

Dropbox was hacked as well.

Faark · on March 6, 2017

I do the same, using mailgun to forward them to my usual account. Though the day I actually have to send or reply to e.g. customer support from one of those mail addresses will be annoying. Any suggestions on that part?

literallycancer · on March 7, 2017

Google apps can kind of do it[1] (although it appears to leak the main address on purpose), so I guess you can also do it with your own mailserver?

1 - https://support.google.com/mail/answer/22370?hl=en

avh02 · on March 6, 2017

I just started doing the wildcard domain thing last month, I'm happier knowing that I can shut the taps. I get annoyed just knowing that there's spam in my spam list.

First time I was on the phone with a customer service rep. after using the <website>@<domain>.com format I was asked if i was sure my email address was correct. I lol'ed and told them not to worry about it.

megous · on March 7, 2017

I use a random 20-char string for the local part. That way there's no question about the leak. Spammers use a lot of dictionary words in the local part of the email address, so it's better to have a random string. If you're using a password manager anyway, there's no reason not to make email/username random too.

For smaller e-shops you might find some with actively exploited 0days this way. I did.

Sephr · on March 7, 2017

That is useful, but how do you manage to remember the mappings? I want to know what random string corresponds to what service without having to search my password manager.

The best of both worlds would be random local part + a Chrome extension that manages the mappings. The Chrome extension can then replace the local part in Google Inbox with the corresponding site name.

Sephr · on March 7, 2017

Update: Just use service name + random string concatenated for the local part. Seems like the best solution.
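
A rough sketch of that scheme in Python, assuming a 20-character random suffix (the length mentioned upthread) and a hyphen as the separator. The separator, the alphabet, and the names `tagged_local_part` and `service_of` are illustrative assumptions, not anything the commenters specified.

```python
import secrets
import string

# Lowercase letters and digits only, so the suffix never contains the separator.
ALPHABET = string.ascii_lowercase + string.digits


def tagged_local_part(service: str, random_len: int = 20) -> str:
    """Return '<service>-<random>' for use as the local part of an address."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(random_len))
    return f"{service.lower()}-{suffix}"


def service_of(local_part: str) -> str:
    """Recover the service name by dropping the random suffix."""
    return local_part.rsplit("-", 1)[0]


local = tagged_local_part("Dropbox")
print(local, "->", service_of(local))
```

The service name stays readable at a glance in the inbox, while the random suffix keeps the address from being guessed by a dictionary run.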

avh02 · on March 12, 2017

not a bad tip, i'll do this for less reputable websites i suppose.

danielk_ · on March 6, 2017

Even easier is to use john+someservice@doe.com. Works with any e-mail provider, ends up in your normal mailbox. Gmail even adds tags based on what follows the plus.

Might not work for some services due to ignorance of the spec or to prevent users doing this.

dataflow · on March 6, 2017

Doesn't really work since anyone with half a brain would remove the plus sign and after, knowing the email is more useful without that part. I've never caught anyone this way.

avh02 · on March 7, 2017

problem with that is websites who think they're super smart and believe that + is not a valid character in an email... sometimes it's just the javascript though and you can submit it via manual POST request.

rootsudo · on March 7, 2017

You can do this with gmail.

For example if your gmail is root@gmail.com

You can do root+yahoo@gmail.com, root+reddit@gmail.com and so on.

clubm8 · on March 7, 2017

It's relatively easy to use regular expressions to strip that out though. I set up a catch-all for my domain, and give each entity its own unique addy to guard against this.
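
To show how little effort the stripping takes, here is a short Python sketch of what a list cleaner might run. The pattern and the name `strip_plus_tag` are illustrative assumptions, not taken from any real spam tool.

```python
import re

# Matches a '+' and everything after it up to (but not including) the '@'.
PLUS_TAG = re.compile(r"\+[^@]*(?=@)")


def strip_plus_tag(address: str) -> str:
    """Remove a plus-tag from the local part, leaving other addresses unchanged."""
    return PLUS_TAG.sub("", address, count=1)


print(strip_plus_tag("root+yahoo@gmail.com"))
print(strip_plus_tag("yahoo@mydomain.com"))
```

A catch-all address has no tag to strip, which is why the per-entity approach survives this kind of cleaning.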

criddell · on March 6, 2017

Last year I was getting constant calls and snail mail about buying an extended car warranty on a car that I no longer own. I asked the place where I bought my car if they sell that information and they claimed not to.

So where do these sleazy companies get that data? The DMV?

This year, I'm getting two or three calls every week about buying a home security system and monitoring.

I don't understand why these calls aren't easier to block. Somebody knows where they are originating from. Why can't I get that information too?

pc86 · on March 6, 2017

Vehicle registration is most likely public record in your state.[0] You can't push a button and stop all of these calls because there are dozens (probably tens of dozens) of companies that search and process public records and other data sets then sell that information to various companies.

[0] http://www2.westlaw.com/CustomerSupport/Knowledgebase/Techni...

szc · on March 7, 2017

The "home security system" calls could be social engineering to find out if your home is protected or not. Just listening to what they say and not saying you already have one is enough to tell the caller what they want to know.

criddell · on March 7, 2017

I would never tell them anything, but I have listened to the recording and pressed '1' to speak to a representative and when I do that, nobody ever picks up. It's baffling.

showkiller · on March 6, 2017

I believe, based on the links below, that the data is sold by different entities to companies like Intelius etc.

https://www.scientificamerican.com/article/how-data-brokers-...

http://triblive.com/news/allegheny/8690215-74/drivers-inform...

http://spectrum.ieee.org/riskfactor/computing/it/us-states-s...

etree · on March 7, 2017

The one that surprised me is to learn that virtually all health insurance companies sell your personal health information. Most people think this is illegal because the data is sensitive. But it turns out that if it's generated by a business transaction (i.e. a claim between your doctor and your insurance company) then it's not considered PHI and it's not protected.

chrisgoman · on March 6, 2017

For pre-employment screening, we had court runners literally sitting through the courthouses going through paper records for each candidate on an "ad-hoc" basis. Some companies do this in a more organized fashion by having a person just data enter ALL the records (like in North Carolina if I recall) and since they had this data, we just bought the company.

dataflow · on March 6, 2017

Pre employment screening is different though. They need your permission for a legal background check and of course they will do everything necessary to do it. I'm asking about the information that leaks without your permission.

NickBusey · on March 6, 2017

Small tip: At the grocery store or anywhere else with a rewards account linked to a phone number, rather than signing up for one just use (Your Local Area Code)-867-5309

The number almost always exists and is a valid account. Get the discount, don't get tracked. Thanks, Tommy Tutone.

gvb · on March 6, 2017

I've given (area code) 555-1212 and don't recall it ever being questioned. Technically I've given them my phone number, but with one indirection. ;-)

https://en.wikipedia.org/wiki/Directory_assistance

Turing_Machine · on March 6, 2017

If I can't conveniently avoid those things, I like to use the names of famous serial killers, with a local address that would be in the ocean (if it existed).

I've yet to have a sales clerk question it (or perhaps they just don't care).

ams6110 · on March 7, 2017

The clerks don't care. Why would they?

blacksmith_tb · on March 6, 2017

This has the added advantage of producing some very strange buying-habit data, as you are likely not the only person to have provided 867-5309 / 555-1212.

dredmorbius · on March 7, 2017

301-688-6524

Since they're listening anyway. Cut out the middleman.

compuguy · on March 6, 2017

The last time I tried that, it didn't work (or there wasn't an account associated with it).

VT_Drew · on March 6, 2017

Make sure to tell them that your name is Jenny too.

r00fus · on March 6, 2017

Thanks - I always used my friends' parents old number - so someone I didn't know - this would be even better.

AdmiralAsshat · on March 6, 2017

Have you ever gone to the doctor, signed up for a gym, or signed up for your local grocery store's membership rewards program?

That's how they get your information.