Hacker News

> from outside, there’s no way to prove that a particular ad was served to you because of your data instead of another factor they use as input for the algorithm

Create an account, use it to excessively look for tea pots until you predominantly get ads for tea pots. Store the cookie[0] that encodes your user id. Delete the account.

After the advertised account retention period, try the cookie again in an otherwise squeaky clean session that gives random ads without the cookie:

Tea pots? Something survived.

No tea pots? No targeting.

(and yes, the onus to demonstrate your accusation well enough to convince a DPA is on you, IMHO as it should be)

[0] or whatever bits of data are necessary: try to recreate the tea-pot session after a logout without logging in officially to determine what's necessary.
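The experiment above is essentially an A/B comparison: does the ad mix with the old cookie differ from a clean session? A minimal sketch of how you might weigh the evidence, assuming you've already tallied teapot ads by hand in each session (all numbers below are made up for illustration):

```python
from math import sqrt, erf

def prop_z_test(k1, n1, k2, n2):
    """Two-proportion z-test: is the teapot-ad rate with the stored cookie
    higher than in a squeaky-clean session? Returns (z, one_sided_p)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                       # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))      # pooled standard error
    z = (p1 - p2) / se
    p_one_sided = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # P(Z > z)
    return z, p_one_sided

# Hypothetical tallies: 40/100 teapot ads with the stored cookie,
# 5/100 in a clean session with no cookie.
z, p = prop_z_test(40, 100, 5, 100)
if p < 0.01:
    print("Tea pots? Something survived.")
else:
    print("No tea pots? No targeting.")
```

A formal test like this matters because ad rotation is noisy: a single teapot ad in the clean session proves nothing, so you want the difference in rates, not a single sighting, before taking an accusation to a DPA.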



I don't believe for a second that a company of Google's size keeps the data. But they don't need to anymore if they just feed your data through some neural-network training that outputs a black box filled with weights. They can legally show your data is gone, but now their model knows that people with a similar browsing history, location, etc. have a weak signal for tea pots. You'll get the occasional ad for them if their other networks don't filter out all the history for appearing suspicious. Google can bring in a bunch of experts to talk about how much data they've processed, how no one really knows what the ML algorithm learns, etc.

They've improved their model nonetheless, even if you delete your data after the fact.


And that is fine by me. Using my behavioral data to train an underfitting model is very different from actually storing my behavioral data. Sure, the word 'underfitting' does the heavy lifting in my previous sentence. But I don't think that overfitting is even feasible at this scale. Google does not train models that just memorize the habits of 3 billion people. Such a model would be useless.
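The capacity argument can be put in back-of-envelope numbers. A sketch, with hypothetical figures (the parameter count is an assumption, not a known size of any Google model); the point is the ratio, not the exact values:

```python
# Can a single ads model "memorize" the habits of 3 billion people?
users = 3_000_000_000
weights = 1_000_000_000        # assumed: a generous 1B-parameter model
bits_per_weight = 32           # float32, ignoring compression
capacity_bits = weights * bits_per_weight
bits_per_user = capacity_bits / users

print(f"{bits_per_user:.1f} bits of capacity per user")
```

Roughly ten bits per person is enough to sort users into a few hundred buckets, but nowhere near enough to store an individual browsing history, which is the sense in which the model is forced to underfit rather than memorize.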


Unless your behavioural data is enough of an outlier to identify you.


Yes, but an outlier will be identified regardless of where they choose to go; the data just reflects that.


If you're coming from the same device or IP address/range, that's enough of a link.

An anecdote: I stayed with relatives for a few days one time, and my two-year-old niece really liked one particular music video on YouTube, so it got played on their TV box 3-4 times during my visit. Not on my laptop, but I used said laptop on their wifi (not logged into Google products, as I never am). When I got home, YouTube highly recommended that video to me after I watched an unrelated video, when there was no other reason to recommend it (I didn't know the artist, almost never watch music videos, and it wasn't anything all that popular).


I've noticed similar behaviour. I don't have a Google account and don't keep cookies, so the only way for them to track me is my IP.

My IP seems to be permanently associated with my YouTube habits. They’re even being sneaky about it, as in they’ll give you a default homepage and generic recommendations first, but watching any video similar to the previous viewing habits will bring back not just videos related to that one, but the entire history they’ve collected over the years (some of the topics I watch are completely separate and would never intersect normally, so the only way for them to both come up in suggestions is from previous viewing history).

I didn’t create an account, didn’t agree to any privacy policies, and am blocking any and all cookies just like they advise in their own privacy policy and yet I am still being tracked.



