This is yet another example of why sending the Referrer header is insane. It's a massive privacy breach by design. Anything serious about protecting privacy needs to stop intentionally betraying the user's browsing path and simply remove the header from all HTTP requests.
Anything that functionally relies on a valid referrer is at best an unfortunate but necessary casualty. However, I suspect that far too often this is simply a way to obfuscate the usual tracking. If you tie your functionality to something designed to violate user privacy, don't be surprised if that functionality breaks when the privacy leak is finally fixed.
If you're using Firefox, the Smart Referer add-on strips out the HTTP Referer and the value of document.referrer (in JavaScript) from cross-domain requests. It includes a default whitelist, and is customizable.
This extension is most effective if you also use an ad blocker (like uBlock Origin) and Firefox's first-party isolation feature, although Smart Referer will still help prevent tracking even if you don't.
You can also set this directly in about:config under network.http.sendRefererHeader:
0 = never send the header
1 = send the header only when clicking on links and similar elements
2 = (default) send on all requests (e.g. images, links, etc.)
If you want more granular control (like sending referrers but only the root of the domain) all of the various network.http.referer flags for Firefox are listed here:
I fail to see why sending the referer is a privacy concern. Following that logic, every datapoint is a privacy concern. From screen resolution to mouse movement, everything can be abused to build profiles. Referer headers have a host of valid use cases, but if you are opposed to any data being shared you'll probably dismiss all of them.
> Following that logic, every datapoint is a privacy concern
I think it's reasonable logic when we're talking about data points sent implicitly. In my ideal world the website just gets the resource identifier, and the browser takes care of the rest.
> I fail to see why sending the referer is a privacy concern.
From the article:
>> The Zeus platform monitors contextual data such as ... what URL they have used to arrive there ... The publisher will then match that data to its existing audience data pools ... to create assumptions on what that news user’s consumption intent will be. The technology uses machine learning to decipher the patterns.
They are explicitly stating they use Referrer-like data to track users.
> They are explicitly stating they use Referrer-like data to track users.
To me, "track user" means persistently ID one person. This sounds more like inferring anonymous interest, like inferring that someone arriving from ESPN.com might be interested in your sports section.
If that person comes back tomorrow from The Financial Times, you might infer that they are interested in the economy.
But without cookies, I don't see how you would recognize that visit as the same person as yesterday and integrate the sports and economy interests into a persistent profile. Each visit would be self-contained, which doesn't fit my definition of "tracking."
> But without cookies, I don't see how you would recognize that visit as the same person as yesterday and integrate the sports and economy interests into a persistent profile.
If you gather enough of that "anonymous" data, and particularly if you combine it with other data sets (as they claim they are intending to do), then it's not that hard to recognize individuals based on their usage patterns and metadata.
Going with that premise for the sake of argument: Nah, probably not.
But that premise is well-known to be in conflict with established reality. Identifying specific individuals from these sorts of data points is famously, disturbingly easy to do. People even do it just for fun, almost like it were an Advent of Code challenge. That's the reason why there will never be another Netflix Prize.
So that probably makes me part of a group composed of several hundreds of other visitors who exhibited the same behavior. I fail to see how that violates my privacy. You'd probably learn a lot more about me by watching me walk from my office to the parking lot.
>You'd probably learn a lot more about me by watching me walk from my office to the parking lot.
Well, I don't really expect the newspaper that I read to watch me walk from my place of work to the parking lot. In fact I don't want them to watch me at all because I don't expect newspapers to be in the surveillance business.
When I buy a newspaper at a store the guy behind the counter doesn't follow me three blocks to figure out how I drink my coffee at the coffeeshop, yet curiously enough this is how the internet works, everywhere.
> When I buy a newspaper at a store the guy behind the counter doesn't follow me three blocks to figure out how I drink my coffee at the coffeeshop, yet curiously enough this is how the internet works, everywhere
But if you keep returning to the same news stand, he'll probably reach for the newspaper you like when he sees you coming. This is the equivalent of what the Washington Post does now. Not following you to the coffeeshop like the ad tech of today.
I don't object to a business or individual I interact with getting to know my preferences better, that's inevitable and a good thing. What he doesn't do however is commodify my personal information and sell it to third parties and advertisement agencies so that they in turn can try to manipulate me and show me stuff I don't want, and I also suspect no newspaper vendor runs a high tech operation in the basement that, without my explicit knowledge runs some sort of panopticon like experiment on my personal data.
Do you know what I'd really like to see? A sort of frame in frame of what the algorithm sees that tracks me while reading a Wapo article, directly shown to the reader. It'd be interesting to see how people would react if they were aware of how exactly they're being followed around and analysed.
>What he doesn't do however is commodify my personal information and sell it to third parties and advertisement agencies so that they in turn can try to manipulate me and show me stuff I don't want, and I also suspect no newspaper vendor runs a high tech operation in the basement that, without my explicit knowledge runs some sort of panopticon like experiment on my personal data.
Where in the article do you see that WaPo does this? I was under the impression that this is WaPo-only data, collected by WaPo and used by WaPo. To sell advertising space, yes. But that's because you're not paying them directly.
>Do you know what I'd really like to see? A sort of frame in frame of what the algorithm sees that tracks me while reading a Wapo article, directly shown to the reader. It'd be interesting to see how people would react if they were aware of how exactly they're being followed around and analysed.
I would love that too, but as long as it doesn't explicitly mention them by name I guess people don't care. Look at Facebook, where people never had any problem sharing really private information in exchange for free information and entertainment.
>"Anonymity in real life is much different than anonymity in the lab, and most people are content to be “one in a million” even if they cannot be “one in 6.7 billion.” In any data set, highly unique individuals (i.e. the outliers) may stand out, much like today’s celebrities do not enjoy the same level of anonymity as the average citizen. However, the fact that some individuals may be identified in a particular data set does not mean that any (or all) individuals may be identified in the data set."
If my site is getting hammered by visitors I would like to be able to easily discern if it's because I'm featured on HN's frontpage or if I'm victim of a DDOS attack.
A similar line of argumentation has been historically used to push every outrageous thing on innocent people since forever. You sell the "abuse" as defense for a shocking crime. Ok, you only said DDoS when the usual is terrorism and child abuse. But the bottom line is the same: I need to take something private from you to defend myself.
What would you think if all stores took every measurement they could about you without disclosing it and eventually justified it by saying "how else would I know you're not a thief"?
A referrer header is not an outrageous amount of information. It's the store-equivalent of asking "Where did you learn about us?" Taking it away would hurt smaller sites and do nothing against large companies and ad networks.
The store is asking, the site is not. And 99% of people are trained to click "Accept" after years of dark pattern abuse and they have very little understanding of what happens in the background. I hope you understand that my point isn't to bash a webmaster but rather bring in discussion the principle of the whole thing. Seems that everybody draws the line for what is acceptable in such a way that it perfectly covers their own needs.
I've seen people that insist that using facial recognition is not different from what humans are doing naturally, now done also with electronics. We can agree the implications are different.
> You sell the "abuse" as defense for a shocking crime.
This works the other way around too. You use the abuse of non-personally identifiable information (by combining it with other data points, illegal without consent in the EU) to take useful data away from innocent webmasters.
> to take useful data away from innocent webmasters.
Webmasters who are collecting data about me or my machines (excluding the data about my direct use of their site) without my permission are not "innocent webmasters".
I'm surprised that in 2019 people (especially on HN) still believe/claim that users trying to hang on to their personal data "abuse" this to "take useful data away from innocent webmasters".
There are dozens of real life situations where covertly collecting such data would be considered completely unacceptable and yet my comment arguing this was still substantially downvoted.
But I guess my point is being in a technically literate community makes no difference when it comes to making a buck. Once one agrees to take a "not an outrageous amount" of private data for a bit of money, they'll agree to take an outrageous amount for outrageous money. And I think this is a perfectly accurate explanation for what FB, Google, [you name it] are doing.
Doesn't your argument work against encryption just the same? With such an argument aren't you actually punishing 99.9% of the internet population for what the 0.1% is doing?
But in general it's the only way to understand who's linking to you. Sure, not essential, but useful to see in general, especially when search engines could send it and you could see what keywords people used to find your site. If it were gone, as it is in many cases now due to https, people will adjust.
That's not exactly true. Referrer is only hidden if it's explicitly asked by using a meta tag:
<meta name="referrer" content="no-referrer" />
Or by using Referrer-Policy:
Referrer-Policy: no-referrer
The default behavior is no-referrer-when-downgrade. This means that referrers from https to http are hidden. But https > https is still visible. And with https adoption reaching saturation, referer headers are usually still sent.
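To make the downgrade rule concrete, here is a rough Python sketch of what no-referrer-when-downgrade does. The function name is made up for illustration, and it simplifies "downgrade" to mean "https page requesting a non-https URL", which is an assumption (the actual spec talks about trustworthy origins).

```python
from urllib.parse import urlsplit


def referer_when_downgrade(source_url, target_url):
    """Return the Referer value sent under no-referrer-when-downgrade, or None.

    Simplifying assumption: a "downgrade" is any request from an https
    page to a non-https URL.
    """
    src = urlsplit(source_url)
    dst = urlsplit(target_url)

    # https -> http (or any non-https scheme): the header is omitted.
    if src.scheme == "https" and dst.scheme != "https":
        return None

    # Otherwise the full URL is sent, minus the fragment and any credentials.
    host = src.hostname or ""
    netloc = f"{host}:{src.port}" if src.port else host
    return src._replace(netloc=netloc, fragment="").geturl()
```

So a link from an https page to another https site still leaks the full path and query string, which is the point being made above.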
Cross-origin sending of the Referer header can be disabled in Firefox with network.http.referer.XOriginPolicy, along with a variety of other Referer-related options [1]. I have it set to 1 (and XOriginTrimmingPolicy to 2) and haven’t experienced (m)any issues.
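For anyone wondering what those two values do in combination, below is a small Python model of my understanding of the prefs, not Firefox's actual code. Assumed semantics: XOriginPolicy 0 = always send, 1 = send only if base domains match, 2 = send only if hosts match; XOriginTrimmingPolicy 0 = full URL, 1 = drop the query string, 2 = origin only. The `naive_base_domain` helper is a hypothetical stand-in that just takes the last two labels; a real browser uses the Public Suffix List.

```python
from urllib.parse import urlsplit


def naive_base_domain(host):
    # ASSUMPTION: last two labels only. Wrong for suffixes like co.uk.
    return ".".join(host.split(".")[-2:])


def cross_origin_referer(source_url, target_url, x_origin_policy=1, trimming_policy=2):
    """Model the Referer sent for a request, or None if it is withheld."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    src_host, dst_host = src.hostname or "", dst.hostname or ""
    full = src._replace(fragment="").geturl()

    # Same-origin requests are not affected by the XOrigin* prefs.
    if (src.scheme, src_host, src.port) == (dst.scheme, dst_host, dst.port):
        return full

    if x_origin_policy == 2 and src_host != dst_host:
        return None
    if x_origin_policy == 1 and naive_base_domain(src_host) != naive_base_domain(dst_host):
        return None

    if trimming_policy == 2:
        return f"{src.scheme}://{src.netloc}/"            # origin only
    if trimming_policy == 1:
        return f"{src.scheme}://{src.netloc}{src.path}"   # no query string
    return full
```

With the 1/2 settings, third-party hosts get nothing, and sibling subdomains only learn which host you came from.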
There are very good reasons for the Referrer header to be used. If you see a lot of traffic going to a URL with a typo, you will want to know where that typo is. If someone hotlinks to a large file in your domain, you will want to know who it is and block it. Any alternative would be much more intrusive.
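The hotlink case usually boils down to a check like the one below. This is only a sketch: `ALLOWED_HOSTS` and `is_hotlink` are hypothetical names, and treating a missing Referer as "not a hotlink" is a design assumption so that visitors who strip the header are not locked out.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts allowed to embed our files.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer_header, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the Referer names a host outside the allow list."""
    if not referer_header:
        # No header at all: direct visit or a privacy tool. Let it through.
        return False
    host = urlsplit(referer_header).hostname
    # An unparseable Referer has no hostname; treat it like a missing one.
    return host is not None and host not in allowed_hosts
```

Note that this only works against browsers that send the header honestly; anyone scripting requests can put whatever they like in it.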
> you will want to know where that typo is ... you will want to know who it
I know you want to know those things. Find another way to handle those issues.
To be a "good reason", you need to show why your reason is worth paying the high price of betraying every user's browsing path to every server. Worrying about hotlinks and typos... "ain't the same fuckin' ballpark, it ain't the same league, it ain't even the same fuckin' sport".
> Any alternative would be much more intrusive.
Did you consider serving that "large file" only when accompanied by a proper session cookie created when they loaded the HTML file? There are many solutions to those problems, including some that are server-side-only.
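A minimal sketch of the cookie idea, assuming an HMAC-signed timestamp as the cookie value. All names here (`SECRET_KEY`, `TOKEN_TTL_SECONDS`, `issue_download_token`, `may_serve_large_file`) are hypothetical, and a real deployment would keep the key in persistent server config rather than regenerate it per process.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical: per-process key
TOKEN_TTL_SECONDS = 600               # hypothetical: token valid for 10 minutes


def _sign(issued):
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_download_token(now=None):
    """Cookie value to set when the HTML page is served."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}.{_sign(issued)}"


def may_serve_large_file(cookie_value, now=None):
    """Check the cookie before serving the file. No Referer involved."""
    if not cookie_value or "." not in cookie_value:
        return False
    issued, sig = cookie_value.split(".", 1)
    # Verify the signature first so that `issued` is known to be ours.
    if not hmac.compare_digest(sig.encode(), _sign(issued).encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(issued) <= TOKEN_TTL_SECONDS
```

The page handler sets the token as a cookie, and the file handler refuses requests that arrive without a fresh, valid one. A hotlinked request from someone else's page never loaded your HTML, so it has no cookie.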
I understand your concerns very well, but I have a different perspective. I don't modify my Referrer header. I want to let the websites I'm using know where I came from. A referrer by itself is innocuous - only when you combine it with other nefarious techniques does it wreak havoc on users' privacy. But on its own, in an anonymous browser environment that I tend to use, it's actually quite useful.
It's not really "design"--the header name is even misspelled. This one always felt like, at the time in the early days of the web, it'd be interesting data to pass along. Since then, things like image hotlinking started to depend on it, and Google got better about hiding referrer data, so there wasn't the same motivation to fix it as implementing same-origin policy. If the web were invented today, yes, I doubt that this would be a thing.
It was how you did sessions before Cookies and JavaScript existed, and existed because it was a problem that needed solving. Converting forms to wizards and the first Internet shopping carts.
I agree. There are some add-ons that spoof/disable this header for you, but as you said, this breaks some sites. I agree, as a consumer, that websites that rely on the header are out of luck with regards to my business, but at work I don't always have a choice with regards to which online tools we use. But whitelisting the things that break is a fine solution in that case.
I forge the referer as the root of the site, except in the case of news sites that allow referers from google news to bypass the paywall, in which case I always forge that. This very rarely breaks anything (one out of a million sites expect an external or specific referer.)
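In case it helps, the rule described above is roughly the following. This is just an illustration of the logic, not what any particular add-on does; `forged_referer` and the `overrides` mapping are hypothetical names.

```python
from urllib.parse import urlsplit


def forged_referer(target_url, overrides=None):
    """Pick the Referer to send for a request to target_url.

    Default: the root of the target site itself. `overrides` is a
    hypothetical per-host mapping for sites that need a specific value.
    """
    parts = urlsplit(target_url)
    host = parts.hostname or ""
    if overrides and host in overrides:
        return overrides[host]
    return f"{parts.scheme}://{parts.netloc}/"
```

The site only ever sees itself as the referrer, so it learns nothing about where you actually came from, and most "is the Referer present and plausible" checks still pass.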
and I wanted to recall that google employees repeatedly removed chromium's project code to restrict or disable referrer headers.
I personally was involved in 3 distinct times. And after that gave up chromium and the lie of google-independence completely.
and so should you. If they tweak things to reach their profit goals, they will also do the same when any agency "asks" them to. it's a slippery slope, and they already crossed