
> what URL they have used to arrive there

This is yet another example of why sending the Referrer header is insane. It's a massive privacy breach by design. Anything serious about protecting privacy needs to stop intentionally betraying the user's browsing path and simply remove the header from all HTTP requests.

Anything that functionally relies on a valid referrer is at best an unfortunate but necessary casualty. However, I suspect that far too often this is simply a way to obfuscate the usual tracking. If you tie your functionality to something designed to violate user privacy, don't be surprised if that functionality breaks when the privacy leak is finally fixed.



If you're using Firefox, the Smart Referer add-on strips out the HTTP Referer and the value of document.referrer (in JavaScript) from cross-domain requests. It includes a default whitelist, and is customizable.

https://addons.mozilla.org/en-US/firefox/addon/smart-referer

It's also open source.

https://gitlab.com/smart-referer/smart-referer

This extension is most effective if you also use an ad blocker (like uBlock Origin) and Firefox's first-party isolation feature, although Smart Referer will still help prevent tracking even if you don't.

https://www.ghacks.net/2017/11/22/how-to-enable-first-party-...


You can also set this directly in about:config under network.http.sendRefererHeader:

  0 = never send the header
  1 = send the header only when clicking on links and similar elements
  2 = (default) send on all requests (e.g. images, links, etc.)
If you want more granular control (like sending referrers but only the root of the domain) all of the various network.http.referer flags for Firefox are listed here:

https://wiki.mozilla.org/Security/Referrer

Doesn't have a few of the features that your extension has, but it's done the trick for me!


I'm not the developer of the extension (just a user), but thanks for the about:config tip!


Has the 0 setting broken anything in your experience?


"something designed to violate user privacy"

I fail to see why sending the referer is a privacy concern. Following that logic, every datapoint is a privacy concern. From screen resolution to mouse movement, everything can be abused to build profiles. Referer headers have a host of valid use cases, but if you are opposed to any data being shared you'll probably dismiss all of them.


> Following that logic, every datapoint is a privacy concern.

Yes, every data point is a privacy concern.


Even the DNT header is a privacy concern.


> Following that logic, every datapoint is a privacy concern

I think it's reasonable logic when we're talking about data points sent implicitly. In my ideal world the website just gets the resource identifier, and the browser takes care of the rest.


> I fail to see why sending the referer is a privacy concern.

From the article:

>> The Zeus platform monitors contextual data such as ... what URL they have used to arrive there ... The publisher will then match that data to its existing audience data pools ... to create assumptions on what that news user’s consumption intent will be. The technology uses machine learning to decipher the patterns.

They are explicitly stating they use Referrer-like data to track users.


> They are explicitly stating they use Referrer-like data to track users.

To me, "track user" means persistently ID one person. This sounds more like inferring anonymous interest, like inferring that someone arriving from ESPN.com might be interested in your sports section.

If that person comes back tomorrow from The Financial Times, you might infer that they are interested in the economy.

But without cookies, I don't see how you would recognize that visit as the same person as yesterday and integrate the sports and economy interests into a persistent profile. Each visit would be self-contained, which doesn't fit my definition of "tracking."


They didn't say no cookies. They said no third-party cookies.


> But without cookies, I don't see how you would recognize that visit as the same person as yesterday and integrate the sports and economy interests into a persistent profile.

If you gather enough of that "anonymous" data, and particularly if you combine it with other data sets (as they claim they are intending to do), then it's not that hard to recognize individuals based on their usage patterns and metadata.


But does it matter if you don’t know who they are?


Going with that premise for the sake of argument: Nah, probably not.

But that premise is well-known to be in conflict with established reality. Identifying specific individuals from these sorts of data points is famously, disturbingly easy to do. People even do it just for fun, almost like it were an Advent of Code challenge. That's the reason why there will never be another Netflix Prize.


It does to me.


Why?


So that probably makes me part of a group composed of several hundreds of other visitors who exhibited the same behavior. I fail to see how that violates my privacy. You'd probably learn a lot more about me by watching me walk from my office to the parking lot.


>You'd probably learn a lot more about me by watching me walk from my office to the parking lot.

well I don't really expect the newspaper that I read to watch me walk from my place of work to the parking lot. In fact I don't want them to watch me at all because I don't expect newspapers to be in the surveillance business.

When I buy a newspaper at a store the guy behind the counter doesn't follow me three blocks to figure out how I drink my coffee at the coffeeshop, yet curiously enough this is how the internet works, everywhere.


> When I buy a newspaper at a store the guy behind the counter doesn't follow me three blocks to figure out how I drink my coffee at the coffeeshop, yet curiously enough this is how the internet works, everywhere

But if you keep returning to the same news stand, he'll probably reach for the newspaper you like when he sees you coming. This is the equivalent of what the Washington Post does now. Not following you to the coffeeshop like the ad tech of today.


I don't object to a business or individual I interact with getting to know my preferences better, that's inevitable and a good thing. What he doesn't do however is commodify my personal information and sell it to third parties and advertisement agencies so that they in turn can try to manipulate me and show me stuff I don't want, and I also suspect no newspaper vendor runs a high tech operation in the basement that, without my explicit knowledge runs some sort of panopticon like experiment on my personal data.

Do you know what I'd really like to see? A sort of frame in frame of what the algorithm sees that tracks me while reading a Wapo article, directly shown to the reader. It'd be interesting to see how people would react if they were aware of how exactly they're being followed around and analysed.


>What he doesn't do however is commodify my personal information and sell it to third parties and advertisement agencies so that they in turn can try to manipulate me and show me stuff I don't want, and I also suspect no newspaper vendor runs a high tech operation in the basement that, without my explicit knowledge runs some sort of panopticon like experiment on my personal data.

Where in the article do you see that WaPo does this? I was under the impression that this is WaPo-only data, collected by WaPo and used by WaPo. To sell advertising space, yes. But that's because you're not paying them directly.

>Do you know what I'd really like to see? A sort of frame in frame of what the algorithm sees that tracks me while reading a Wapo article, directly shown to the reader. It'd be interesting to see how people would react if they were aware of how exactly they're being followed around and analysed.

I would love that too, but as long as it doesn't explicitly mention them by name I guess people don't care. Look at Facebook, where people never had any problem sharing really private information in exchange for free information and entertainment.


> But that's because you're not paying them directly.

Neither ads nor tracking is disabled for subscribers. Paying them directly makes no difference.


That's utterly stupid.


> as long as it doesn't explicitly mention them by name I guess people don't care.

I should not be subjected to spying just because most of my neighbors don't mind being spied on.


> every datapoint is a privacy concern

Yes, this.

33 bits.


> 33 bits.

Context: There are about 2^33 people on earth, so it takes roughly 33 bits of information to identify a single person.

(In practice, it's probably slightly more bits because not all bits carry unique information.)
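
To put rough numbers on that, here is a minimal sketch. It assumes every observed bit is independent and evenly distributed across the population, which real attributes (referrers, screen sizes, and so on) are not, so treat the results as a best case for the observer. The function names are made up for illustration.

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)

def remaining_anonymity_set(population: int, bits_learned: float) -> float:
    """Expected number of people still matching after `bits_learned` ideal bits."""
    return population / (2 ** bits_learned)

world = 2 ** 33  # the round figure used above, about 8.6 billion people

print(bits_to_identify(world))             # 33.0
print(remaining_anonymity_set(world, 20))  # 8192.0
```

So under those idealized assumptions, 20 bits already narrow the whole planet down to a crowd of a few thousand.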


33 bits is not nearly enough: https://www.innovationfiles.org/33-bits-of-nonsense/

>"Anonymity in real life is much different than anonymity in the lab, and most people are content to be “one in a million” even if they cannot be “one in 6.7 billion.” In any data set, highly unique individuals (i.e. the outliers) may stand out, much like today’s celebrities do not enjoy the same level of anonymity as the average citizen. However, the fact that some individuals may be identified in a particular data set does not mean that any (or all) individuals may be identified in the data set."


> From screen resolution to mouse movement, everything can be abused to build profiles

That's correct. I want none of these available without my explicit consent.


What’s a valid use case?


If my site is getting hammered by visitors I would like to be able to easily discern if it's because I'm featured on HN's frontpage or if I'm the victim of a DDoS attack.


The referrer header is in no way a tool to differentiate real users from a DDoS attack.


I disagree. A fake referer is easily checked: Is my link really on the frontpage? If so: all good. If not: it's getting suspicious.
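
A rough sketch of the kind of check I mean, assuming you have already pulled the Referer values out of your access log. The sample values are made up, and since the header is supplied by the client it can be forged, so the tally is only a hint that you then verify by hand.

```python
from collections import Counter
from urllib.parse import urlsplit

def referer_hosts(referers):
    """Count requests per referring host.

    Empty values and placeholders such as "-" have no hostname and are
    counted under "(none)".
    """
    counts = Counter()
    for ref in referers:
        host = urlsplit(ref).hostname if ref else None
        counts[host or "(none)"] += 1
    return counts

# Hypothetical Referer values taken from a spike in traffic.
sample = [
    "https://news.ycombinator.com/",
    "https://news.ycombinator.com/news",
    "-",
    "",
]
for host, hits in referer_hosts(sample).most_common():
    print(host, hits)
```

If one host dominates, go and look whether your link is really there.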


A similar line of argumentation has been historically used to push every outrageous thing on innocent people since forever. You sell the "abuse" as defense for a shocking crime. Ok, you only said DDoS when the usual is terrorism and child abuse. But the bottom line is the same: I need to take something private from you to defend myself.

What would you think if all stores took every measurement they could about you without disclosing it and eventually justified it by saying "how else would I know you're not a thief"?


A referrer header is not an outrageous amount of information. It's the store-equivalent of asking "Where did you learn about us?" Taking it away would hurt smaller sites and do nothing against large companies and ad networks.


> A referrer header is not an outrageous amount of information.

But it does reveal information that is none of the website's business.

> It's the store-equivalent of asking "Where did you learn about us?"

No, it's not. Actually asking that question would be the equivalent. What this is is surveillance.


The store is asking, the site is not. And 99% of people are trained to click "Accept" after years of dark pattern abuse and they have very little understanding of what happens in the background. I hope you understand that my point isn't to bash a webmaster but rather to bring the principle of the whole thing into the discussion. It seems that everybody draws the line for what is acceptable in such a way that it perfectly covers their own needs.

I've seen people that insist that using facial recognition is not different from what humans are doing naturally, now done also with electronics. We can agree the implications are different.


  You sell the "abuse" as defense for a shocking crime.
This works the other way around too. You use the abuse of non-personally identifiable information (by combining it with other data points, illegal without consent in the EU) to take useful data away from innocent webmasters.


> to take useful data away from innocent webmasters.

Webmasters who are collecting data about me or my machines (excluding the data about my direct use of their site) without my permission are not "innocent webmasters".


I'm surprised that in 2019 people (especially on HN) still believe/claim that users trying to hang on to their personal data "abuse" this to "take useful data away from innocent webmasters".

There are dozens of real life situations where covertly collecting such data would be considered completely unacceptable and yet my comment arguing this was still substantially downvoted.

But I guess my point is being in a technically literate community makes no difference when it comes to making a buck. Once one agrees to take a "not an outrageous amount" of private data for a bit of money, they'll agree to take an outrageous amount for outrageous money. And I think this is a perfectly accurate explanation for what FB, Google, [you name it] are doing.


Doesn't your argument work against encryption just the same? With such an argument aren't you actually punishing 99.9% of the internet population for what the 0.1% is doing?


But in general it's the only way to understand who's linking to you. Sure, not essential, but useful to see in general, especially when search engines could send it and you could see what keywords people used to find your site. If it were gone, as it is in many cases now due to https, people will adjust.


  "as it is in many cases now due to https"
That's not exactly true. Referrer is only hidden if it's explicitly asked by using a meta tag:

  <meta name="referrer" content="no-referrer" />
Or by using Referrer-Policy:

  Referrer-Policy: no-referrer
The default behavior is no-referrer-when-downgrade. This means that referrers from https to http are hidden. But https > https is still visible. And with https adoption reaching saturation, referer headers are usually still sent.
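
As a simplified sketch of the two policies named above (not the full Referrer Policy algorithm; the function name is made up, and other policies such as strict-origin-when-cross-origin, which browsers have since moved to as their default, are not modelled):

```python
from urllib.parse import urlsplit

def referer_to_send(policy: str, from_url: str, to_url: str):
    """Return the Referer value that would be sent, or None for no header.

    Simplification: only the fragment is stripped; credentials embedded in
    the URL are not handled here.
    """
    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        src, dst = urlsplit(from_url), urlsplit(to_url)
        # Downgrade: leaving an https page for a non-https destination.
        if src.scheme == "https" and dst.scheme != "https":
            return None
        # The fragment (#...) is never part of the Referer header.
        return src._replace(fragment="").geturl()
    raise ValueError(f"policy not modelled in this sketch: {policy}")

page = "https://a.example/article?q=1#comments"
print(referer_to_send("no-referrer-when-downgrade", page, "https://b.example/"))
print(referer_to_send("no-referrer-when-downgrade", page, "http://b.example/"))
print(referer_to_send("no-referrer", page, "https://b.example/"))
```

The first call still leaks the full path and query string to b.example, which is the https > https case.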


Google has used encrypted search terms in the referrals for years now.


Cross-origin sending of the Referer header can be disabled in Firefox with network.http.referer.XOriginPolicy, along with a variety of other Referer-related options [1]. I have it set to 1 (and XOriginTrimmingPolicy to 2) and haven’t experienced (m)any issues.

[1] https://wiki.mozilla.org/Security/Referrer
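
For anyone wondering what the trimming values do, here is a small sketch of my understanding of them (0 = full URL, 1 = drop the query string, 2 = origin only). That mapping is my reading of the wiki page above, so double-check it there; the function name is made up and this is not Firefox's code.

```python
from urllib.parse import urlsplit, urlunsplit

def trim_referrer(url: str, trimming_policy: int) -> str:
    """Approximate what is left of a referrer URL under each trimming value."""
    parts = urlsplit(url)
    if trimming_policy == 0:
        # Full URL; the fragment is never sent in any case.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    if trimming_policy == 1:
        # Scheme, host, port and path, without the query string.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    if trimming_policy == 2:
        # Scheme, host and port only.
        return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    raise ValueError(f"unknown trimming policy: {trimming_policy}")

url = "https://example.com/a/b?q=secret#top"
for policy in (0, 1, 2):
    print(policy, trim_referrer(url, policy))
```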


There are very good reasons for the Referrer header to be used. If you see a lot of traffic going to a URL with a typo, you will want to know where that typo is. If someone hotlinks to a large file in your domain, you will want to know who it is and block it. Any alternative would be much more intrusive.


> you will want to know where that typo is ... you will want to know who it

I know you want to know those things. Find another way to handle those issues.

To be a "good reason", you need to show why your reason is worth paying the high price of betraying every user's browsing path to every server. Worrying about hotlinks and typos... "ain't the same fuckin' ballpark, it ain't the same league, it ain't even the same fuckin' sport".

> Any alternative would be much more intrusive.

Did you consider serving that "large file" only when accompanied by a proper session cookie created when they loaded the HTML file? There are many solutions to those problems, including some that are server-side-only.
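
A minimal sketch of the cookie idea, assuming a server that can set a cookie when it serves the HTML page and read it back on the file request. The names (SECRET_KEY, issue_download_cookie, may_download) are hypothetical, and a real setup would also want expiry and the usual cookie flags.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-server secret; it never leaves the server.
SECRET_KEY = secrets.token_bytes(32)

def _sign(session_id: str) -> str:
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def issue_download_cookie(session_id: str) -> str:
    """Cookie value to set when the HTML page is served."""
    return f"{session_id}.{_sign(session_id)}"

def may_download(cookie_value) -> bool:
    """True only if the request carries a cookie this server signed."""
    if not cookie_value or "." not in cookie_value:
        return False
    session_id, sig = cookie_value.rsplit(".", 1)
    # Constant-time comparison of the presented and expected signatures.
    return hmac.compare_digest(sig.encode(), _sign(session_id).encode())

cookie = issue_download_cookie(secrets.token_hex(8))
print(may_download(cookie))  # request that loaded the page first
print(may_download(None))    # hotlinked request with no cookie
```

A hotlinked request from another site arrives without the cookie and is refused, and no browsing path is involved at all.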


  Find another way to handle those issues.
If there's another way, it would lead to the same privacy concerns.

  why your reason is worth paying the high price of betraying every user's browsing path to every server
First explain why it's a) betrayal b) a high price.

   a proper session cookie
This again could lead to privacy concerns.


I understand your concerns very well, but I have a different perspective. I don't modify my Referrer header. I want to let the websites I'm using know where I came from. A referrer by itself is innocuous - only when you combine it with other nefarious techniques does it wreak havoc on users' privacy. But on its own, in an anonymous browser environment that I tend to use, it's actually quite useful.


Sadly, the industry chose to abuse it to the detriment of the users. Enough reason to take it away.


It's not really "design"--the header name is even misspelled. This one always felt like, at the time in the early days of the web, it'd be interesting data to pass along. Since then, things like image hotlinking started to depend on it, and Google got better about hiding referrer data, so there wasn't the same motivation to fix it as implementing same-origin policy. If the web were invented today, yes, I doubt that this would be a thing.


It was how you did sessions before Cookies and JavaScript existed, and it existed because there was a problem that needed solving: converting forms to wizards and building the first Internet shopping carts.


I agree. There are some add-ons that spoof/disable this header for you, but as you said, this breaks some sites. I agree, as a consumer, that websites that rely on the header are out of luck with regards to my business, but at work I don't always have a choice with regards to which online tools we use. But whitelisting the things that break is a fine solution in that case.


I forge the referer as the root of the site, except in the case of news sites that allow referers from google news to bypass the paywall, in which case I always forge that. This very rarely breaks anything (one out of a million sites expect an external or specific referer.)


I also want to point out that Google employees repeatedly removed code from the Chromium project that would restrict or disable referrer headers.

I was personally involved on 3 distinct occasions. After that I gave up on Chromium and the lie of Google-independence completely.

And so should you. If they tweak things to reach their profit goals, they will also do the same when any agency "asks" them to. It's a slippery slope, and they have already crossed it.



