Preventing Tracking Prevention Tracking (webkit.org)
212 points by om2 on Dec 10, 2019 | 74 comments


Some of the people I've talked with over the years study things like nuclear weapons arms control or cyberwarfare. The most paranoid of the bunch have resorted to having Virtual Private Servers screenshot websites with headless browsers once they load and pipe the images back to their research machine. I can't remember if it's a table of PNGs or just one big one, but either way it's sent back over an SSH tunnel, and when you click, the server knows what you're trying to click on and performs the action for you, and will randomly forward the click to a new VPS.

It's not perfect because the IP blocks make it obvious that it comes from DigitalOcean, AWS, etc, but it's sure better than loading untrusted PDFs or JS locally. Still vulnerable to a network attack, though.
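
A rough sketch of the render-on-the-VPS half of that idea, assuming Python with Selenium and geckodriver installed on the remote box (the function name and paths are illustrative, not their actual setup):

    from selenium import webdriver

    def render(url, out_path="page.png"):
        opts = webdriver.FirefoxOptions()
        opts.add_argument("--headless")
        driver = webdriver.Firefox(options=opts)
        try:
            # The untrusted page loads and runs on the VPS, not locally.
            driver.get(url)
            # This PNG is what gets shipped back over the SSH tunnel.
            driver.save_screenshot(out_path)
        finally:
            driver.quit()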


Sounds like Stallman

>I generally do not connect to web sites from my own machine, aside from a few sites I have some special relationship with. I usually fetch web pages from other sites by sending mail to a program (see https://git.savannah.gnu.org/git/womb/hacks.git) that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it

https://stallman.org/stallman-computing.html


How does this stop something as simple as user-unique URLs for each link? A new VPS that fetches a unique URL is trivial to tie to the same user.


open multiple browser sessions for the user, and randomly choose one of them as the 'result' (but still click on all of them, even if the resultant page isn't viewed).

Or, just don't use the website if they do this.


I keep thinking someone will reboot Opera's mini web browser for this purpose. (Their intermediate server renders the target website to an image.)

I also anticipate someone will do smart diffing on target websites to better auto nuke ads, trackers, etc.


Isn't that worse, a big brother in the middle watching everything and even doing TLS termination? Unless it's running on a Tor-like distributed system?


Much belated response, sorry.

I just don't know. I've stopped using VPNs for this very reason.


This type of tracking seems to assume the user is not bothering to send a fake Referer, e.g. she can just use the URL she is requesting, or just omit the header. One could argue such users are "low-hanging fruit".

Very few websites will vary the response if there is no Referer. Sending it really offers little benefit to the user.

Setting up a "headless" browser also seems like overkill. Firefox 57 and later has a -screenshot command line option which saves a PNG. No need to launch X11 for this to work.


Payment flows often require a specific referrer.


Solution: Send a Referer when making payments, i.e., when using the web for commerce.

No need to send one when using the web for recreation.


So they're taking screenshots via the VM console? Why not just directly interact with the VM console, then?


If they’re forwarding each click to a different VM to avoid persistent tracking then that wouldn’t work.



Why don't they use isolated laptops with only 4G access or dedicated external line?


If you want this in Firefox you need to tweak an about:config setting. I really hope it becomes the default at some point.

    # Only send the origin cross-domain.
    network.http.referer.XOriginTrimmingPolicy = 2
This alone is a pretty liberal policy. People in this crowd probably want even more which can be found here: https://wiki.mozilla.org/Security/Referrer
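
To make the effect concrete: with that setting, a cross-origin request made from a page like https://example.com/private/page?id=42 should carry only the origin rather than the full path and query, i.e. something like

    Referer: https://example.com/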


Why does this header need to exist in the first place? Seems like a huge privacy breach. Why can't 0 be the default setting?


I can't speak to why it was originally defined, but since the Referer [sic] header has existed for decades, many sites depend on it to function. The Smart Referer extension whitelist[1] and bug tracker[2] have several examples.

1. https://gitlab.com/smart-referer/smart-referer/blob/gh-pages...

2. https://gitlab.com/smart-referer/smart-referer/issues?scope=...


> I can't speak to why it was originally defined, but since the Referer [sic] header has existed for decades

I can remember my Dad getting an email from someone he linked to who was about to move his website and who politely contacted his neighbors on the internet so they could update their links.

Very useful at that time.


It can still be useful for that kind of thing. When I notice an unexpected spike of traffic on one of our sites I'll often look at our analytics to see where it came from and then potentially drop in there to answer comments and such. Not to say that's worth the privacy trade-off though, unfortunately.


Believe it or not, there actually exist websites that rely on the Referer header for navigation. The last time I bumped into this was a few years ago, but a local government site refused to work unless my browser sent that header.

Granted, this is probably rare enough that it's safe to disable the header for the vast majority of websites, but it's something to keep in mind.


Atlassian requires it for Jira (& other bits of their crap) logins to work.


I'm honestly not surprised.

Judging solely by the UI, I actually kinda like Atlassian's tools, but they're a huge pain in the ass to get working with privacy extensions installed (uMatrix, uBlock, etc.). They make cross-site requests all over the place (to weird servers like "some-huge-name-that-obscures-the-host-name.atl-pass.net", and even some third party servers!), tons of JavaScript and CSS for basic features, etc. Using dubious features like referer headers seems right up their alley.

It's one of the main reasons I only use them at work, and won't use them for my personal projects. I'd rather pay for GitHub and Sourcehut so I don't feel like I'm opening my browser up to a bunch of security problems.

In the past they've also made some really brain-dead (IMO) decisions, like going out of their way to break middle-click paste on Linux.


>They make cross-site requests all over the place (to weird servers like "some-huge-name-that-obscures-the-host-name.atl-pass.net", and even some third party servers!), tons of Javascript and css for basic features, etc

If you like this, you should try Microsoft. They combine this crap with endless redirects. Usually, I give up after 5 minutes of whitelisting and redirects.


Beyond what other people mentioned, some sites and frameworks also rely on the Referer header as part of CSRF protection. It's not truly necessary to check, but it's an OWASP recommendation so it seems like a decent number of places implemented it by default.
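
As a rough illustration, here's a generic version of that kind of origin check in Python (not Pyramid's actual implementation; the expected origin is a placeholder):

    from urllib.parse import urlparse

    def referer_matches_origin(referer, expected_origin="https://example.com"):
        # Strict variant: a missing Referer is rejected outright, which is
        # exactly what trips up users who strip the header.
        if not referer:
            return False
        parsed = urlparse(referer)
        return "{}://{}".format(parsed.scheme, parsed.netloc) == expected_origin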

I recently got the Pyramid Python framework to make it possible to disable Referer-checking for the built-in CSRF protection, but they're still going to keep requiring the header by default: https://github.com/Pylons/pyramid/issues/3508

More discussion about it in these pull requests too:

https://github.com/Pylons/pyramid/pull/3512

https://github.com/Pylons/pyramid/pull/3518

The new version with it being optional hasn't been released yet, so as of right now almost everyone using Pyramid will still require users to send a Referer header to get past any CSRF checks.


I had an old website hosted under www. When it was decided to build a new website, to preserve the old content, the new site was built without a leading subdomain.

The problem was that Chrome cached www as the default for anyone who'd visited the old site, and had started hiding www from the address bar.

I used Caddy to redirect all requests to the subdomain-free site unless the request came with a referrer from that site, which fixed the caching issue and allowed free navigation between and within both the old and new sites.


> Origin-Only Referrer For All Third-Party Requests

This is going to break a lot of things. Things that probably should be broken, but it will cause headaches nonetheless.


Luckily if a big browser makes this the default, these things will probably be fixed.


Conversely, if a big browser makes a new default that ends up being the wrong decision, that default might spread to other browsers and things will definitely be broken.

The css value `100vh` meant the height of the viewport of the browser, until it didn't.


> The css value `100vh` meant the height of the viewport of the browser, until it didn't.

Huh, what's it mean now? Is there some subtle difference, like it doesn't include the horizontal scroll bar or something?


Mobile devices interpret it differently because of the hide/show browser UI they often have.


I’m pretty sure that’s only true for Chrome at this point.


I hope there is a light at the end of the tunnel for all of this. It seems like there will always be a cat-and-mouse effort to stay just one step ahead of the other. Like how many websites have those popups now where they ask you to turn off ad-blocking. Intrusive ads and website tracking should both be treated as a problem by default. I guess not all ads are a problem, but I'm not sure the same can be said about tracking...


We're willing to play the cat and mouse game indefinitely, if that's what it takes. Widely deployed trackers are limited in how fast they can try new tricks. And in practice, we know that ITP is working pretty well to block cross-site tracking: https://daringfireball.net/linked/2019/12/09/the-information...


> Widely deployed trackers are limited in how fast they can try new tricks.

How so? Tracking scripts are often included by a script tag that points at a website. Can’t the code be updated, “deployed” to websites immediately, and take advantage of the relatively slower release cycle of Safari?


Maybe I should have said that some tricks are slow to deploy.

Sometimes the publisher only embeds an image from the tracker (the famed "tracking pixel"). Getting lots of sites to change that to a script is a pain. Sometimes they need to deploy new server-side tech for a workaround. For the recent CNAME cloaking trick, they have to get sites to modify their DNS and change the URL they embed the script from.


You're doing good work, thank you.


2021: "Preventing Tracking Prevention Tracking Prevention Tracking"


[2019/12/12] [Hotfix] Pre-Emptive Tracking of Track-Preventative Tracking Users by Home Address


Whether it’s a light or not, the end of the tunnel is in sight: it’s the ads becoming the content.


This is so prevalent already. Brands disguised as users posting "content" that is mostly just an advert for their brand.

It has got to the point where any time someone posts something that shows a brand name a bit too clearly or speaks a bit too highly of a product, I suspect it's the PR people at work and I downvote it.


The old way: tracking you as you look for snowboarding videos on the Web and advertising you a snowmobile wherever you go.

The new way: making sure that 95% of the snowboarding videos you see are subliminally designed to sell you a snowboard (the guy riding the competitor’s snowboard goes slower and crashes... the guy riding your company’s snowboard wins the race and his girlfriend looks like a supermodel)

I think eventually we will pine for the old way. Already you can’t get useful reviews anymore because all of the “comparison” searches are run by manufacturer mouthpieces.


>I think eventually we will pine for the old way. Already you can’t get useful reviews anymore because all of the “comparison” searches are run by manufacturer mouthpieces.

Absolutely. A lot of reviews these days in Google results read like they were written by someone who has only ever read the feature list on the marketing page. There is a bit of a search engine hack where you just put "reddit" after any search and it brings up fairly real results, for now.


I find reviews useful anyway. I simply ignore the "good" reviews and always look at the worst ones. There are three kinds of bad reviews:

(I) people who had random bad stuff happen (postal service broke it) that is irrelevant and think everyone should know;

(II) people with some sort of vendetta (possibly disgruntled employees, or competitors, or crazy customers);

(III) people who actually had a bad experience that might be characteristic of the product's quality or design.

If the third category can be used to construct a narrative about something that is a deal-breaker, then that's the information I'm looking for. Of course, it has to be taken in context of the competitors.

My expectation is that the best products have some type (I) and type (II) bad reviews, but no type (III). Almost as good is something with type (III) that are about something that either doesn't matter to me or is actually a positive from my perspective.


I'm pretty good at finding decent reviews. I'd never post my process on a public forum, but I apparently don't have as much trouble avoiding sponcon as a lot of others.


If people are doing that, it’s fairly certain that there are marketers maintaining “humanoid” Reddit accounts which then chime in with opinions on Bluetooth headphones.


No doubt. So far the only defense against crap products is buying them from a physical store so you can test and easily return them and having a good warranty.


There’s a marketing term for it, too. It’s called “native advertising.”


"Native advertising" was originally ads that were served by the site itself, first-party. It's advertisers who have co-opted it (yet again), but that doesn't mean you have to go along with their preferences. Typing isn't hard and "sponcon" (sponsored content) is already perfectly cromulent jargon, as well as being shorter.


I've said it before, and I'll say it again: much of the content I want to consume is basically advertising, but the way today's internet works, I have to view ads for stuff I don't want in order to see the ads I want to see. And for some reason, people call avoiding this "stealing".


And how much real functionality will be sacrificed to this war?


>> Like how many websites have those popups now where they ask you to turn off ad-blocking

Handle them the same way as websites with a cookie or GDPR warning that blocks everything else: vote with your wallet by leaving the site and finding another site instead.


> ITP now downgrades all cross-site request referrer headers to just the page’s origin

What is meant by cross-site here? Does it mean a different eTLD+1, or a different origin (as used by CORS)?

Specifically, if I make a request from https://www.example.com/path?query to https://api.example.com will the referer header contain the "/path?query"? or will that get blocked as well?



So what's next? Tracking the Prevention of Tracking Prevention?

Honestly, this shit gets confusing, can someone please ML us out of it? Or maybe we just design a sane and understandable First-Party only policy?


It's impossible to build a perfect system; even ML could have a bias towards a certain solution, or the bad guys could ML a way to track us again.


It's funny how our brains have a kind of built-in ad blocker called banner blindness. There have been a few times I was unable to understand a UI because the important part was a rectangle and too prominent, so I ignored it entirely without realizing it.


Why do you think advertisers moved to moving ads, ads that fade in over the page once you scroll a little and can be assumed to be focused on the page, reading? Autoplay video that moves down to the picture-in-picture corner? The more annoyingly distracting the ad is, the better. Or so advertisers think.


You might need a sarcasm detector.


An ML based one of course.


cat and mouse game because no software is perfect, yet


Intelligent Tracking Prevention uses a machine learning classifier.


Why can't a browser solve this (except for IP) by simply having an option to not leak any data? Make audio and GL calls constant time, and don't persist anything past the tab / window / site? No fonts or cache reuse beyond the host? No referrers etc.

What's the hard problem here that prevents major browsers from having an option like this?


Firefox has an about:config preference called "privacy.resistFingerprinting" that enables some of Tor Browser's mitigations against fingerprinting. Tor Browser is based on Firefox, and Mozilla merges some of Tor Browser's changes back into Firefox to make updating easier for the Tor team.
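
For reference, it's a single boolean in about:config, off by default:

    privacy.resistFingerprinting = true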

More details in this ghacks article:

https://www.ghacks.net/2018/03/01/a-history-of-fingerprintin...


The difficult bit is that your browser is programmable and browsers are different across vendors, devices and releases. This means whatever a bad guy can think of as a test can be sent back over the wire, and you can't realistically block sites from sending data back to servers. So long as there are different browsers etc, there will be tests that can differentiate between them.

Currently canvas fingerprinting [1] is a popular option, but there are quite a lot of candidates for the next thing you could use. Even generic code execution time could be used to an extent. Realistically there is hope at the end of the tunnel, but it's a very long way off given just how complex a corner we've painted ourselves into with modern web standards.

While it wouldn't stop purely malicious actors, I personally think it might be easier to address the whole situation on the legislative side rather than with technology. Imagine GDPR, except tracking would be illegal altogether: there will always be actors who work to bypass it, but the majority would do their best to conform, lest they go bust from fines.

[1] https://en.wikipedia.org/wiki/Canvas_fingerprinting


Of course, Google suggests modifications that would hinder their competitors, but not themselves. I wonder what percentage of browsers have a first-party cookie from Google?


If Google had any motive besides research and responsible disclosure, it would more likely be to persuade us that ITP is not viable. But I think their issue was fair and submitted in good faith.


> We’d like to thank Google for sending us a report in which they explore both the ability to detect when web content is treated differently by tracking prevention and the bad things that are possible with such detection.

It's interesting that Google, being an ad-tech company, is doing something against their own interest.


They have an app store they earn money from. They have an ads system. For them, websites are competition: people spend their time elsewhere than in apps, and they click on ads other than theirs.

How should anyone believe these actions are for privacy? And not against competition? Against the Internet?

Have you seen any consideration of how it will impact website owners? I haven't. It seems they really don't care, and that is very dangerous.

It looks like the path to break the Internet.


Apple doesn't have an ads system, unless you mean the App Store ads that only show up in App Store search (and thus are completely unrelated to Safari). They had an ads system at one point but it was shut down over 3 years ago, and was for in-app ads anyway, not browser ads.


Apple also runs ads in Apple News.


True, I was thinking about Apple and Google at the same time.


Those are not mutually exclusive, both can be true


Why not both? The market is a brilliant system. Privacy is objectively good for users and so is competition as it can lead to lower prices and better products for users.


The word "can" is doing a lot of heavy lifting there.



