The thing that concerns me about the FB Pixel (and GTM) is that the host is completely free to do anything and everything to the page. Even if they don't do anything "evil" today, tomorrow is a different story completely. This scares the pants off of me and makes me want to rip out any "tracking" that I've ever installed on any site anywhere. Actually, that's probably not a bad idea.
Are there no browser level protections for this type of thing? I thought CORS was supposed to prevent these activities from happening.
Virtually all tracking boils down to 1x1 images getting embedded on the page, with various metadata attached to that image call. The JavaScript libraries may include other functionality (like additional fingerprinting and such), but are primarily just convenient abstractions that generate and embed the tracking images for you. Most provide the details needed[1] to build your own generator function, which would allow you to integrate the tracking you want while reducing your security exposure to third party code.
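To make the "generator function" idea concrete, here is a minimal sketch in Python of what such a function produces. The endpoint `tracker.example` and the parameter names (`ev`, `page`) are made up for illustration; a real vendor documents its own.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the one your vendor documents.
PIXEL_ENDPOINT = "https://tracker.example/collect"


def build_pixel_url(event: str, params: dict[str, str]) -> str:
    """Return the URL of a 1x1 tracking image with the metadata as query parameters."""
    query = {"ev": event, **params}  # "ev" is an assumed parameter name
    return f"{PIXEL_ENDPOINT}?{urlencode(query)}"


def pixel_img_tag(url: str) -> str:
    """Render an invisible <img>; the browser's GET request for it is the tracking hit."""
    return (
        f'<img src="{escape(url, quote=True)}" '
        'width="1" height="1" alt="" style="display:none">'
    )


if __name__ == "__main__":
    print(pixel_img_tag(build_pixel_url("PageView", {"page": "/pricing"})))
```

Everything that ends up in the query string is something you put there yourself, which is the whole point of skipping the vendor's script.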
As for GTM – a deployed container is self-contained. If you don't want to expose your site to third party code, but want to use GTM as a convenient control plane for configuration of tags and tagging rules, you can do that. Instead of using the standard snippet that loads the container from Google, you can just grab the generated javascript file for the container after a new deploy and self-host it. It gives you the convenience of GTM (central control plane for tagging-related stuff, versioning and commenting, etc) but without the security exposure of embedding externally hosted scripts.
The actual 1x1 pixel is a leftover from the previous generation of tracking tools, and even the page you linked to recommends _against_ using that method because it can’t spy on users enough.
Here we are talking about a tracking _script_ embedded in the page and sending to Facebook everything the user does (“standard or custom events triggered by UI interactions”).
Using only a pixel to track how users move around the app wouldn’t have landed Backblaze in as much hot water. Instead, it looks like the Facebook _tracking script_ (automatically) exfiltrated sensitive data like file names, and that crosses a limit.
It's not a leftover – the JS library works on exactly the same principle. Even when using the JS tracking library, if you look at the network calls to Facebook after the initial script download, they're all hits to https://www.facebook.com/tr/ with the metadata for the call in query parameters, and the response comes back with an image content type (image/gif).
As I mentioned in my original comment, the tracking scripts are more than just generator functions for the image pixels. They also do stuff like browser fingerprinting and cookie management[1], and ensure these things get tacked onto generated pixel calls. This improves the fidelity of the data sent back to Facebook, but ultimately it all boils down to image calls with tracking data tacked on as query parameters to the call.
The reasons Facebook (and others) don't recommend doing this:
– As you mentioned, they have way more freedom to do what they want on the page when you load their actual script. So of course that's going to be their preference.
– Advertisers use these pixels for attribution purposes, but ad networks also use the opportunity to further fingerprint and profile users for targeting within their platform.
– The tracking script abstracts away the actual tracking protocol being used (i.e. the query parameters and their associated values), which helps ensure calls are made correctly and gives them the flexibility to change the underlying protocol while retaining a stable interface via the JS SDK.
– The script takes care of things like generating a unique user id, looking for and saving Facebook Click IDs when seen on incoming traffic, and tacking those values onto pixel calls when they occur.
Any user ID can actually be used, so long as it's unique (and Facebook's methodology is documented and easily replicated in [1], if you want to be consistent with the SDK). And persisting a query parameter into a cookie is actually more robust if done by a first-party script, since ITP has made the lifespan for cookies written by third-party scripts so short.
As long as your custom image generator accounts for those two components (generates a client id if none exists and persists + includes a fbclid if seen on incoming traffic), you will get close to parity with the JS tracking library as far as attribution in Facebook Ads without any need to load third party scripts from Facebook (or other advertisers). Which, as an advertiser, is the only part that you care about. What isn't at parity is all of the secondary fingerprinting that ad networks do, but that's the ad network's problem and preventing that shady shit from happening on your site is the precise reason you'd want to roll your own tracking calls to begin with.
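Those two components can be sketched roughly as below. It is written as plain Python functions (the kind of thing a first-party backend could run), but the same logic ports to a first-party script. Note the assumptions: the cookie names `_fbp` / `_fbc` and the `fb.1.<timestamp>.<value>` layout are my recollection of Facebook's documented format, so check them against [1] before relying on them, and `resolve_ids` is just a name I made up.

```python
import random
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def _now_ms() -> int:
    return int(time.time() * 1000)


def new_client_id(now_ms: Optional[int] = None) -> str:
    """Generate a client id (assumed layout: fb.1.<creation time ms>.<random number>)."""
    ts = _now_ms() if now_ms is None else now_ms
    return f"fb.1.{ts}.{random.randint(10**9, 10**10 - 1)}"


def click_id_from_url(landing_url: str, now_ms: Optional[int] = None) -> Optional[str]:
    """Wrap an fbclid query parameter, if present (assumed layout: fb.1.<time ms>.<fbclid>)."""
    values = parse_qs(urlparse(landing_url).query).get("fbclid")
    if not values:
        return None
    ts = _now_ms() if now_ms is None else now_ms
    return f"fb.1.{ts}.{values[0]}"


def resolve_ids(cookies: dict, landing_url: str, now_ms: Optional[int] = None) -> dict:
    """Return the values to persist as first-party cookies and attach to each pixel call."""
    # Reuse the existing client id, or mint one if this is a new visitor.
    ids = {"_fbp": cookies.get("_fbp") or new_client_id(now_ms)}
    # A click id on the incoming URL wins over a previously stored one.
    click = click_id_from_url(landing_url, now_ms) or cookies.get("_fbc")
    if click:
        ids["_fbc"] = click
    return ids
```

On a landing hit with `?fbclid=...` this returns both values; on later page views it just hands back what is already in the cookies, so the ids stay stable across the session.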
As a first-party site owner, subresource integrity checks[0] (that someone else already linked elsewhere in this thread) let you at least determine, at the browser request level, if a third-party script has changed since you installed (and hopefully audited) it.
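For anyone who hasn't used SRI: the integrity value is just a base64-encoded digest of the exact script bytes you audited, prefixed with the hash algorithm. A minimal Python sketch of computing it (the script URL and contents in the usage are placeholders):

```python
import base64
import hashlib
from html import escape


def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI integrity value (sha384) for the given script contents."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def script_tag(src: str, script_bytes: bytes) -> str:
    """Build a <script> tag pinned to the audited contents."""
    return (
        f'<script src="{escape(src, quote=True)}" '
        f'integrity="{sri_hash(script_bytes)}" crossorigin="anonymous"></script>'
    )


if __name__ == "__main__":
    audited = b"console.log('audited tracker');"  # placeholder for the downloaded file
    print(script_tag("https://cdn.example/tracker.js", audited))
```

If the vendor later serves different bytes, the hash no longer matches and the browser refuses to run the script. That also means a script the vendor updates frequently will keep breaking until you re-audit and update the hash.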
For various reasons including this, advertising tracking is moving server-side, where the company can much more tightly control what gets sent to the vendors, and where third party JavaScript no longer has access to the DOM, network requests, or cookies.
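The "tightly control what gets sent" part usually comes down to an explicit allowlist applied before anything leaves your servers. A rough Python sketch, with field names that are purely illustrative and not any vendor's schema:

```python
# Hypothetical allowlist: only these fields may ever be forwarded to a vendor.
ALLOWED_FIELDS = {"event_name", "event_time", "page_path", "click_id"}


def vendor_payload(raw_event: dict) -> dict:
    """Keep only allowlisted fields; anything else never leaves the first-party server."""
    return {key: value for key, value in raw_event.items() if key in ALLOWED_FIELDS}


if __name__ == "__main__":
    raw = {
        "event_name": "FileUploaded",
        "event_time": 1700000000,
        "page_path": "/files",
        "file_name": "tax-return-2020.pdf",  # sensitive; must not reach the vendor
    }
    print(vendor_payload(raw))
```

With a client-side script the vendor decides what to collect; here a field like a file name is dropped unless someone deliberately adds it to the allowlist.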
The upside of third-party trackers is that you can completely block all of them by just blocking third-party javascript. What are we going to do once all of this tracking code starts getting served from the first party domain instead? Or even served inside the same source files as site code?
I imagine we will start seeing a new class of privacy extensions that behave more like anti-virus. Checking for known hashes of tracking scripts, monitoring for certain patterns of behaviour during execution.
The future is entirely server-side tracking, with no JavaScript executed in the client unless for UX tracking like Hotjar or A/B testing like Target or Optimize.
Personally, I haven't seen a desire in companies to skirt GDPR. Rather, companies just want to be compliant and not have to worry about data breaches or reputational damage from their marketing tools. This example with Backblaze is exactly what companies are trying to avoid.