‘One important trend to notice is how often Google Analytics, Google’s DoubleClick, Facebook, and Twitter are ingesting the user emails — these are organizations that should be receiving deletion requests en masse and they should all have processes to handle this type of effort already (Facebook likely has this tech already based on conversations on this research and additional research from a private report from several years ago).’
‘This type of email user data in a URL bar synced into Javascript pixels is most typically blocked by a regular person through “Ad blockers” or through browsers like Safari, Brave, and Firefox — those browsers use Javascript/cookie blocking as a default feature to protect users (each browser handles it slightly differently). This breach and research included here would impact all Chrome users of these websites who went through these specific user flows and who didn’t proactively block all Javascript (a rarely used option) or use a Chrome “Ad blocker” extension that blocked this type of Javascript. Some people using the other “safe” browsers (Safari/Brave/Firefox) could have been protected from the leak due to their 3rd party Javascript requests being blocked.’
Original title too long. It was: The 2020 URL Querystring Data Leaks — Millions of User Emails Leaking from Popular Websites to Advertising & Analytics Companies
I would agree with you. Those aren’t my words, just two quotes from the article with some relevant info. As others have mentioned, it is email addresses that have been leaking and continue to leak.
This may be the first time I've seen an article try to sensationalize webhooks and third-party APIs. When I read the headline, I was expecting some kind of hack, not a story about how Dave in IT hooked the contact us form up to the CRM using webhooks and Zapier...
The meat of the story is a real problem: irresponsible mingling of PII in analytics data.
The number of times I've heard "Well, it's just analytics data, it's public anyways" from people just drives me up the wall. No, it's not public; it's still PII, and you still have to guard it correctly!
It is very hard to make people understand that scale matters when deciding how sensitive data is. They only ever care about it in whatever narrowly defined use case they're worried about. The idea that someone can take an element of not-particularly-sensitive data from you, combine it with elements of not-particularly-sensitive data from elsewhere, and end up with a database full of extremely sensitive data simply does not click.
This covers query strings, but in theory, if you've got any third party JS on your registration/private page, that JS can get the contents of the form and exfiltrate it.
So, basically any 3rd party analytics has the ability to do this, query string or no?
Each email address matching a rule (magicmarker[a-f0-9]+) goes to my account.
Each registration anywhere gets a unique email address generated as magicmarker<b64(hash(domain+salt))>@mydomain.com
Salt is there to keep it unguessable.
When I get any spam, I can redirect it to /dev/null and see where it came from, so I can send hate mail to the domain owner or whatever.
0 spam. 0 traceability. Ability to track who sold/leaked my mail address.
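A minimal Python sketch of the marker scheme described above, not the commenter's actual setup. It assumes HMAC-SHA256 keyed with the secret salt (instead of a plain hash of domain+salt) and hex encoding truncated to 16 characters, so the local part matches the `magicmarker[a-f0-9]+` rule; `SECRET_SALT`, `alias_for`, and `source_of` are hypothetical names.

```python
import hashlib
import hmac
import re
from typing import List, Optional

# Hypothetical values: keep the real salt secret and out of source control.
SECRET_SALT = b"replace-with-a-long-random-secret"
MAIL_DOMAIN = "mydomain.com"
PREFIX = "magicmarker"

# Routing rule: any local part of the form magicmarker<hex> lands in the main inbox.
MARKER_RE = re.compile(rf"{PREFIX}[a-f0-9]+@{re.escape(MAIL_DOMAIN)}")


def alias_for(site: str) -> str:
    """Derive a stable, hard-to-guess address to register on one site."""
    digest = hmac.new(SECRET_SALT, site.lower().encode(), hashlib.sha256).hexdigest()
    return f"{PREFIX}{digest[:16]}@{MAIL_DOMAIN}"


def source_of(address: str, known_sites: List[str]) -> Optional[str]:
    """Return the site an alias was handed to, or None if no site matches."""
    address = address.lower()
    if not MARKER_RE.fullmatch(address):
        return None  # no marker: treated as invalid/spam
    for site in known_sites:
        if hmac.compare_digest(alias_for(site), address):
            return site
    return None  # marker has the right shape but was never handed out
```

Because the salt never leaves the mail server, outsiders cannot compute the alias for a given site, and any mail arriving at a known alias points back to the site it was given to.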
If others reading this don't want to run their own mail server, other services offer the ability to generate suffixes (a la `+SignupPageFoo`) via other characters.
The one I use (purelymail.com) allows for underscores to serve this purpose, which will never be stripped. It's also super cheap (less than $1/month), though because of its low volume and AWS IP, my messages have a problem with getting marked as spam. The next comparable service I found that offered effectively infinite addresses was $50/year, so I fall back to gmail if I really need to send email.
This is not adequate. The + character is a very well-known part of the standard, and it is very simple to remove: one search and replace over the whole email list would do it.
The + character is only "part of standard" for gmail and whoever else chooses to copy them. This is why I mentioned that the provider I'm using allows _ to be used the same as +, which will not be stripped or replaced by anyone ingesting email addresses.
That + symbol is part of a pseudo-regex; it’s not part of the email address. You’re probably thinking of the email+whatever@whatever syntax, which isn’t what’s being described here. There’s nothing to strip in this case.
According to the RFC ‘+’ is a valid character and tagged addresses (which aren’t mentioned) represent unique users (so removing the marker could result in the email not being delivered at all.)
IME: I’ve never had anyone remove it (although some sites don’t accept it and others have bugs that result in an unusable account if you register with it). I used one with the company I most recently interviewed at (they used a third-party service for the HR site, and I’ve had trouble with sites like that selling my email address, which is just so enraging, honestly). Every. Single. Time. I logged in to fill out a form after getting hired, I had to call HR and have them reset my account because of some weird bug.
> According to the RFC ‘+’ is a valid character and tagged addresses (which aren’t mentioned) represent unique users (so removing the marker could result in the email not being delivered at all.)
Though, similarly, email usernames (the 'local part') are case sensitive, but I've never encountered a mail server where this was actually the case. I imagine if a nefarious party stripped the markers, they'd lose the tiniest percentage of their audience that way.
(Aside: Your own approach to using + is quite clever as it's the total opposite to most users, so you'll see who pulls this trick ;-))
The trick is to forward emails with no marker to spam. Sure they can strip markers but I assume they don't replace markers to match my "canonical" email address.
You will have to elaborate; I don't understand. The marker is there just to handle mail on the server side, and I could easily upgrade it to a hash of hash + salt and handle it programmatically on the server, but there was never any need for it. And anyway, I couldn't care less: without it the mail is invalid, so still 0 spam. And I have been doing this for the last 10 years, so it is battle-proven.
I think the responder misread your message and assumed you were adding a suffix like `\+\w+` to the local part of your email addresses, like:
name@gmail.com -> name+suffix@gmail.com
Several email providers treat a + symbol as the end of the first part of the address and ignore everything between it and the @.
I think the responder’s point was that analytics providers just ignore things after a + too
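To illustrate the stripping argument in this sub-thread, here is how little code a generic `+` rule takes. `strip_plus_tag` is a hypothetical normalizer of the kind a list owner might run, not any provider's or analytics vendor's actual logic. Under this rule the underscore separator and the `magicmarker` scheme pass through unchanged because there is no `+` to cut at.

```python
def strip_plus_tag(address: str) -> str:
    """Remove a '+tag' from the local part, e.g. name+suffix@x -> name@x."""
    local, at, domain = address.rpartition("@")
    if not at:
        return address  # no '@' at all: not an address, leave it alone
    base, _plus, _tag = local.partition("+")
    return f"{base}@{domain}"


for addr in [
    "name+suffix@gmail.com",             # subaddress: tag is removed
    "name_suffix@purelymail.com",        # underscore separator: untouched by this rule
    "magicmarker3f9a01c2@mydomain.com",  # marker scheme: nothing to strip
]:
    print(addr, "->", strip_plus_tag(addr))
```

A list owner who knew about a particular provider's underscore convention could of course add a second rule for it; the point is only that a blanket `+` rule is a one-liner.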
Product managers are seeing "longer time on site" in their analytics reports and keep adding more things thinking it is meeting the company's quarterly OKR
they don't know the "higher engagement" is because mobile users' browsers are literally frozen
and the A/B test says "keep going with the B test!" "do it again!" in a tree that keeps evolving down one side of the graph towards more and more obnoxious experiences that the company doesn't even know are obnoxious
given the misaligned incentives, I think this is also an area California can regulate or threaten to regulate. I don't like "tech regulation", but I can't think of any other party that could curb the behavior. If you like "private sector solutions" more than "government solutions", then Apple and Google can pull the rug out from under all the other companies' feet by crashing sites on the user's phone using other users' crowdsourced data, or making certain analytics packages not run, etc.
yes, the bug of product managers following A/B tests blindly and not knowing it's a bug, resulting in a worse and worse internet browsing experience for all of us
Yeah, but probably with a specific browser/os configuration, on high speed office internet.
Pretty common to find large websites that just happen to not work right in browser configurations that don't match their developer configuration even if they aren't obscure.
I work as a data analyst. We had some cases where we were called in to fix issues left by other agencies.
The clients had form data being sent as GET requests, so email addresses and even more personal data in the URL (street, date of birth, and more) were being transmitted into the analytics tools and also into marketing tools.
Under the GDPR, this is a breach and needs to be reported to the authorities as well as to the people affected.
Even a bank was affected by this type of implementation when customers wanted to open an account or make a loan application.
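One stopgap for the situation described above is to scrub the page URL before it is passed to any analytics or marketing tool. The sketch below is a hypothetical helper using only Python's standard library; the loose email regex, the `SENSITIVE_KEYS` set, and the `REDACTED` placeholder are assumptions. It is a band-aid: the real fix is to stop submitting such forms via GET.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Deliberately loose: anything shaped like local@domain.tld.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Hypothetical parameter names that must never reach a third party.
SENSITIVE_KEYS = {"email", "street", "dob", "date_of_birth"}


def scrub_url(url: str) -> str:
    """Return url with sensitive or email-like query values replaced."""
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # parse_qsl has already percent-decoded the value, so %40 shows up as '@'.
        if key.lower() in SENSITIVE_KEYS or EMAIL_RE.search(value):
            value = "REDACTED"
        cleaned.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(cleaned)))
```

For example, `scrub_url("https://shop.example/thanks?email=jane%40example.com&step=2")` keeps `step=2` but replaces the address. Pattern matching like this will miss data that is encoded or hashed differently, which is why it cannot replace fixing the form itself.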
HTTP 101: do not send anything you don't want 'cached' in a GET request. Not only that, but some browsers will pre-emptively send GET requests or retry them, so you'd have the double headache of worrying about duplicate requests on the server side.
It shouldn't require much experience to know when to use POST or some other HTTP verb - banks certainly have no excuse.
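The difference is easy to see with Python's standard `urllib.request.Request`, which builds a request object without sending anything. The endpoint `https://bank.example/apply` and the field names are hypothetical. With GET, the form fields become part of the URL, which is what ends up in browser history, server logs, any third-party script that reads the page URL, and (depending on the referrer policy) Referer headers. With POST, the URL stays clean and the fields travel in the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://bank.example/apply"  # hypothetical endpoint
form = {"email": "jane@example.com", "dob": "1990-01-01"}

# GET: the fields are serialized into the query string of the URL itself.
get_req = Request(f"{ENDPOINT}?{urlencode(form)}")

# POST: the URL carries no personal data; the fields go in the body.
post_req = Request(ENDPOINT, data=urlencode(form).encode(), method="POST")

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

POST does not encrypt anything by itself (that is TLS's job); it only keeps the data out of the URL, which is the part that analytics tags and logs pick up.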
> Marketing 101: User actions should take as few clicks as possible, so the action should be performed as soon as the user clicks the (GET) link.
Nope, some email clients might prefetch URLs in emails for various reasons. You should absolutely NOT do this (unless you are deceitfully trying to game your engagement metrics). The only case where you might be able to get away with it is when the user has an active login session that you can verify before performing the action.
In the case of email, the sender already knows your email address and so should have no need to put it in the URL. The URL should only have some long random or pseudorandom identifier that has no meaning to anyone but them.
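A minimal sketch of the opaque-identifier idea, assuming an in-memory dict stands in for the sender's database; `make_link`, `resolve`, and the `https://news.example/unsubscribe` URL are hypothetical. `secrets.token_urlsafe` produces a random token that means nothing outside the sender's own lookup table. In line with the previous comment, the GET handler should only resolve the token (for example, to render a confirmation page); the state-changing action would be a separate POST.

```python
import secrets
from typing import Dict, Optional

# Hypothetical stand-in for a database table: token -> subscriber address.
_tokens: Dict[str, str] = {}


def make_link(email: str, base: str = "https://news.example/unsubscribe") -> str:
    """Create an email link that carries an opaque token instead of the address."""
    token = secrets.token_urlsafe(32)  # about 43 URL-safe characters
    _tokens[token] = email
    return f"{base}?t={token}"


def resolve(token: str) -> Optional[str]:
    """Look the token up server-side; unknown tokens resolve to None."""
    return _tokens.get(token)
```

If this link leaks into an analytics tool or a prefetching mail client, the third party sees only the random token, not the address.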
Everyone in sales is using the leaked emails. I have so many one-time emails that leaked (haveibeenpwned.com) and someone was smart enough to just use those leaked databases and sell the emails to sales departments.
I always ask the salesperson where the hell they found the email, because I only used it once, somewhere, a long time ago.
I have a bit of a contrarian view on data on the web. I think, eventually any data on the web is going to be in the public domain at some capacity. Data will be everywhere and readily available, mostly for free.
For many of the examples described in the article, it is the site (not Google or Facebook) that left the email address in the URL. That is bad coding practice. Just so we are clear, this isn’t a case of “evil advertising companies did this”. Now, asking Google to search all the unwanted referral URLs it receives for a customer-specific pattern and delete whatever looks like an email address seems unfair. Google is in no position to use or recognize that data as an email address. If it tried, the process would be brittle and unmanageable.
One could argue advertising shouldn’t exist or that Google should not store anything. But the GDPR argument is BS, although admittedly legal.
It is like throwing a small rock into the neighbor’s yard and asking them to retrieve it for you.
Google will also ban your GA account if they find that kind of information showing up. They aren't dumb and know exactly what they can and cannot get away with.
Emails on query strings are leaks that should be patched. That said you can bet these companies are sending emails over formal integrations to tons of 3rd parties for analysis, targeting, advertising, etc.
The CCPA is not nearly as strict as the GDPR, and under it this is not illegal, unfortunately.
This is the most accurate assessment. Considering how "not secure" email is in general and how easy it is for this information to be passed around behind the scenes this is almost a non-story.
I feel this article really stunk of an attempt to over-sensationalize some sloppy coding that is probably happening on 50% of the websites in the world. To think otherwise is nothing but a utopian view of reality.
Yeah, sure, in casual conversation, but in this case I read half the article before it was clear what they meant, and it makes a big difference, because leaking email addresses isn't the same thing as leaking emails, eh?
(BTW I think it sucks you're getting hammered by downvotes FWIW I upvoted you just to counter balance them.)
Sure it's valid but that doesn't mean it's not ambiguous. All it would take is an initial clarification (once near the top) that they're using "email" to mean "email address" and not "email content" and then we can all give this the correct level of attention.
I'm not for a moment suggesting this isn't bad, but if you run around warning people about something important, it helps to be clear and not risk exaggeration or ambiguity (or you play into the downplayers' hands and potentially waste the time of honest folk).
sure; it's fine for colloquial use, but terrible for a headline where the term is ambiguous, and the more common meaning carries a more sensational implication (that is wrong).
They are leaking email _addresses_, not emails. I was irritated by how they manage to leak emails in a context where no emails are involved (but email addresses are).
I wish people would be a bit clearer in the language they use, especially when it's about vulnerabilities.
You are trying to impose your own internal representation of email as a concept on others here.
To write "email" is correct. It would be more precise to write "email address" to make the distinction from "email message." But it is not the cultural norm to equate "email" with "email message" as you seem to do.
Juuust kidding. My point is: they probably did not use clearer language because their internal concept of what email is, is so fuzzy. That would be my guess anyway.