Invisible Click Tracking Using “Empty” UTF-8 Characters

thegeomaster · on Feb 14, 2015

Just to point out that hovering over the link shows URL-encoded codepoints, i.e. %u200B (in Firefox, for me, at least). I do think it's a better solution than the existing approach with GET parameters which does look clumsy.

detaro · on Feb 14, 2015

Interesting. Chrome on Windows doesn't.

mschuster91 · on Feb 14, 2015

Here neither. But even then, for very long URLs (can easily happen with that SEO crap), just hide your tracking byte so that when Chrome/FF truncates the URL for display, the bytes will be inside the hidden area.

FoeNyx · on Feb 14, 2015

Firefox 35.0.1 on Linux does not display the %u200B when hovering over the link (but beta 36 on the same computer does).

kevinchen · on Feb 14, 2015

Safari appears to do the same. Version 8.0.3 (10600.3.18)

shared4you · on Feb 14, 2015

Nitpick:

> The idea is to use zero-width and space characters of UTF-8 such as U+200B, U+200C, U+200D

That must be Unicode, not UTF-8.

konstruktors · on Feb 14, 2015

Fixed it, thanks!

mrgriscom · on Feb 14, 2015

Isn't this exploiting the same kind of Unicode ambiguity that allowed phishing sites to impersonate trusted domains by substituting certain latin characters with identical-looking cyrillic equivalents? I would expect this capability to last long in the wild.

konstruktors · on Feb 14, 2015

Interestingly, none of those "zero-width" characters are allowed in IDN domain names http://unicode.org/faq/idn.html#22

_jomo · on Feb 14, 2015

These domains are displayed in their punycode notation, at least in Firefox, so that doesn't work anymore.

mrgriscom · on Feb 14, 2015

would->wouldn't

imjustsaying · on Feb 14, 2015

Anyone have a suggestion of how this would be used in practice for uniquely identifying something?

So there are a certain number of zero-width space characters. As far as I can find in the links in the OP, U+FEFF, U+180E, U+200B, U+200C, U+200D would make 5.

So we have at least 5 values to work with, which would make... 120 combinations if they're ordered differently? Surely we would need more if we want to uniquely identify something such as a referral, or are there more zero-width spaces?

I'm kind of dumb and bad at probability, also. You're encouraged to correct my thoughts on this and show me the errors of my thinking.

aidos · on Feb 14, 2015

Surely you could put as many of them in there as you wanted? So you have an alphabet of 5 characters to make an infinite number of words from.

imjustsaying · on Feb 14, 2015

Good point. 0-9 work the same way. Glad I asked :)

Tuna-Fish · on Feb 14, 2015

120 is the count if you use each of the 5 characters exactly once and only vary the order. However, if you let the number of characters vary and allow duplicates, you can represent any number you want. As it's a base-5 number system, there are 5^n distinct strings of n characters, so 14 characters are enough to fit any 32-bit integer (5^14 is about 6.1 billion). Plenty good enough for tracking.
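
A minimal sketch of that idea, treating the five characters as digits of a positional number system. The alphabet order and the names `ALPHABET`, `encode_id`, `decode_id` and `digits_needed` are assumptions made for illustration, not anything from the article:

```python
# Assumed alphabet: the five zero-width code points listed earlier in the thread.
# The order is arbitrary; encoder and decoder only have to agree on it.
ALPHABET = ["\u200b", "\u200c", "\u200d", "\ufeff", "\u180e"]
BASE = len(ALPHABET)


def encode_id(n: int) -> str:
    """Encode a non-negative integer as a string of zero-width characters."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_id(s: str) -> int:
    """Inverse of encode_id; raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


def digits_needed(bits: int) -> int:
    """Smallest number of characters k such that BASE**k >= 2**bits."""
    k = 0
    while BASE ** k < 2 ** bits:
        k += 1
    return k


if __name__ == "__main__":
    tracking_id = 123456
    suffix = encode_id(tracking_id)
    print(len(suffix), decode_id(suffix) == tracking_id)
    print(digits_needed(32))
```

The encoded suffix could be appended to (or placed inside) a URL path, and the server would call `decode_id` on whatever zero-width characters it finds in the request.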

ncza · on Feb 14, 2015

55555 if you limit to a 5 character word!

aidos · on Feb 14, 2015

You have 5 choices for each character. If you limit yourself to 5-character words, you can assume that all 5-, 4-, 3-, 2-, 1- and 0-character words are included too.

I get:

    5^0+5^1+5^2+5^3+5^4+5^5 = 3,906

But I may have totally missed something.

Edit: looking a little closer I see some italics in your 55555 so I guess you have 5 * 5 * 5 * 5 * 5
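
For anyone who wants to check the three counts mentioned in this subthread, a quick sketch (the variable names are made up for the example):

```python
import math

ALPHABET_SIZE = 5  # five zero-width characters
MAX_LEN = 5        # words of at most five characters

# Orderings of all five characters, each used exactly once.
orderings = math.factorial(ALPHABET_SIZE)

# Words of exactly five characters, duplicates allowed: 5 * 5 * 5 * 5 * 5.
exact_length = ALPHABET_SIZE ** MAX_LEN

# Words of length 0 through 5, duplicates allowed.
up_to_length = sum(ALPHABET_SIZE ** k for k in range(MAX_LEN + 1))

print(orderings, exact_length, up_to_length)
```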

ncza · on Feb 15, 2015

Ah, I missed the 4-1 length ones. And yes those were *s

boscomutunga · on Feb 14, 2015

i don't see any variables in the url when i run the inspector.

konstruktors · on Feb 14, 2015

Have a look at the request URL in this screenshot: http://kaspars.net/wp-content/uploads/2015/02/invisible-clic... Notice the escaped character there.

goldenkey · on Feb 14, 2015

There aren't any variables, just a couple unicode characters after the word 'inspector'

And this post doesn't link to the actual tracked url, the page contains it.

This is the actual URL: http://kaspars.net/blog/web-development/invisible-click-trac...

falcolas · on Feb 14, 2015

Copy and paste the link into python, like so:

    >>> repr('http://kaspars.net/blog/web-development/invisible-click-tracking')
    "'http://kaspars.net/blog/web-development/invisible-click-tracking\\xe2\\x80\\x8b'"

EDIT: Yeah, they didn't show up anywhere in the Safari inspector for me either.
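
The session above looks like Python 2, where the pasted string is a byte string and the three UTF-8 bytes of U+200B show up escaped. A rough Python 3 equivalent, as a sketch: `invisible_chars` is a hypothetical helper, and the `example.com` URL is a stand-in rather than the article's real link.

```python
import unicodedata
from urllib.parse import quote


def invisible_chars(url: str):
    """List (index, code point, Unicode name) for every format (Cf) character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(url)
        if unicodedata.category(ch) == "Cf"
    ]


if __name__ == "__main__":
    # Stand-in URL with a zero-width space appended (assumption for the demo).
    url = "http://example.com/post" + "\u200b"

    print(ascii(url))                 # non-ASCII characters are shown escaped
    print(invisible_chars(url))       # where the hidden characters sit
    print("\u200b".encode("utf-8"))   # the three bytes seen in the session above
    print(quote("\u200b"))            # percent-encoded form sent in the request
```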

rwmj · on Feb 14, 2015

I'm not seeing this in Python either. Perhaps the URL on the article has been "fixed"?

Kiro · on Feb 14, 2015

I do.

theunixbeard · on Feb 14, 2015

So who would pay for a tool to make creating/handling/analyzing these links as easy as using UTM parameters?

Mahn · on Feb 14, 2015

It's not that big of a deal to have these fields visible in the first place. The post is a nice trick, but I doubt it has a market for a startup.

jkot · on Feb 14, 2015

There are only a few chars, so URLs grow huge once the click count increases. JS is probably better.

cturhan · on Feb 14, 2015

Post as many characters as you want and it will work. Here:

http://kaspars.net/blog/web-development/invisible-click-trac...

belorn · on Feb 14, 2015

Does this also work in mail readers?

_jomo · on Feb 14, 2015

I tested this in Thunderbird by subscribing to the blog's feed. It didn't show the characters when hovering although Firefox does.

konstruktors · on Feb 14, 2015

Not sure, haven't tested it.

andybak · on Feb 14, 2015

I actually like this. Assuming you agree there are non-creepy reasons to track clicks (not a given on HN) then this lets you keep your urls nice and clean when people just want to copy and paste them.

I imagine it might break in some scenarios (the url ends up with junk in it from bad unicode conversion) but it's up to you to be permissive in the urls you accept.

matt_morgan · on Feb 14, 2015

There are non-creepy reasons for click-tracking--e.g., you're a nonprofit supporting a cause, trying to identify preferences of your supporters. But caution: it might become creepy when you /appear/ to be hiding the tracking from your supporters. Why not give them readable URLs and be totally upfront about what you're doing?

x1798DE · on Feb 14, 2015

I don't think creepy stuff is less creepy if you're not doing it for profit.

userbinator · on Feb 14, 2015

I usually remove trailing spaces from URLs, and a lot of auto-linkifying processes (if they're Unicode-aware) will likely strip them too, so I predict this method won't be robust enough to survive a lot of the handling that URLs are often subject to.

If I'm filling out some form fields with URLs, I strip trailing (and leading) spaces too.

onion2k · on Feb 16, 2015

These are zero width characters. There's no reason why they have to be at the end of the URL.
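
Since the characters can sit anywhere, a cleanup step would have to remove them from the whole string rather than trim the ends. A minimal sketch, assuming the five code points named earlier in the thread are the ones to drop (`ZERO_WIDTH` and `clean_url` are hypothetical names):

```python
# Code points mentioned in the thread; extend the string if others are in use.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff\u180e"

# str.translate deletes every character whose code point maps to None.
_DELETE_TABLE = dict.fromkeys(map(ord, ZERO_WIDTH))


def clean_url(url: str) -> str:
    """Remove zero-width characters anywhere in the URL, then trim whitespace."""
    return url.translate(_DELETE_TABLE).strip()


if __name__ == "__main__":
    # Stand-in URL with hidden characters in the middle and at the end.
    tracked = " http://example.com/a\u200bb\u200c "
    print(repr(clean_url(tracked)))
```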