Invisible Click Tracking Using “Empty” UTF-8 Characters (kaspars.net)
114 points by konstruktors on Feb 14, 2015 | 36 comments


Just to point out: hovering over the link shows the URL-encoded code points, e.g. %u200B (in Firefox, at least for me). I do think it's a better solution than the existing approach with GET parameters, which does look clumsy.
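For comparison, the standard way to put U+200B into a URL is to percent-encode its UTF-8 bytes, which gives %E2%80%8B. The %u200B form resembles JavaScript's non-standard escape() notation rather than RFC 3986 percent-encoding. A minimal Python 3 sketch using only the standard library (the path is a made-up example):

```python
from urllib.parse import quote, unquote

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Hypothetical tracked path: the visible slug plus one invisible character.
url_path = "/blog/invisible-click-tracking" + ZWSP

# quote() encodes non-ASCII characters as UTF-8 and percent-escapes each byte;
# '/' is kept because it is in quote()'s default safe set.
encoded = quote(url_path)
print(encoded)  # /blog/invisible-click-tracking%E2%80%8B

# unquote() reverses the process and recovers the zero-width character.
assert unquote(encoded) == url_path
```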


Interesting. Chrome on Windows doesn't.


Same here. But even where it does show, for very long URLs (which easily happen with that SEO crap), just place your tracking characters so that when Chrome/FF truncates the URL for display, they fall inside the hidden part.


Firefox 35.0.1 on Linux doesn't display the %u200B when hovering over the link (but the 36 beta on the same computer does).


Safari appears to do the same. Version 8.0.3 (10600.3.18)


Nitpick:

> The idea is to use zero-width and space characters of UTF-8 such as U+200B, U+200C, U+200D

Those are Unicode code points, not UTF-8.
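To illustrate the distinction: U+200B is an abstract code point, and UTF-8 is just one way to serialize it as bytes (three bytes, E2 80 8B); UTF-16 serializes the same code point differently. A small Python 3 sketch:

```python
import unicodedata

# The characters from the article, identified by their Unicode code points.
for ch in ("\u200b", "\u200c", "\u200d"):
    code_point = f"U+{ord(ch):04X}"       # the abstract Unicode code point
    utf8_bytes = ch.encode("utf-8")       # one encoding of that code point
    utf16_bytes = ch.encode("utf-16-be")  # another encoding of the same code point
    print(code_point, unicodedata.name(ch), utf8_bytes.hex(" "), utf16_bytes.hex(" "))
```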


Fixed it, thanks!


Isn't this exploiting the same kind of Unicode ambiguity that allowed phishing sites to impersonate trusted domains by substituting certain Latin characters with identical-looking Cyrillic equivalents? I would expect this capability to last long in the wild.


Interestingly, none of those "zero-width" characters are allowed in IDN domain names http://unicode.org/faq/idn.html#22
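Python's built-in idna codec implements the older IDNA 2003 rules, in which nameprep maps these characters to nothing (RFC 3454 table B.1), so they cannot survive into an encoded domain name. A short sketch; the spoofed hostname is made up, and U+180E is left out because its Unicode classification changed over time:

```python
import stringprep

# Zero-width characters discussed in this thread.
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]

# RFC 3454 table B.1 lists the characters nameprep "maps to nothing".
for ch in ZERO_WIDTH:
    print(f"U+{ord(ch):04X}", stringprep.in_table_b1(ch))

# The "idna" codec applies nameprep, so an embedded zero-width space
# is silently dropped before the domain is encoded.
spoofed = "exa\u200bmple.com"
print(spoofed.encode("idna"))  # b'example.com'
```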


These domains are displayed in their punycode notation, at least in Firefox, so that doesn't work anymore.
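The look-alike problem and the punycode form can both be seen from Python; the domain below is a made-up example, and whether a browser shows the xn-- form is up to its own display policy:

```python
import unicodedata

latin = "apple.com"
mixed = "\u0430pple.com"  # first letter is CYRILLIC SMALL LETTER A

print(unicodedata.name(latin[0]))  # LATIN SMALL LETTER A
print(unicodedata.name(mixed[0]))  # CYRILLIC SMALL LETTER A
print(latin == mixed)              # False, although the two usually render identically

# Encoding the mixed-script name produces the punycode ("xn--") label
# that a browser may display instead of the look-alike Unicode text.
print(mixed.encode("idna"))
```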


Correction to my last sentence: "I would expect" should read "I wouldn't expect".


Does anyone have a suggestion for how this would be used in practice to uniquely identify something?

So there are a certain number of zero-width characters. As far as I can find in the links in the OP, U+FEFF, U+180E, U+200B, U+200C and U+200D make 5.

So we have at least 5 values to work with, which would make... 120 combinations if they're ordered differently? Surely we would need more if we want to uniquely identify something such as a referral, or are there more zero-width characters?

I'm kind of dumb and bad at probability, also. You're encouraged to correct my thoughts on this and show me the errors of my thinking.


Surely you could put as many of them in there as you wanted? So you have an alphabet of 5 characters to make an infinite number of words from.


Good point. 0-9 work the same way. Glad I asked :)


120 is the number of combinations if you use each of the 5 characters exactly once and only change the order (5! = 120). However, if you let the number of characters vary and allow repeats, you can represent any number you want. It's a base-5 number system, so n characters give 5^n distinct values; with 14 characters you can fit any 32-bit integer (5^14 ≈ 6.1 billion > 2^32 ≈ 4.3 billion). Plenty good enough for tracking.
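A minimal sketch of the base-5 idea in Python 3. The mapping of characters to digit values is an arbitrary assumption for illustration, not the article's scheme, and `encode_id`/`decode_id` are hypothetical helper names:

```python
# Hypothetical digit alphabet: the five characters listed upthread.
# Their order (i.e. which digit each one stands for) is an arbitrary choice.
ZW_DIGITS = ["\ufeff", "\u180e", "\u200b", "\u200c", "\u200d"]
BASE = len(ZW_DIGITS)  # 5
DIGIT_VALUE = {ch: i for i, ch in enumerate(ZW_DIGITS)}


def encode_id(n: int) -> str:
    """Write a non-negative integer in base 5 using zero-width digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, r = divmod(n, BASE)
        digits.append(ZW_DIGITS[r])
        if n == 0:
            break
    return "".join(reversed(digits))  # most significant digit first


def decode_id(s: str) -> int:
    """Read an integer back from a string of zero-width digits."""
    n = 0
    for ch in s:
        n = n * BASE + DIGIT_VALUE[ch]
    return n


slug = "/blog/invisible-click-tracking"
link = slug + encode_id(2**32 - 1)
tracking = link[len(slug):]
print(len(tracking))        # 14
print(decode_id(tracking))  # 4294967295
```

Because the length is variable there is no fixed upper bound on the ID; a fixed width would only be needed if the digits had to be told apart from other invisible characters in the same URL.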


55555 if you limit to a 5 character word!


You have 5 choices for each character. If you limit yourself to words of at most 5 characters, then the 5-, 4-, 3-, 2-, 1- and 0-character words are all included.

I get:

    5^0+5^1+5^2+5^3+5^4+5^5 = 3,906
But I may have totally missed something.

Edit: looking a little closer, I see some italics in your 55555, so I guess you have 5 * 5 * 5 * 5 * 5 = 3,125.


Ah, I missed the 4-1 length ones. And yes those were *s
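The three counts in this subthread can be checked by brute force; the letters A to E simply stand in for the five zero-width characters:

```python
from itertools import permutations, product
from math import factorial

ALPHABET = "ABCDE"  # stand-ins for the 5 zero-width characters

# Orderings of 5 distinct characters, each used exactly once: 5! = 120
assert len(list(permutations(ALPHABET))) == factorial(5) == 120

# Words of exactly 5 characters, repeats allowed: 5^5 = 3125
assert len(list(product(ALPHABET, repeat=5))) == 5**5 == 3125

# Words of length 0 through 5: 5^0 + 5^1 + ... + 5^5 = 3906
total = sum(len(list(product(ALPHABET, repeat=k))) for k in range(6))
assert total == sum(5**k for k in range(6)) == 3906
print(total)
```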


I don't see any variables in the URL when I run the inspector.


Have a look at the request URL in this screenshot: http://kaspars.net/wp-content/uploads/2015/02/invisible-clic... Notice the escaped character there.


There aren't any variables, just a couple of Unicode characters after the word 'inspector'.

And this HN submission doesn't link to the actual tracked URL; the article page contains it.

This is the actual URL: http://kaspars.net/blog/web-development/invisible-click-trac...


Copy and paste the link into a Python 2 shell, like so:

    >>> repr('http://kaspars.net/blog/web-development/invisible-click-tracking')
    "'http://kaspars.net/blog/web-development/invisible-click-tracking\\xe2\\x80\\x8b'"
EDIT: Yeah, they didn't show up anywhere in the Safari inspector for me either.
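In Python 3 the repr shows the code point (\u200b) rather than the UTF-8 bytes, and `unicodedata` can list every invisible format character in a pasted link. A sketch; the helper name `invisible_chars` is hypothetical, and it flags all characters in Unicode category Cf, not only the ones used for tracking:

```python
import unicodedata


def invisible_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each format (Cf) character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# Hypothetical pasted link with a trailing zero-width space.
pasted = "http://kaspars.net/blog/web-development/invisible-click-tracking\u200b"
print(repr(pasted))  # Python 3 escapes the invisible character as \u200b
print(invisible_chars(pasted))
```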


I'm not seeing this in Python either. Perhaps the URL on the article has been "fixed"?


I do.


So who would pay for a tool to make creating/handling/analyzing these links as easy as using UTM parameters?


It's not that big of a deal to have these fields visible in the first place. The post is a nice trick, but I doubt it has a market for a startup.


There are only a few characters to work with, so URLs grow huge once the click count increases. JS is probably better.


Add as many characters as you want and it will still work. Here:

http://kaspars.net/blog/web-development/invisible-click-trac...
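To put numbers on how long the URLs get: the ID length grows only logarithmically with the number of distinct IDs, but each zero-width character costs nine characters once percent-encoded. A rough Python estimate, assuming the five-character alphabet from upthread (`digits_needed` is a hypothetical helper):

```python
from urllib.parse import quote

BASE = 5            # number of zero-width characters used as digits
BYTES_PER_CHAR = 3  # U+180E, U+200B-U+200D and U+FEFF are 3 bytes each in UTF-8

# Percent-encoding turns every UTF-8 byte into "%XX" (3 ASCII characters).
assert len(quote("\u200b")) == BYTES_PER_CHAR * 3  # "%E2%80%8B"


def digits_needed(distinct_ids: int) -> int:
    """Smallest k with BASE**k >= distinct_ids (always at least one digit)."""
    k = 1
    while BASE**k < distinct_ids:
        k += 1
    return k


for ids in (1_000, 1_000_000, 2**32):
    k = digits_needed(ids)
    print(f"{ids:>13,} IDs -> {k:2} invisible chars, {k * BYTES_PER_CHAR * 3} chars percent-encoded")
```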


Does this also work in mail readers?


I tested this in Thunderbird by subscribing to the blog's feed. It didn't show the characters when hovering although Firefox does.


Not sure, haven't tested it.


I actually like this. Assuming you agree there are non-creepy reasons to track clicks (not a given on HN), this lets you keep your URLs nice and clean when people just want to copy and paste them.

I imagine it might break in some scenarios (the URL ends up with junk in it from a bad Unicode conversion), but it's up to you to be permissive in the URLs you accept.
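One way to be permissive on the server side, sketched in Python under the same hypothetical five-character alphabet: strip every zero-width digit to recover the canonical path, and decode whatever digits were found as the ID. Collecting the digits wherever they occur means the ID does not have to sit at the end of the URL. `split_tracking` is an illustrative name, not from the article:

```python
from typing import Optional
from urllib.parse import unquote

# Hypothetical digit alphabet; it must match whatever encoder produced the links.
ZW_DIGITS = ["\ufeff", "\u180e", "\u200b", "\u200c", "\u200d"]
DIGIT_VALUE = {ch: i for i, ch in enumerate(ZW_DIGITS)}


def split_tracking(raw_path: str) -> tuple[str, Optional[int]]:
    """Split a request path into (clean path, tracking ID or None)."""
    path = unquote(raw_path)  # "%E2%80%8B" -> "\u200b"
    clean = "".join(ch for ch in path if ch not in DIGIT_VALUE)
    digits = [DIGIT_VALUE[ch] for ch in path if ch in DIGIT_VALUE]
    if not digits:
        return clean, None  # no (surviving) tracking characters
    value = 0
    for d in digits:
        value = value * len(ZW_DIGITS) + d
    return clean, value
```

A URL mangled by a bad conversion (for example UTF-8 decoded as Windows-1252, which turns U+200B into "â€‹") comes back with no ID and a dirty path, so a real handler would still need a fallback for that case.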


There are non-creepy reasons for click-tracking--e.g., you're a nonprofit supporting a cause, trying to identify preferences of your supporters. But caution: it might become creepy when you /appear/ to be hiding the tracking from your supporters. Why not give them readable URLs and be totally upfront about what you're doing?


I don't think creepy stuff is less creepy if you're not doing it for profit.


I usually remove trailing spaces from URLs, and a lot of auto-linkifying processes (if they're Unicode-aware) will likely strip them too, so I predict this method won't be robust enough to survive a lot of the handling that URLs are often subject to.

If I'm filling out some form fields with URLs, I strip trailing (and leading) spaces too.
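Whether these characters get stripped depends on what the code treats as whitespace. In Python, for instance, str.strip() leaves U+200B alone because it is a format character (category Cf), not whitespace, so removing it takes an explicit step. A sketch; the example URL and the `strip_invisible` helper are hypothetical:

```python
import re

# Zero-width characters discussed in this thread.
ZERO_WIDTH_RE = re.compile("[\u180e\u200b\u200c\u200d\ufeff]")

url = "\u200bhttp://example.com/post\u200b"

# str.strip() only removes characters Python classifies as whitespace.
print(url.strip() == url)  # True: nothing was removed


def strip_invisible(s: str) -> str:
    """Remove zero-width characters anywhere, then ordinary whitespace at the ends."""
    return ZERO_WIDTH_RE.sub("", s).strip()


print(strip_invisible(url))  # http://example.com/post
```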


These are zero width characters. There's no reason why they have to be at the end of the URL.



