Just to point out that hovering over the link shows URL-encoded codepoints, i.e. %u200B (in Firefox, for me, at least). I do think it's a better solution than the existing approach with GET parameters which does look clumsy.
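For comparison, `%u200B` is a non-standard escape form; standard percent-encoding writes the UTF-8 bytes of the character instead. A minimal sketch using Python's `urllib.parse` (the `/article` path is made up for illustration):

```python
from urllib.parse import quote, unquote

# Hypothetical path with a single zero-width space (U+200B) appended.
tagged_path = "/article" + "\u200b"

# quote() percent-encodes non-ASCII characters as their UTF-8 bytes.
encoded = quote(tagged_path)
print(encoded)  # /article%E2%80%8B

# Decoding restores the invisible character, so a server can still read it.
assert unquote(encoded) == tagged_path
```

So what a browser shows on hover may differ from what actually goes over the wire.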
Here neither. But even then, for very long URLs (can easily happen with that SEO crap), just hide your tracking byte so that when Chrome/FF truncates the URL for display, the bytes will be inside the hidden area.
Isn't this exploiting the same kind of Unicode ambiguity that allowed phishing sites to impersonate trusted domains by substituting certain Latin characters with identical-looking Cyrillic equivalents? I wouldn't expect this capability to last long in the wild.
Anyone have a suggestion of how this would be used in practice for uniquely identifying something?
So there are a certain number of zero-width space characters. As far as I can find in the links in the OP, U+FEFF, U+180E, U+200B, U+200C, U+200D would make 5.
So we have at least 5 values to work with, which would make... 120 combinations (5!) if each is used once and they're just ordered differently? Surely we would need more if we want to uniquely identify something such as a referral, or are there more zero-width spaces?
I'm kind of dumb and bad at probability, also. You're encouraged to correct my thoughts on this and show me the errors of my thinking.
120 is the number of combinations if you use each of the 5 characters exactly once and only change the order. However, if you let the number of characters vary and allow repeats, you can represent any number you want. It's effectively a base-5 number system, so there are 5^n distinct strings of n characters: 10 digits give 5^10 = 9,765,625 values, and 14 digits are enough to fit any 32-bit integer. Plenty good enough for tracking.
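To make that concrete, here is a rough sketch of treating the five characters as base-5 digits. The alphabet order, the function names and the fixed width are my own assumptions, not anything from the OP. A quick check suggests 10 digits cover 5^10 = 9,765,625 values, and about 14 are needed to cover every 32-bit value.

```python
# Assumed alphabet: the five zero-width code points listed in this thread.
# The ordering (which character means which digit) is arbitrary.
ZW_ALPHABET = ["\ufeff", "\u180e", "\u200b", "\u200c", "\u200d"]
BASE = len(ZW_ALPHABET)  # 5


def digits_needed(bits: int) -> int:
    """Smallest number of base-5 digits that can hold any `bits`-bit value."""
    width = 0
    while BASE ** width < 2 ** bits:
        width += 1
    return width


def encode_id(n: int, width: int = 14) -> str:
    """Encode a non-negative integer as a fixed-width invisible string."""
    if n < 0 or n >= BASE ** width:
        raise ValueError("id does not fit in the requested width")
    digits = []
    for _ in range(width):
        n, remainder = divmod(n, BASE)
        digits.append(ZW_ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_id(s: str) -> int:
    """Inverse of encode_id; raises ValueError on unexpected characters."""
    n = 0
    for ch in s:
        n = n * BASE + ZW_ALPHABET.index(ch)
    return n


print(digits_needed(32))  # 14
tag = encode_id(123456)
print(len(tag), decode_id(tag))  # 14 123456
```

A fixed width keeps decoding simple; a variable-length scheme would be shorter for small ids but needs some way to know where the tag ends.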
You have 5 choices for each character. If you limit yourself to words of at most 5 characters, you can count all the 5-, 4-, 3-, 2-, 1- and 0-character words too.
I get:
5^0+5^1+5^2+5^3+5^4+5^5 = 3,906
But I may have totally missed something.
Edit: looking a little closer I see some italics in your 55555 so I guess you have
5 * 5 * 5 * 5 * 5
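For anyone wanting to double-check the three counts floating around in this subthread, a small brute-force sketch (the letters A-E are just visible stand-ins for the five invisible characters):

```python
from itertools import permutations, product

ALPHABET = "ABCDE"  # stand-ins for the five zero-width characters

# Each character used exactly once, only the order changes: 5!
orderings = sum(1 for _ in permutations(ALPHABET))

# Exactly five characters, repeats allowed: 5 * 5 * 5 * 5 * 5
exactly_five = sum(1 for _ in product(ALPHABET, repeat=5))

# Any length from 0 to 5, repeats allowed: 5^0 + 5^1 + ... + 5^5
up_to_five = sum(len(ALPHABET) ** k for k in range(6))

print(orderings, exactly_five, up_to_five)  # 120 3125 3906
```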
I actually like this. Assuming you agree there are non-creepy reasons to track clicks (not a given on HN) then this lets you keep your urls nice and clean when people just want to copy and paste them.
I imagine it might break in some scenarios (the url ends up with junk in it from bad unicode conversion) but it's up to you to be permissive in the urls you accept.
There are non-creepy reasons for click-tracking--e.g., you're a nonprofit supporting a cause, trying to identify preferences of your supporters. But caution: it might become creepy when you /appear/ to be hiding the tracking from your supporters. Why not give them readable URLs and be totally upfront about what you're doing?
I usually remove trailing spaces from URLs, and a lot of auto-linkifying processes (if they're Unicode-aware) will likely strip them too, so I predict this method won't be robust enough to survive a lot of the handling that URLs are often subject to.
If I'm filling out some form fields with URLs, I strip trailing (and leading) spaces too.
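Worth noting that a plain whitespace strip may not be enough here, depending on the language. In Python, as far as I can tell, `str.strip()` leaves U+200B in place because it is a format character rather than whitespace, so a cleanup step has to name the characters explicitly. A sketch, with helper names of my own choosing:

```python
import string

# Assumed set: the five zero-width code points mentioned in this thread.
ZERO_WIDTH = "\ufeff\u180e\u200b\u200c\u200d"


def strip_edges(url: str) -> str:
    """Strip ASCII whitespace and the listed zero-width characters from both ends."""
    return url.strip(string.whitespace + ZERO_WIDTH)


def remove_everywhere(url: str) -> str:
    """Drop the listed zero-width characters wherever they appear."""
    return url.translate({ord(ch): None for ch in ZERO_WIDTH})


tagged = " http://example.com/page\u200b\u200c "

# Plain strip() removes the spaces but the invisible tail survives.
print(len(tagged.strip()) - len("http://example.com/page"))  # 2

print(strip_edges(tagged) == "http://example.com/page")  # True
```

Anything that runs a step like this between the sender and the click would silently wipe the tracking tag.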