The URL shortener situation is out of control (hanselman.com)
166 points by yonasb on June 2, 2014 | 78 comments


That's really unfortunate. And it's not just performance; it also messes with OS-level URL handling mechanisms like Android intents (and possibly FB's App Links and iOS's new Extensibility).

I recently found this happening with Twitter's Android app. The user sees a link to player.fm and thinks it will open the native Player FM app if they have it installed, since it's registered to handle that URL pattern. But instead, the OS offers web browsers and Twitter as ways to open the link, because it's not really a player.fm link as presented to the user, but a t.co link. If the user then chooses a browser, the browser immediately redirects to the correct URL, which then pulls up the intents menu again.

7 redirects could potentially be 7 popup menus for the user to navigate through.

The OS could pre-emptively follow redirects, but that would of course introduce considerable latency since normally the menu is presented without any call being made at all. Maybe the best solution for OSs is to present the menu immediately but still make the call in the background, so the menu could be updated if a redirect happens.

"I don't see any work happening in HTTP 2.0 to change it."

Probably the closest thing HTML has for dealing with it is the "ping" attribute, which lets servers be notified of a click without an actual redirect. However, that's HTML and not HTTP, and these days apps are more common HTTP clients than browsers, and apps generally don't bother to implement things like that.

So there are probably things that could be done with the standard. Perhaps a distributed lookup table could ensure at most one redirect (by caching the redirect sequence and returning it with the first request). That does ignore any personalisation that goes on, but generally these should be permanent redirects without personalisation anyway.


> I recently found this happening with Twitter's Android app. The user sees a link to player.fm and thinks it will open the native Player FM app if they have it installed, since it's registered to handle that URL pattern. But instead, the OS offers web browsers and Twitter as ways to open the link, because it's not really a player.fm link as presented to the user, but a t.co link. If the user then chooses a browser, the browser immediately redirects to the correct URL, which then pulls up the intents menu again.

You don't even have to go that far. Just click on a youtube link. First it'll ask if you want the www.youtube.com url to play in a browser or the app (which sucks), then it'll redirect to m.youtube.com and ask you again.

The only reason I haven't set it as my permanent choice is that I still hold out some shred of hope that, some day in the future, the youtube app will be able to play an entire video without stopping for 2 seconds out of every 3.


For Android at least, I've been using this and I'm pretty happy with it: https://play.google.com/store/apps/details?id=com.teaandtoys...

It usually works, and when it does, it's a nice fix for the idiotic Twitter-Android-app behavior.


Thank you - I've just installed this, and it's great!


Prefetching the URL would trash analytics, which is the real reason most of this is done. It's not like the short URLs are easier to copy/paste or remember.


If user convenience requires trashing server analytics, then server analytics should be trashed.

Possibly even deliberately - enabling that by default, in an automated way, in commonly used products, so that this tracking becomes ineffective and useless, and there's no more motivation to insert these artificial layers of redirection.


Pretty sure I called this one a few years ago.

http://joshua.schachter.org/2009/04/on-url-shorteners


Yup! I remember that post. Still holds true


We could put a stop to marketing redirects tomorrow if we didn't allow redirects to set cookies.

(Or perhaps only allowed a cookie if the redirect was served by the same domain as the target domain.)


That will break just about every affiliate program that I'm aware of. Of course this is your intention, but there will be a very large number of websites that will see their turnover plummet if that should happen.

I personally would not mind but I'm pretty sure that a lot of monied interests would not like to see this happen.


This would result in marketers returning 200 responses to set the cookie and render a page with javascript that sets `window.location` instead, which would be even slower.


I think this is worth hammering home.

People want analytics information, and right now the easiest practical way to get it is by adding things like this. (Not strictly the only way, but ease of deployment, etc.)

Unless you shut down the only way to do something, the people who want this service will work around whatever restrictions have been put in place, and the solution we get will probably be even uglier.

The real way to get rid of this is to provide a mechanism that addresses everyone's desires.


the problem is that these analytics are only beneficial to the owners or advertisers, which means if there's a way for users to turn off the supposed capability, then the advertisers would find a different way to do it, which results in much the same thing as today.

it's a social problem, not a technological one imho.


How would that affect the single sign-on case? AFAIK it's common practice to issue redirects to people who are not authenticated to send them to the identity provider's (IDP's) login page. This would make it hard for an IDP to determine if the user already has an active session with them.


This still doesn't stop all use-cases.

Click-count statistics, time-clicked, and geo information can all be gotten without any cookies. Some sites use URL shorteners just to see clickthrough statistics, which need no cookies at all.
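For what it's worth, a rough sketch of pulling click counts and times straight out of an ordinary access log, no cookies involved (the log name and common log format are assumptions on my part; geo would need an extra IP lookup):

    # Clicks per short code per day, using only the request path and timestamp
    # from a common-format access log.
    awk '{ split($7, p, "?")                 # drop any query string
           day = substr($4, 2, 11)           # e.g. 02/Jun/2014
           count[day " " p[1]]++ }
         END { for (k in count) print count[k], k }' access.log |
      sort -rn | head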


That could break a lot of bad-yet-benign code.


Yeah like half the login systems in the world?


Seems logical - only allow the endpoint to set the cookie.


I think the most practical solution to this, requiring only a change in practice and not in the standard, would be for link shorteners to start doing HEAD requests on the URLs they shorten and unwrapping them, so that if a URL results in a permanent redirect, the shortened link points at the canonically correct target.

Yeah, there are things that might have some problems with this, but they're things that are probably somewhat abusive to the 301 status code to begin with.
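A rough sketch of that unwrapping step with curl (the hop limit and the 301/308-only rule are my assumptions, not anything a particular shortener actually does):

    #!/usr/bin/env bash
    # Hypothetical pre-storage step for a shortener: peel off permanent
    # redirects so the stored target is the canonical URL, but leave
    # temporary (302/307) redirects alone.
    url="$1"
    for _ in 1 2 3 4 5; do                   # arbitrary hop limit
      # HEAD request, no auto-follow, so each hop can be inspected.
      read -r code next < <(curl -sS -o /dev/null -I \
        -w '%{http_code} %{redirect_url}\n' "$url")
      case "$code" in
        301|308) [ -n "$next" ] && url="$next" || break ;;
        *)       break ;;
      esac
    done
    echo "$url"                              # what the shortener would store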


This is what we do when users post to our site. Mostly because of spam though.


> Redirects are being abused and I don't see any work happening in HTTP 2.0 to change it.

I agree that this is an unfortunate pattern, but what exactly could the HTTP spec do to change it? The only thing I can think of is limiting the number of chained redirects, although I don't see browsers implementing that if longer chains are even remotely common.


Why do we need a technical solution? My understanding is that the author is arguing for a change in how URL shorteners are used, not a technical change making this impossible. The problem is that once a technology exists, it will be abused. Sometimes this abuse is just a clever and useful hack, and sometimes it is annoying and anti-usable.


If I remember correctly, there was an old (very old) project with reversible links called Project Xanadu.

If my sketchy memory serves me, it was based around using a currency and updatable links. Along with that, the idea was that you could also share segments of movies and music with the hyperlink system.

I'm pretty sure it died a pitiful death due to it being completely secret until after HTTP got ingrained.


I looked at it before, and it was, indeed, neat. But, I'm not sure that a project that spent 30 years in development before an initial release can really be said to have "died".


I think the other issue is that these aren't being used as URL shorteners any more (in the sense they were when they were used for Twitter's 140 character limit). They are tracking URLs, gathering data about you at each hop.


If the HTTP spec added two new verbs (SHORT, LONG) as a method of shortening and elongating URLs, then many things could be done.

1.) The browser could proactively lengthen the URL and, the same way the server can respond 301/302 now, cache the result.

2.) The server could hand back the final long URL without needing to redirect multiple times.

3.) We could create services, integrated into the server software, that integrate third parties.

4.) Each domain could create its own shortened URL domain and mask it in a better way.


1) The browser doesn't know the long URL, so how can it proactively lengthen it?

2) The server might not know the long URL either: all that t.co knows about is the slate.me URL, which only knows about the slate.tribal URL, which only knows the goog.le URL, and so on and so forth. So this would not be possible unless there was only one hop.

3) I am assuming the services you want to integrate into the server software would resolve the shortened URL into a long one, or vice versa, but if there are multiple redirects the services would still face the latency of following them.


> The server might not know the long URL since all that t.co knows about is the slate.me URL

The server at t.co could send a request (HEAD works) to slate.me and follow any redirects it gets to resolve the final URL. (This could be done by just following until there are no more redirects, or by only sending requests to known URL shorteners -- there are advantages and disadvantages to both) -- and you don't need any new HTTP verbs to do it.


That assumes that every user gets the same "long" URL for a particular "short" URL (and that every 30x corresponds to a short-to-long redirect). It falls down where a URL depends on geolocation or time sensitivity.


The alternative I presented, of only following redirects from known URL shorteners, addresses pretty much all of that.
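A sketch of that variant with curl; the whitelist of domains is obviously just an illustrative stand-in:

    #!/usr/bin/env bash
    # Hypothetical: keep following redirects only while the current hop's
    # host is a known shortener, and stop at the first ordinary site.
    url="$1"
    shorteners='t\.co|bit\.ly|goo\.gl|trib\.al|slate\.me|ow\.ly'
    while host=$(printf '%s\n' "$url" | awk -F/ '{print $3}') &&
          printf '%s\n' "$host" | grep -Eqx "($shorteners)"; do
      next=$(curl -sS -o /dev/null -I -w '%{redirect_url}' "$url")
      [ -n "$next" ] || break
      url="$next"
    done
    echo "$url"    # first non-shortener hop; its own redirects are left to the client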


The URL should be under the full control of the domain.

1.) The browser can offer the ability to (right-click and) shorten or lengthen a URL. An HTTP standard would provide this mechanism.

3.) This would not require multiple redirects because everyone would ask the domain itself. If the URL is already shortened then there is no need to shorten it again. Services like bit.ly and goo.gl could still provide services on top: actually shortening, statistics...


People are using shorteners on already-shortened links; this is the problem.

The most obvious one is Twitter, which always applies its own service regardless.


My guess would be for analytics, so it knows how its own service is being used and who is accessing websites through it. It also comes with a convenient feature: bad URLs can be taken down on its end.


Genuinely curious: How does it take down bad URLs? And, by "bad", are we talking about 404s?


"Bad" as in malware, phishing, etc. When they recognize that a URL is bad, they can simply stop redirecting to that URL.


This is classic "tragedy of the commons" behavior, where each individual group with a link shortener benefits from encouraging and enforcing its usage (the ability to kill malicious links easily, user tracking, etc.).

I'm not sure if this can be resolved until users are educated sufficiently on the long-term adverse effects of link shortening services (link rot, privacy concerns, slow/broken redirects, etc).

For change to happen, the demand for direct links (generated explicitly by things like this blog post, or implicitly by higher bounce rates due to long loading times) will need to outweigh the benefits to the organizations that are building them.

Edit:

Even if there is evidence that shows this, why should _I_ be the one to give up my link shortener service when doing so would make no significant dent in an overall problem that involves tens or hundreds of these services?


Twitter or other "end points" of content could simply follow the URLs being posted to the end, then strip out intermediate redirects.

It wouldn't solve it completely, but it'd kill the 7 redirects thing.


This is propagated by people not really understanding URLs and blindly reposting links that have already been wrapped in a URL shortener through services that wrap them in another one. Whenever I repost links, I repost only the URL of the final page, stripping off anything unnecessary. Sadly, the trend of browsers hiding URLs or pieces of them is not helping the situation either.

I don't think this can be solved technologically - HTTP redirects are not difficult to detect, but a lot of these shorteners (and this is becoming increasingly common) use Javascript and/or meta tags to accomplish the redirection. The solution is better-educated users who don't create chains of shortened URLs.


Could a URL wrapper service follow a URL through its redirects and only wrap the final address?

I'm not a networking expert, but it seems viable enough to me. Shoot out a GET request, wrap the final address with your shortener. Cut out the middlemen.

It's an idea. It might fail at scale. And might not be feasible.
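For what it's worth, curl can already do the chasing itself; something like this (URL is just an example) prints only the final address a wrapper would store:

    # Follow the whole chain and report where it ended up.
    curl -sS -o /dev/null -L --max-redirs 10 \
         -w '%{url_effective}\n' 'https://t.co/example'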


The downside is that if I use a URL shortening service that allows me to change where you are headed after the fact (for example, after 1000 hits, go to a new page instead), then you've just broken that functionality.


That's no longer a URL shortening service, that's a campaign redirection service. Just serve the appropriate content, rather than relying on redirects.


Except that my actual URL is much longer than the short one that I am using on Twitter, advertisements, T-Shirts and various other sources.

Directly serving the content while possible is not what my client wants. They want to go from their short URL to the longer full domain one.


A lot of shorteners don't just use regular HTTP redirects; they use Javascript or meta tags. To find the actual destination, the server would at the very least have to detect and parse HTML, and maybe even execute Javascript.
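The meta-tag case can at least be crudely detected without a full browser; a rough sketch with curl and grep (URL made up, and this won't catch Javascript redirects):

    # Fetch the page and pull the target out of a <meta http-equiv="refresh">
    # tag, if there is one. Anything done in Javascript needs a real engine.
    curl -sS 'https://sho.rt/example' \
      | grep -Eio '<meta[^>]+http-equiv="?refresh[^>]*>' \
      | grep -Eio 'url=[^">]+'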


Yes, you do a HEAD request, and follow the redirects till you stop getting redirected.

We do this to catch certain spam urls.


I always figured trib.al and bit.ly and their ilk offered different analytics or whatever and that that's why some URLs would bounce through both. I see this especially in major journalism outlets.


The user experience on mobile with multiple URL-shortener redirects is beyond annoying. Every new HTTP connection opened over a marginal cell or wifi connection can stall or fail, even when the actual destination site is up and reachable.


Is the ridiculously long ThisURLShortenerSituationIsOfficiallyOutOfControl.aspx url part of the rant against short urls?


I'm no SEO guru, but isn't the recommended behavior to create a URL that matches the title of the blog post? I've seen these "post title" URLs with increasing frequency over the past few years.


Yeah, at least at one time, and likely still the case.

I never liked this SEO "feature", however. My thinking is why should search engines really care about the URL WRT content? Seems like a shortcoming of the engines, as well as a potential technique for gaming the search engines. In fact, it seems that if search engines could determine that URLs were being used to game them, then they wouldn't need sites to bother with this behavior in the first place. OTOH, if search engines cannot tell they are being gamed with URLs, then it's also completely useless.

What am I missing?


Yeah, but you're supposed to put underscores or hyphens to separate words. The bots aren't smart enough to un-concatenate words as far as I understand.


No, the blog engine was written in 2003 before the lowercase URL with hyphens thing started.


I'm impressed by his proposed solution: http://uniformresourcelocatorelongator.com/


> Every redirect is a one more point of failure, one more domain that can rot, one more server that can go down, one more layer between me and the content.

These are all good reasons, but are there any real users who are actually being affected by these issues? If it is just a theoretical concern, then I don't think it is reasonable to call the situation "officially out of control".


Seven redirects to different domains means seven new TCP connections being established, very likely over a crappy mobile connection (see twitter usage numbers from mobile). The user experience is definitely being harmed here.


By far, Twitter's t.co is most often the dead hop. So for Twitter users, the effect of the practice is very real indeed.


I've lived in the Philippines for a while, and the big telco here, PLDT, has terrible DNS. t.co links are the most obvious point of contention: they just won't resolve 90% of the time. It's incredibly obnoxious, especially on a mobile device where DNS settings aren't (easily) exposed.


A little off topic, but I seem to recall seeing, probably some years ago, a post on HN about someone's reversible URL-shortening algorithm that could convert from the shortened URL back to the original. Can't find it now; does anyone recall this, or did I dream it?


curl -I -L <url> will follow redirects

There's also this service that expands shortened urls: http://longurl.org/


In a few years, all these services will be gone and all the links broken.


I get the link-rot concern (and 7 redirects, as showcased in the FA, is absurd), but these services are mostly used on Twitter and social media, where the life-span of a post sharing a link is hours to a day or so, at most.


I couldn't find anything that would output something similar to the redirects image shown in this post, so I wrote a small script in node to do that. It looks like this: http://cl.ly/image/3T3e462G1C3d

Here's the script: https://gist.github.com/akenn/7ca7e99a51c3a4abc049

Speaking of, what software did this guy use? Is there a bash script that's better than what I wrote?


You can just use Chrome's network tab and check the "Preserve log" option so requests persist when navigating.


Here is a simple script that uses curl/awk to print the same sort of information:

https://gist.github.com/bertjwregeer/12ae691e5c285f334a36

No need to use Node.
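The linked gist isn't reproduced here, but a minimal version of the same curl/awk idea, printing each hop's status code and Location like the chain in the post, might look something like:

    #!/usr/bin/env bash
    # Print "status-code location" for every hop in the redirect chain.
    curl -sSIL --max-redirs 10 "$1" | awk '
      { sub(/\r$/, "") }                    # strip CR from curl -I output
      /^HTTP\//       { if (code) print code, loc; code = $2; loc = "" }
      /^[Ll]ocation:/ { loc = $2 }
      END             { print code, loc }   # final hop has no Location
    '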


That's exactly what I was looking for, thanks!


http://urlte.am/ was a URL shortener rescue project but it appears to have stalled.


8 redirects is pretty bad, but it is not much worse than loading files from 15 different domains (which is very popular nowadays).


Wait. Why is that bad?


Because you are being served content from other people's servers. If one of those servers fails, the whole site you are trying to load fails.

Like hotlinking images from someone else. They might block you, or they might just serve you goatse.jpg instead.


Eh, I don't buy that.

> Because you are being served content from other people's servers. If one of those servers fails, the whole site you are trying to load fails.

Say what?

Many of those things are just tracking pixels. There is no way to serve them from the site host; that's the whole point. And if they fail to load, nbd.

And sure, if some substantial piece of JS or CSS doesn't load, that could cause problems on the page, but in most cases it wouldn't cause total failure of the site loading.

Public CDNs for shared assets make a lot of sense.

These things are nothing like hotlinking somebody's image.


CDNs are supposed to decrease load times but in my experience it is the opposite... sometimes I wonder if CDNs exist only for data collection purposes.


Closely related, e.g. Photobucket: http://imgur.com/xlWAUdF


I have zero clue what I'm looking at. And I think I'm okay with that.


They are all of the Photobucket web site's third-party javascript dependencies, along with a huge list of external js loaded by just one of those dependencies.


Shorteners should only be called that when the result is actually shorter.

Otherwise "interceptors/tracers" seems a better name.


> What do you think?

I think URL un-shortening should be done in the browser, on URLs that were shortened according to a standard hashing method, so your browser can tell you where the URL will go.

Shortening services are ridiculous and dangerous.


> I think URL un-shortening should be done in the browser, on URLs that were shortened according to a standard hashing method

Hashing, by definition, is non-reversible. So if you use a hashing method as your shortening system, you'll have a fingerprint from which the actual destination is unrecoverable.

> Shortening sevices are ridiculous and dangerous.

Right, and there is no reason to use them in any situation where the target is a browser, anyway. It makes sense to use a shortening service if you are going to be sending a URL in an SMS message, but if it's going to a browser, there's no reason not to use the full URL. So, really, a server-based system that sends messages both as SMS and to regular browser users should support a shortener and use it only when sending SMS.


You're asking an 8-10 character string to decompress to an arbitrary-length string, and for the client to be able to do the decompression.

I think if you think about this for a few seconds, you'll find that if you were able to create such a solution, you'd have broken some laws of information theory.
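To put rough numbers on that (the 64-character alphabet is just an assumption for the arithmetic): even a 10-character code drawn from 64 URL-safe characters can only distinguish 64^10 = 2^60, roughly 10^18, strings, while URLs can be arbitrarily long. So any scheme mapping them onto codes that short has to keep a lookup table somewhere; it cannot be reversed from the code alone.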


Yup, I was a technical loon for saying "hash". Nevertheless, I rarely follow shortened URLs (and I see many) because I don't know where they go.



