The Lazy Man's URL Parsing

yahelc · on May 7, 2012

This is a great trick, but there's an irritating IE bug: IE omits the leading slash from the pathname attribute, whereas all other browsers include it.

It can be corrected by doing:

    parser.pathname = parser.pathname.replace(/(^\/?)/,"/");

Further, because you're creating a DOM element, it's less performant than using a RegExp solution. That performance degradation will typically only surface if you're parsing a large number of URLs (for example, in a loop), rather than just 1 or 2.
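For comparison, here is a minimal sketch of what a RegExp solution might look like. It uses the generic pattern from RFC 3986, Appendix B; the helper name `parseWithRegExp` and the field names are made up for illustration, and unlike the anchor trick it does no validation, normalization, or relative-URL resolution.

```javascript
// Pattern from RFC 3986, Appendix B: splits a URI reference into its parts.
var URI_PATTERN = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper, not from the article. Every group in the pattern is
// optional, so exec() always returns a match for a string input.
function parseWithRegExp(url) {
  var m = URI_PATTERN.exec(url);
  return {
    protocol: m[1] || '',  // scheme plus colon, e.g. "http:"
    authority: m[4] || '', // host, plus any userinfo and port
    pathname: m[5] || '',
    search: m[6] || '',    // includes the leading "?"
    hash: m[8] || ''       // includes the leading "#"
  };
}

var parts = parseWithRegExp('http://example.com:8080/a/b?x=1#top');
console.log(parts.authority); // "example.com:8080"
console.log(parts.pathname);  // "/a/b"
```

Note that the authority is returned as one string; splitting out the hostname and port would need a second step.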

exogen · on May 7, 2012

For the DOM element, it seems like you could just create the element once and then keep it around for reuse. Should be safe since JavaScript is single-threaded.
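A rough sketch of that idea, assuming a browser environment: the element is created on first use and reused afterwards. The names `createUrlParser` and `makeAnchor` are hypothetical; `makeAnchor` is only there so the sketch can be tried outside a browser.

```javascript
// Hypothetical helper. makeAnchor is optional and injectable; by default a
// real <a> element is created via the DOM.
function createUrlParser(makeAnchor) {
  var anchor = null; // created lazily, then reused on every call

  return function parseUrl(url) {
    if (!anchor) {
      anchor = makeAnchor ? makeAnchor() : document.createElement('a');
    }
    anchor.href = url;
    // Copy the fields out, because the shared element is overwritten
    // the next time parseUrl is called.
    return {
      protocol: anchor.protocol,
      hostname: anchor.hostname,
      port: anchor.port,
      // Add the leading slash that IE leaves off.
      pathname: anchor.pathname.replace(/(^\/?)/, '/'),
      search: anchor.search,
      hash: anchor.hash
    };
  };
}
```

In a browser this would be used as `var parseUrl = createUrlParser();` followed by `parseUrl('http://example.com/a/b').pathname`. Returning a plain copy rather than the element itself avoids callers holding a reference that later changes under them.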

zachrose · on May 7, 2012

My understanding is that modifying and querying DOM elements is what's slow, not just creating them.

http://jsperf.com/lazy-url-parsing

yahelc · on May 7, 2012

Added a test case for not caching the DOM element: http://jsperf.com/lazy-url-parsing/2

exogen · on May 7, 2012

I'm actually surprised it's only ~40% slower (in my Chrome anyway). Considering how much less code it is to maintain, that's totally worth it IMO. I'm sure there are a thousand other functions you'd want to optimize in a real app before this.

yahelc · on May 7, 2012

Clever! Didn't realize you could reuse the element.

unfletch · on May 7, 2012

The interface is officially called "URL decomposition IDL attributes." IIRC it's only implemented by <a/>, <area/> and the Location object.

Here are the canonical docs: http://dev.w3.org/html5/spec-LC/urls.html#interfaces-for-url...

And here's a prettier (but less detailed) version: http://developers.whatwg.org/urls.html#interfaces-for-url-ma...

matttthompson · on May 7, 2012

Just came across this clever solution yesterday, actually. Unfortunately, it doesn't work for URLs containing a username and password (e.g. http://username:password@example.com). Glad to have found URI.js, though--it's exactly what I was looking for.

joezimjs · on May 9, 2012

I was noticing that too.

mmahemoff · on May 7, 2012

var link = $('<a/>').attr('href', 'http://example.com')[0] in jQuery, just to show how simple it can be.

A stylistic point: I'd just call it something like "link". Assigning a link element to "parser" is being cute, and it's not what the object actually is, even if it's being used with intent to parse.

sparknlaunch12 · on May 7, 2012

HN Link: http://www.joezimjs.com/javascript/the-lazy-mans-url-parsing...

Takes you here: "This webpage has a redirect loop. The webpage at http://www.joezimjs.com/500.shtml/ has resulted in too many redirects. Clearing your cookies for this site or allowing third-party cookies may fix the problem. If not, it is possibly a server configuration issue and not a problem with your computer. Error 310 (net::ERR_TOO_MANY_REDIRECTS): There were too many redirects."

Instead of here: http://www.joezimjs.com/javascript/the-lazy-mans-url-parsing...

Kevin_Marks · on May 7, 2012

The original seems to be down now. Note that there are many ways to name the bits you parse a URL into: http://tantek.com/2011/238/b1/many-ways-slice-url-name-piece...

bmelton · on May 7, 2012

Even worse, the original seems to be down due to an infinite redirect loop.

joezimjs · on May 9, 2012

Sorry about "the original". It wasn't actually supposed to have that query string on the end. That was for Google Analytics tracking elsewhere and I forgot to remove it when I posted it here.

I'm not sure why it produced a redirect loop, because it worked many times for a lot of people. It may have had something to do with my server getting overloaded. I'm on a Hostgator reseller account, so I have limited resources, and when HG saw the massive CPU usage from all of you people checking this out, they shut down my site for a bit.

IsaacSchlueter · on May 7, 2012

Incidentally, the Node.js `require('url').parse(str)` method is designed to present the same API, except that it also includes the auth section as an 'auth' member.
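A quick sketch of what that looks like in Node.js, using the `url.parse` call named above. The sample URL is made up, and current Node.js documentation lists `url.parse` as a legacy API, so treat this as an illustration of the 2012-era interface.

```javascript
var url = require('url');

// Sample URL with a username and password, which the anchor trick
// reportedly does not handle.
var parts = url.parse('http://username:password@example.com:8080/a/b?x=1#top');

console.log(parts.auth);     // "username:password"
console.log(parts.hostname); // "example.com"
console.log(parts.port);     // "8080"
console.log(parts.pathname); // "/a/b"
console.log(parts.search);   // "?x=1"
console.log(parts.hash);     // "#top"
```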

foxhop · on May 7, 2012

If you are coding with python instead of javascript check out my complete uri module (miniuri.py): https://bitbucket.org/russellballestrini/miniuri/src/tip/min...

elliottcarlson · on May 7, 2012

Cached copy of the page: http://www.joezimjs.com.nyud.net/javascript/the-lazy-mans-ur...

ww520 · on May 7, 2012

Laziness is the mother of all inventions. Neat trick.

dudus · on May 7, 2012

I wonder if it works in Internet Explorer 6 or 7.

yahelc · on May 8, 2012

IE omits the leading slash from the pathname, but it's easily fixable. http://news.ycombinator.com/item?id=3939454

Trezoid · on May 7, 2012

Do people building anything real world care about IE6 any more? Hell, do they really care about 7?

stilist · on May 7, 2012

If you care about enterprise you probably care about IE 7.

mkopinsky · on May 8, 2012

In the hospital where I work, most of the computers got upgraded to IE7 about 6 months ago. There's talk of upgrading to IE8, but that's probably a year off if I had to guess.

The way I see it, about half of my salary is for caring about IE7. IE7 hacks and stupidities occupy no more than 5-10% of my time, but 80% of my work is damn fun and I would do it for free. I am glad that there is the remaining 20% of IE7 hacking, boss-assigned-task-doing, and so on, for which I (consider myself to) get paid 5x my actual salary.