What is up with the insane long Google results URL now? (mergy.org)
118 points by mergy on Jan 17, 2013 | 67 comments



I assume someone at Google reviews and approves each one of these parameters that gets added, and each one of them has a good reason for existing (even if we don't know what it is or believe it's a good reason).

As a thought experiment, I wondered how I would quantify the expense of a query parameter if I worked at Google. Users don't really know or care what URLs look like anymore as long as the page is fast, so apart from measuring the gallons of tears generated on Hacker News, I guess resource usage would be the first logical thing to look at, right?

Assuming every search is exactly like the one provided in the linked article, and using some public numbers from 2011 on searches/day, these are the top 5 parameters in terms of bandwidth used just transmitting the key and value to the server.

   gs_l      42.3 Mbps
   bav       11.7 Mbps
   bvm        9.1 Mbps
   fp         8.7 Mbps
   sclient    6.1 Mbps

For just the top 5, that's almost 78 Mb of inbound data per second. That data is sent over end users' pipes (which are often limited by upload), hits my transit and peering links, passes through my routers, hits my front-end load balancers, gets turned into GoogleInternalProtocolX and fanned out to dozens of servers inside the datacenter over switches and internal routers, and is logged on durable storage for 18 months (assuming they dump this extra data when they strip personal information from the search record). Wow.

A general rule of thumb: for every 20 bytes you add to URLs by way of keys or values, you've increased the overhead of almost every part of the Google infrastructure by 1 MB per second. (Note that I switched from bits to bytes in the final conversion, and this number is sourced from shaky data to start with.)
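For anyone who wants to redo this kind of estimate, here is a rough Python sketch of the arithmetic. The search rate is an assumption inferred from the rule of thumb itself (20 bytes per search adding up to 1 MB/s implies about 50,000 searches per second), not a measured or official figure. Counting the `&` and `=` separators as part of a parameter's cost is also an assumption, and the function names are made up for illustration.

```python
# Assumed rate: "20 bytes ~= 1 MB/s" implies 1_000_000 / 20 = 50_000
# searches per second. This is NOT an official Google number.
ASSUMED_SEARCHES_PER_SECOND = 50_000


def param_bytes(key: str, value: str) -> int:
    """Bytes one parameter adds to a URL, counted as '&' + key + '=' + value."""
    return len(f"&{key}={value}".encode("utf-8"))


def bandwidth_mbps(extra_bytes: int,
                   searches_per_second: int = ASSUMED_SEARCHES_PER_SECOND) -> float:
    """Megabits per second spent sending extra_bytes with every search."""
    return extra_bytes * 8 * searches_per_second / 1_000_000


def bandwidth_mb_per_s(extra_bytes: int,
                       searches_per_second: int = ASSUMED_SEARCHES_PER_SECOND) -> float:
    """Megabytes per second spent sending extra_bytes with every search."""
    return extra_bytes * searches_per_second / 1_000_000


if __name__ == "__main__":
    # Cost of a short parameter such as hl=en under the assumed rate.
    size = param_bytes("hl", "en")
    print(size, "bytes ->", bandwidth_mbps(size), "Mbps")
```

Swapping in a different searches-per-second figure changes every result proportionally, which is why the numbers in the table should be read as orders of magnitude only.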


I suspect they care a lot more about the effect these additional upstream bytes have on response time than their own network ingress. Especially for a site that cares so much about performance, an extra handful of bytes on a slow connection can noticeably hurt latency. Lots of small parameters can really add up.


Maybe I missed it, but where are you getting this data from? It looks quite interesting, and I'd love to see more. Would you be willing/able to share?


I googled for "google searches per day" and kept clicking around until I found a few sites that agreed on a number for a recent year. I counted the number of bytes for each URL parameter, then just did a bunch of back-of-the-envelope math. None of it is scientific or probably even close to the actual numbers. It's just a thought experiment that I thought might be interesting enough for a comment.


You were correct, but maybe add some more details and links to your sources?


Google's search URLs have slowly been creeping up in length over the years. Some of the parameters like hl are for user interface control (in this case, language). Others like sourceid are for broad tracking of who's using Google how. And lately there are many more nonces mostly related to tracking individual users, Google+, instant search, etc. Many of those change based on who you are logged in as.

Google is also responsible for all the utm_source spam in Feedburner and other tracked URLs. And they are the ones behind the #! / _escaped_fragment_ nonsense. Google's first search product worked so well mostly because it relied on standard URLs as unique pointers to web pages. It's a shame they're breaking URLs in so many ways now.


I don't have a problem with the _escaped_fragment_ stuff, since no human is supposed to ever see those. I utilize those on my single-page JS website and it is fantastic. Google can index my site while I can serve blazingly fast pages to my users.

If a human ever sees an _escaped_fragment_ page...something terribly wrong has happened.


you don't like it when you search, but you like it when you track your campaigns and stats.


Speculation: way back in the day if you Googled for [software marketing service], liked what you saw, and had me Google for it, I'd get substantially the same results. That has been increasingly untrue for years, due to geotargeted results, personalization, persistence across searches, search refinement, their datacenter model, yadda yadda. If users expect that copy/pasting a link to a search will actually let someone else reproduce it, then that search URL now has to carry more state than just "What I typed into the Internet before I clicked the Googles" (totally baffling to users, who think this is what selects their results).


But what the hell is the point of that if the URL redirects to another URL? Do they really want to track people copy-pasting URLs into email/IMs? If so, why not use goo.gl?


What redirect? And, did you even read the comment that you're replying to?

1. There isn't a redirect from that huge link. 2. The post you're replying to just explained a theory that doesn't involve tracking: the URL needs to be that long to be unique enough to specify a page with the exact same search results.


Perform the following experiment:

1. Visit https://www.google.com/search?q=hackernews

2. Mouse over first result, note the url news.ycombinator.com shown in your browser's status bar.

3. Right-click the first result and 'copy link location'

4. Check what's on your clipboard. For me it's www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDIQFjAA&url=http%3A%2F%2Fnews.ycombinator.com%2F&ei=fRb5UNSeD-am0AW88YGQBg&usg=AFQjCNGKJHXhsq1s0-gYR96B--m47G9oRw&bvm=bv.41248874,d.d2k

5. Visit said URL in your browser and note you get redirected to news.ycombinator.com

As you can see, you are redirected via a huge link.
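To see where such a link actually points without visiting it, the destination can be read straight out of the `url` query parameter. This is a minimal sketch using only Python's standard library. The `http://` prefix on the example link is added here for convenience (the copied link above has no scheme), and `redirect_target` is a made-up helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def redirect_target(link: str) -> Optional[str]:
    """Return the decoded destination stored in the 'url' parameter, if any."""
    query = urlsplit(link).query
    values = parse_qs(query).get("url")
    return values[0] if values else None


# The link from step 4, with an "http://" scheme prepended.
LINK = (
    "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja"
    "&ved=0CDIQFjAA&url=http%3A%2F%2Fnews.ycombinator.com%2F"
    "&ei=fRb5UNSeD-am0AW88YGQBg&usg=AFQjCNGKJHXhsq1s0-gYR96B--m47G9oRw"
    "&bvm=bv.41248874,d.d2k"
)

if __name__ == "__main__":
    print(redirect_target(LINK))  # http://news.ycombinator.com/
```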


Yes, but that is not the url type being talked about by patio11 or the original article. For the url type under discussion here:

1. Visit https://www.google.com/

2. Type "hackernews" into the search box and hit enter.

3. Look at the long URL in your address bar.


As janzer said, that's not the URL patio11 is talking about. However, this redirect is there for a good reason too. It's needed to preserve the search terms in the referrer header for links that are triggered by JavaScript, which is needed if webmasters want to see which search terms are driving traffic to their sites.


I presume it's also handy for tracking click-throughs.



But I really want it to stop. I hate not being able to just "copy link address" off of the results page


This is really insane. The only important params from my point of view are (apart from the query) hl (language -- important if you use google.com outside the US and want results in English or any other language), the page number, and safe (for filtering).

Solutions that could be applied on Google's side:

1. Perhaps they could add a button, like on YouTube and other pages, to share the search (be it web search, image search, etc.). Of course it will clutter the UI, but it's already cluttered and non-intuitive, especially given their experiments (A/B testing?). Every few weeks I see something moved, changed, colors tweaked. Think what would happen if Microsoft did something like this with Windows or Office... :)

2. Use the History API to decrappify the URL after the load (it's a kind of cheat, but I would like it).
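Until something like that exists, the same cleanup can be done by hand. Here is a minimal Python sketch that keeps only the parameters named in this comment and drops everything else, including the fragment. Treating `start` as the page-number parameter is an assumption on my part, and `clean_search_url` is a made-up helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters worth keeping: query, language, page offset, filtering.
# 'start' as the page-offset parameter is an assumption.
KEEP = ("q", "hl", "start", "safe")


def clean_search_url(url: str, keep=KEEP) -> str:
    """Drop every query parameter not listed in keep, plus any fragment."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    # Parameter values other than q and hl are placeholders, not real ones.
    messy = "https://www.google.com/search?q=hacker+news&hl=en&sclient=x&fp=x&bav=x"
    print(clean_search_url(messy))
```

A whitelist is used rather than a blacklist so that newly added parameters get stripped without updating the script.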


It'd be really nice to be able to right-click and copy-paste a clean URL.

In some cases people don't need the URL in the browser, and don't need to open the actual page (or PDF document). Sometimes one needs the URL itself: to send it via e-mail, copy it into a command line, or use it anywhere else.

And in all of these cases it is a pure waste of time: going to the website (or starting a PDF download), or cleaning up the URL because Google deceptively substituted it on a right-click. Or monkeying around with scripts that would prevent Google search from tracking clicks altogether.

And one more thing. Just like the original poster says: people hate not being able to just copy the URL. And every time somebody has to waste time cleaning it up, a little bit of hate towards Google sneaks in.



Using a Greasemonkey script like http://userscripts.org/scripts/show/121261 can fix this for you. Obviously using such scripts brings its own issues, but it's a nice(ish) solution imo.


A simpler script that just removes the redirection from the search result links can be found here: http://userscripts.org/scripts/show/134151


I started using the "Google/Yandex search link fix" Firefox add-on [1] after not being able to copy the address of a .pdf file I googled for.

[1] https://addons.mozilla.org/en-US/firefox/addon/google-search...


I search Google via DDG, which gives you a nice (albeit, initial) url. https://www.google.com.au/search?hl=en&q=hacker%20news


This is Google keeping search keywords for their paying customers (changing referer), among other things. Honestly I wouldn't care so much if it weren't so dang slow at random times!

Firefox addon (useful many places):

https://addons.mozilla.org/en-us/firefox/addon/redirect-remo...

Chrome:

https://chrome.google.com/webstore/detail/undirect/dohbiijnj...

Edit: This comment is relevant to search results, not the address bar, my bad!


Thanks for linking to that one, the redirect thing is very annoying (and time-consuming, relatively speaking, which for a speed-obsessed party like google isn't a good thing or very representative)


The turnaround time in returning results has definitely increased.


Sucky side effect - Safari's browsing history just lists a whole bunch of really long google URLs. If I'm looking for something particular I have to try every link in the list.

Edit: See http://bartkowalski.com/2012/02/google-urls-in-safari-browse... for an example.


Well the original post was about the results pages. These urls are the redirects when you click on one of the results.

This seems like a bug in Safari though and not an issue with Google's URLs although I admit they are not too pretty.

An alternative to trying every link would be to try a different browser.


It's entirely due to Google's client-side redirects. Which are stupid and annoying and Google should fix them.

But since Google doesn't care, the Detox extension works nicely. Though it's always somewhat sad when you have to resort to browser extensions to work around web programmers' stupidity.

EDIT: Also, I just tried your suggestion of a different browser, and tried Google's own browser. The stupid titleless redirection URLs are still there, cluttering up your history. So that doesn't really fix it.


Oh my god thank you--I couldn't figure out what was going on w/the browsing history and it was driving me insane. Yeah that's... not a fun problem to deal with.


Easy to work around, no plugins or silliness required.

Get to Google via Duck Duck Go as your intermediary. Either from your search bar (if DDG is your default search engine), or directly from the DDG search page, search for

!g testing

!g tells DDG to send the query to Google, and you land on the Google results page (not DDG), at the following URL:

https://encrypted.google.com/search?hl=en&q=testing

Even if you prefer Google to DDG, you should make DDG your default search, and get to Google via !g.

!gi searches google images. !am searches amazon. !imdb searches imdb, etc.

https://duckduckgo.com/bang.html


FYI, the plugins are to remove the redirects Google splices into the actual search results in an on-click handler. So an upvote for you for actually reading the article!

It's certainly nice that DDG knows the minimum required to do a google search, and 'ddg.gg !g testing' saves ~5 keystrokes on 'googl.com/#q=testing', though you do have to wait for the redirects/form submissions.


I am really interested in this. Any tips for automating it in Chrome, so I can use the omnibox and don't have to type !g?


To improve on nivla's answer, you can right click on almost any input field and select "Add to search bar". This will build the appropriate URL, based on the text field's name attribute and the action of the form which posts the field.

So if I had a form on "example.com" which had an action of "/do-search", and finally, a text input with a name of "query", then the URL it will add is:

http://example.com/do-search?query=%s

Saves you having to inspect the URL yourself to figure out the parameters.

Then you can go to the settings page nivla mentioned and make it the default search engine, so that anything you enter into the omnibox that isn't a URL is substituted for the %s in the URL above.
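A rough Python sketch of how such a template is put together and then filled in, using the example.com form described above. The helper names are made up, and the `+`-for-space encoding shown is just what `quote_plus` produces; the encoding the browser really applies may differ.

```python
from urllib.parse import quote_plus, urljoin


def search_template(page_url: str, form_action: str, field_name: str) -> str:
    """Build a '%s' search template from a form's action and text field name."""
    return f"{urljoin(page_url, form_action)}?{field_name}=%s"


def expand(template: str, terms: str) -> str:
    """Substitute the URL-encoded search terms for the %s placeholder."""
    return template.replace("%s", quote_plus(terms))


if __name__ == "__main__":
    template = search_template("http://example.com/", "/do-search", "query")
    print(template)                         # http://example.com/do-search?query=%s
    print(expand(template, "hacker news"))  # http://example.com/do-search?query=hacker+news
```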


Click on the "Zebra Crossing" (that used to be the wrench/spanner :sigh:) icon > Settings > Search > Manage Search Engines

Scroll to the bottom and add "DDGooG" as Name, duckduckgo.com as the keyword and https://duckduckgo.com/?q=!g%20%s as the URL.

Hover over the URL and select "Make Default".

You are all set :)


That's a start, but it's a pretty roundabout way to do it.

You can just use https://encrypted.google.com/search?hl=en&q=%s directly :)


...and make ones for Google Images, Google Maps, Wikipedia, and lots of other sites...


right, but that's different than what the GP to my post was asking, which was how to automate a regular google search with the clean URL without having to type !g every time.


I just altered the Google query to only send a q= field. My search URIs look like this: https://www.google.nl/search?q=test.


My job involves copying and pasting URLs into mails, IMs, documents, etc. all the time. That's why the default search URL triggered from my Google Chrome omnibox looks like this:

  https://www.google.com/search?q=hacker+news&pws=0&hl=en
  (pws = no personalization, hl = language)

How to: Google Chrome Settings -> Section "Search" -> Button "Manage Search Engines" -> Overlay "Other Search Engines" -> Scroll down -> Add a new search engine with the URL https://www.google.com/search?q=%s&pws=0&hl=en -> "Make default"

Sadly, custom search engines aren't synced in Chrome, so every time you set up a new browser, you need to add your custom search setting again...


This is fantastic advice. Thank you.


It's been like this for a while. A Twitter friend of mine created a simple bookmarklet that you can click which will remove the extra data and allow you to right-click and copy the URL alone. You can drag the bookmarklet from here to your bookmark bar http://techkp.blogspot.co.uk/2012/01/copy-pasting-googles-se...

Click the bookmarklet before copying the link to get the actual URL.


Or you could use this greasemonkey script that doesn't even require you to click on the bookmarklet, and it makes all result links direct, so you can reach them faster:

http://userscripts.org/scripts/show/134151


In Chrome I set my default search engine to "Googol" with the search query: http://google.com/search?q=%s. Seemed to fix it for me.


[deleted]


You can now use HTTPS on the www subdomain. Try https://www.google.com/search?q=%s as the URL to use. I consistently get this turned into https://www.google.com/search?q=foo&qscrl=1 which makes it easy enough to cut out the qscrl parameter for linking purposes.


[deleted]


For results you click on? Those will be handled independently from the /search?q= URL.


obviously Google's servers will see your request and the results in plain text, or do you expect them to be encrypted even there?


[deleted]


I assumed you were talking about Google tracking you or something. What did you mean by "tracking" then?


but then you have to tell everyone who visits your site to do this...




Click the search button.


Some of the parameters are there to enable certain performance enhancements. For example:

- The home page, after rendering the search box and all that, asynchronously downloads the CSS, images and HTML needed to display the chrome around the results listing.

- That way it is cached on the client, and when they perform a query, less data has to be downloaded from Google's servers (just the actual results & ads), making the result page render faster.

- It knows not to download all the chrome due to the presence of the 'fp' GET parameter; the absence of that parameter will cause the entire results page, including chrome, to be downloaded.

I presume the rest of the parameters are useful for similar reasons.

It should also be noted that for non-HTML5 compatible browsers, modifying the hash fragment with JavaScript is the only way to change the url that gets bookmarked without causing a page reload (which would add latency), so if you want a bookmarkable local results page for images with certain preferences, adding a bunch of crap (latitude, longitude, preference hash, query, search type, etc.) to the fragment is the only way.


CoralCDN link:

http://mergy.org.nyud.net/2013/01/what-is-up-with-the-insane...

I couldn't get through to the main site. Fortunately someone had fetched the page over coralcdn at some earlier point. I wonder if someone wrote a proactive coralcdn bot for HN links.


Throwing some more bandwidth on it now. Thanks for the mirror link.


One side effect of these is that it's hard to figure out the minimal set of parameters needed to search when you're trying to create a keyword search (or a similar task). E.g., I can type "gna harry carey" in my address bar to search in the Google News Archive, but they've broken my URLs several times and sometimes the fix was far from intuitive. (Right now, for gna I use "https://www.google.com/search?hl=en&gl=us&tbm=nws... )

Since we're on the topic of Google searches, have other people noticed that advanced searches have become quite a bit worse over the last couple of years? The basic search is often quite astounding in how it gets what you want at the top of the results, but using advanced operators often produces quite strange results.

This is even more obvious in the non-web searches, such as Google News, Google Groups, patents, and Google Books. There you can see a result, but then if you add a restriction to the search, such as a date range, or "group:" or "ininventor:", it won't find any results, not even the ones you just saw which match the new criteria.


If you want to share a Google search, you can always use the short and sweet: google.com/#q=keywords


I don't believe this is a new development.


I think it used to happen only if your User-Agent parsed a few specific ways. I did just recently notice the broken "Copy Link Location" in places where it wasn't before.

All the more reason to stick with DDG.


No, but it is a sucky one.


It's essentially used by Google to see which search result you clicked on from the original query. Twitter does a similar thing with t.co. It never used to happen when you weren't logged in to a Google account, but now it does. It's been like that for at least half a year. The Greasemonkey script is a great workaround, especially if you value your privacy. Or alternatively, why not try DuckDuckGo?


Does someone know a working Chrome extension / userscript that will disable the Google search results redirects?



Not sure if this is related but I started using this goo.gl extension and it has been great to paste links in emails. I use it all day long especially for Google docs links.

http://goo.gl/Ya1kC


In Chrome, you can lessen the pain by using the "goo.gl" (or similar) extension.


Google: slowly fucking up a good thing.



