I have that beat, as I once made Google perform a 30-second query. I know exactly how I did it, too. I wasn't sure whether or not it would find it...
So how does Google do a query? Well, fundamentally it has lists of sites that contain a keyword, so it gets the list for each of your keywords, finds the intersection of those lists, then sorts the result by the PageRank of the resulting URLs. That's how it used to work, anyway.
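Roughly, as a minimal sketch of that old model (the tiny index and the PageRank scores below are made up for illustration):

    # Toy inverted index: keyword -> set of URLs containing it.
    index = {
        "shakespeare": {"url1", "url2", "url3"},
        "sonnet":      {"url2", "url3", "url4"},
    }
    pagerank = {"url1": 0.9, "url2": 0.7, "url3": 0.4, "url4": 0.2}

    def search(query_terms):
        # Fetch the posting set for each keyword...
        postings = [index.get(t, set()) for t in query_terms]
        # ...intersect them...
        candidates = set.intersection(*postings) if postings else set()
        # ...and sort the survivors by their PageRank.
        return sorted(candidates, key=lambda u: pagerank.get(u, 0), reverse=True)

    print(search(["shakespeare", "sonnet"]))  # ['url2', 'url3']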
So back in the day I decided to run a test. I took the Gutenberg text of Shakespeare's complete works and wrote a program to go through it and find the longest string consisting entirely of "stop" words. Stop words are super-common words like "the" that appear on millions or billions of pages (a huge list of URLs), so Google normally ignores them, but back then you could still force it to include them.
So I coded up my little algorithm that goes word by word through the text, keeping track of the longest string found so far that consisted entirely of stop words.
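Roughly like this, as a reconstruction (the stop-word list below is a small made-up sample, not whatever list was actually used):

    # Walk word by word, tracking the longest run made up entirely of stop words.
    STOP_WORDS = {"the", "a", "an", "to", "of", "in", "is", "it", "from",
                  "what", "and", "that", "for", "on", "with", "as", "at", "by"}

    def longest_stopword_run(text):
        best, current = [], []
        for word in text.lower().split():
            word = word.strip(".,;:!?'\"()-")
            if word in STOP_WORDS:
                current.append(word)
                if len(current) > len(best):
                    best = current[:]
            else:
                current = []
        return " ".join(best)

    print(longest_stopword_run("transform honesty from what it is to a bawd"))
    # -> "from what it is to a"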
The longest string it found was: "from what it is to a".
Just now I did that query again, and on the whole Internet it found just 4 matches (soon it will find this comment too, and any archives of this comment), all being the exact phrase from Shakespeare:
https://imgur.com/a/lvU1QSs
Impressively, that query now takes 0.84 seconds (I haven't done that query in several years, it's possibly cached but I doubt it.)
However, when I first performed it, it took 30+ seconds. I didn't take a screenshot but I was super impressed. I brought Google to a crawl for 30 seconds, in the exact way I was intending. Moohahahahaha.
"Holy cow. I just made Google's databases join six lists each with millions of pages on them, find the intersection, and then go through all of them for which ones had my phrase in literal order. And then it found it."
Pretty mind-blowing that today it can do that in < 1 second.
The key for people who don't get why it was so much work for Google is this: where I wrote "then go through all of them for which ones had my phrase in literal order", I meant "then go through the cached contents of every single web page on the results list" -- since the joined list itself would contain every web site that has those 6 words, which is pretty much every English-language page on the web of more than a few thousand words. So it needed to scan through the cached contents of all of them to find which ones had the phrase in order.
For example I just looked at the top post on Reddit right now, it's about 20 page-downs of comments, and has the words this many times:
from: 17
what: 22
it: 223
is: 169
to: 195
a: more than 1,000
Pretty much every English-language web page on the whole Internet will have those words, unless it is very short.
> So it needed to scan through the cached contents of all of them to find which ones had the phrase in order.
If you have a positional index [1] you don't have to go through the full cached content, you only have to check the index to see whether there's an occurrence of the words in the query with the correct distance. E.g. for that Reddit post, you'd check the 17 occurrences of "from" and notice quickly that your query string doesn't match at that position.
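A sketch of that check, with made-up position data:

    # Positional index: doc -> term -> sorted list of word positions.
    positions = {
        "doc1": {"from": [4, 90], "what": [5], "it": [6, 40],
                 "is": [7], "to": [8], "a": [9, 55]},
    }

    def phrase_in_doc(doc, terms):
        idx = positions[doc]
        if any(t not in idx for t in terms):
            return False
        # The phrase occurs iff some start position p has term i at position p + i.
        return any(all(p + i in idx[t] for i, t in enumerate(terms))
                   for p in idx[terms[0]])

    print(phrase_in_doc("doc1", ["from", "what", "it", "is", "to", "a"]))  # True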
"The sun never sets on the British Empire" it was up there historically with Rome and Alexander's conquests (although the latter may not have constituted a bona fide empire)
> on the whole Internet it just found 4 matches, all being the exact phrase from Shakespeare
But why just 4 occurrences? Every utterance of Shakespeare has been printed on thousands of web sites and Google has indexed thousands of those sites. Another line from the same play, "Nay, answer me: stand, and unfold yourself", gets 10,300 hits. I tried another line from deep within the play: "I know him well: he is the brooch indeed." Google found 3,890 results. Even though the search is slow, why aren't we getting thousands of hits for "from what it is to a"?
The other odd thing is that if you search for the exact phrase "transform honesty from what it is to a bawd", which is what that particular all-stopword string is taken from, you get a lot more results. Clearly Google isn't actually finding most of the pages in its index that match that string.
> why aren't we getting thousands of hits for "from what it is to a"?
Because Google isn't an exact search engine. Each of the terms in the phrase above appears on billions of pages. My guess is that for common terms Google doesn't store all postings in its inverted index but truncates the posting lists after a couple of million entries.
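A toy illustration of that guess, with made-up page IDs and a silly cap of 3:

    # If the posting list for a very common term is capped, a page that only
    # appears past the cap can never show up in the intersection, even though
    # it really contains the phrase.
    CAP = 3
    full_postings = {
        "a":    ["p1", "p2", "p3", "p4", "p5"],  # billions of pages in reality
        "bawd": ["p5"],
    }
    capped = {term: docs[:CAP] for term, docs in full_postings.items()}

    print(set(full_postings["a"]) & set(full_postings["bawd"]))  # {'p5'} - found
    print(set(capped["a"]) & set(capped["bawd"]))                # set()  - lost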
If you're interested in slow queries, try the one below (via [1]):
the OR google OR a OR "supercalifragilisticexpialidocious" -the -google -a
You can replace the "supercalifragilisticexpialidocious" term with something unique if the results end up cached. This takes upwards of 5 seconds to return.
One question is, will such complicated queries bring all of Google to a crawl, or will they e.g. be executed with lower priority or on different servers, leaving the rest of the queries unaffected?
Of course they don't affect other queries; these would be non-locking queries on replicas. Or, to be even more accurate, they'd be non-blocking queries on indices cached in memory on edge servers, and probably several levels of optimization beyond that. There's no reason for this type of query to lock a database.
Perhaps it also keeps an index of pairs of adjacent words.
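For instance, a word-pair (bigram) index could look something like this (documents made up); a quoted phrase can then be narrowed down using the much rarer pair postings instead of the individual stop words:

    from collections import defaultdict

    def bigram_index(docs):
        # Index every consecutive word pair -> set of documents containing it.
        idx = defaultdict(set)
        for doc_id, text in docs.items():
            words = text.lower().split()
            for w1, w2 in zip(words, words[1:]):
                idx[(w1, w2)].add(doc_id)
        return idx

    idx = bigram_index({"d1": "from what it is to a bawd",
                        "d2": "what is it to a cat"})
    print(idx[("what", "it")])  # {'d1'} - far more selective than "what" or "it" alone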
Another thing is that your query is essentially a literal "substring" search, which Google probably handles differently. See e.g. the Burrows Wheeler transform for an idea how it could be implemented.
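A full BWT/FM-index is a bit involved, but a simpler relative of the same idea, a suffix array, already shows how exact substring lookup can work without rescanning the cached page every time (a toy sketch, not necessarily how Google does it):

    def build_suffix_array(text):
        # Indices of all suffixes, sorted lexicographically.
        return sorted(range(len(text)), key=lambda i: text[i:])

    def contains(text, suffix_array, pattern):
        # Binary search for the first suffix >= pattern, then check its prefix.
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            if text[suffix_array[mid]:] < pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(suffix_array) and text[suffix_array[lo]:].startswith(pattern)

    doc = "transform honesty from what it is to a bawd"
    sa = build_suffix_array(doc)
    print(contains(doc, sa, "from what it is to a"))  # True
    print(contains(doc, sa, "from what it was"))      # False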
If you want to know how Google search got fast, the book "Managing Gigabytes" covers many of the basic optimizations to speed up search. I found the book very enjoyable to read.
For everyone who is suggesting Google search quality has taken a hit, I would suggest a quick test to see for yourself that Google is still WAY better than the competitors when it comes to long tail queries.
Example: someone compiled a list of things they learned from indiehackers.com interviews.
There are a lot of quotes on that page, and unfortunately none of them are linked back to the original interview. Some quotes are very interesting, so I wanted to find the original interview.
I took some of those quotes, put them in double quotation marks, and searched on Google, DuckDuckGo and Bing. By the way, you can only replicate these results by adding the double quotation marks.
Results:
Google always shows toomas.net in the top results, and almost always finds the relevant interview article on the indiehackers website.
DDG (usually) finds the article written on toomas.net, but not the indiehackers interview.
Bing often fails to list toomas as the top result, and doesn't find the indiehackers interview at all.
It's worth mentioning that indiehackers.com is a JavaScript-only website and doesn't work with JS turned off. Google's spider can execute JS, while the others' cannot. This is why indiehackers.com doesn't show up, not because of the search engines themselves.
I have a similar issue with a side project of mine. JS-only websites solve this with server-side rendering or static generation.
When I think of a search engine I think of it as two things.
1. A web crawler
2. A search index
Or, as a good boss used to say, "garbage in, garbage out"... the quality of the engine is as much a function of the index's ability to rank as the quality of the input from the crawler... the fact that none of the other engines crawl with JS enabled is a huge competitive advantage for Google.
A search engine is trying to predict which pages will be helpful to you in response to your query. This means it should take "how does the page look to users" as input, which today means executing JavaScript.
On the other hand, in the same way that Google is (claiming to be) trying to make the web fast with AMP, they could make the web fast by refusing to index pages that don't provide a non-javascript experience. (Yes, you can make a slow page without Javascript, but it sure is easier with it)
Yes, they would. They'd just use NextJS or similar to push down a static skeleton that minimally covers the crawler. (My current company has to provide data to the Facebook crawler and that's what we do.)
Nothing prevents a site from doing that and just loading JS afterwards. Or from still being basically read-only without JS. Which might be an improvement...except within epsilon of zero people care in the first place and Google is building software for a rather larger proportion of the market than that, so it's an improvement for what's barely an audience.
Nobody is really "enabling" it except for a browser. And that's not changing. The shirt-rending is becoming tiresome.
I'd be curious to know why your company specifically targets large-market crawlers like that, instead of just making things crawlable. It feels like leaving money on the table when you overfit your stack to one market.
It's a shame you find it "tiresome" when people insist on retaining their own opinions, but your exhaustion is not particularly relevant.
We need to build shareable landing pages. We share via Instagram and Facebook (for now). We use NextJS because they do provide static rendering that covers what we care about and are likely to care about anytime soon.
And you can have whatever opinion you want; nobody’s saying otherwise. But the constant whining about nothing — and it is nothing — is exhausting.
> For everyone who is suggesting Google search quality has taken a hit, I would suggest a quick test to see for yourself that Google is still WAY better than the competitors when it comes to long tail queries.
I think these are both true. Google quality has certainly fallen in my opinion (perhaps as a result of having to constantly counter SEO tricks). Other search engines still have quite a bit of catching up to do as well.
Depending on your searches, YMMV. I finally switched to DDG at work because Google refused to show me relevant results for some (admittedly obscure) searches related to lensing in F#, preferring instead to give me political opinion pieces(!) that were completely irrelevant to my query, along with various results about video games. DDG had no such problems.
This is one example of many, but it was the last straw for me. I guess techies aren't Google's core audience anymore, I get that, but it feels like we've lost a valuable tool.
> DDG (usually) finds the article written on toomas.net, but not the indiehackers interview. Bing often fails to list toomas as the top result, and doesn't find the indiehackers interview at all.
Are you able to explain this? Bing and DuckDuckGo are the same thing. DDG uses Bing's API (for non bang searches).
I cannot explain it, hopefully someone from DDG actually chimes in :-) If they just use the Bing API, how can their results be better than Bing? I suppose they also use other sources.
Slight tangent, but it might be interesting to some, on search results and how Google's search results can sway users.
Google's been able to use ephemeral experiences (like autocomplete and answer boxes) to influence users, especially undecided voters. The research, which was reproduced by a German team, is believed to have influenced 2+ million people in the US election alone --
That testimony is highly misleading. Taking the highest estimates possible, which are based on essentially a conspiracy theory, the number of influenced voters is much lower than 2 million.
He claims they moved 2.5 million votes, not voters. This relies on an assumption of straight ticket voters and about 20 votes per voter. He doesn't correct people when they make the votes/voters mistake.
And even that new number, 100k voters swayed, is not really well founded. He makes a couple of overlapping claims; one of them I looked into deeply and found that, using his own papers, a reasonable upper bound on what he claimed might be affected was 8-10k additional votes for Clinton.
And the number was likely lower (and was only that high because more people identify as democrat than republican in the US).
(I work at Google, but take interest in this mostly because it's just terrible abuse of mathematics)
> For everyone who is suggesting Google search quality has taken a hit, I would suggest a quick test to see for yourself that Google is still WAY better than the competitors when it comes to long tail queries.
Maybe, but it's pretty clear to me that Google no longer indexes entire websites. As far as I can tell, this was not a concern in the past.
Saying other search providers are worse does not refute the statement that Google Search has been reduced in quality.
The reduction in Google Search quality comes from their need to be "helpful": their desire to use my past searches and the other "big data" they gobble up about me to "improve" my specific results, when in fact it makes them worse. That is just the baseline; then you have the hundreds of other things they have done to search over the years for political, regulatory and other reasons.
Google search gets objectively worse every year than the previous year.
Bing seems to simply ignore the quotes in your query after the first result (for query "This idea that you need to water down your pricing or offer a free beta period is bogus") and then just displays random results. Even SoundCloud is listed there..
Would Google have returned results from a JS-rendered site back then? The ability to index client-side JS rendering puts Google ahead of their competition; as more sites become JS-rendered, they reinforce Google's monopoly.
More like 2003-2005 for me. By 2007, the quality was sliding due to them targeting mainstream folks and deprioritizing the tech-fringe stuff that I care about.
Exactly. My screenshot was a tongue-in-cheek comment on the black bars in the original post. Unless we can reproduce/test it ourselves, then it's as good as a devtools edit.
Honestly, the mentality in this twitter thread is a joke. Google handles insane amounts of data beyond what 99.99% of engineers have ever dealt with. The fact that their search works at all is a miracle. But I guess people like to ignore that and much rather complain on Twitter that it takes a few seconds to query a database of trillions of records.
Google is famous for raising the bar on speed and yeah, their performance is usually impeccable. That’s why the existence of a consistently slow Google search is newsworthy.
It very clearly has to be a bug. I also got ~2.5s. It makes no sense for it to take that long for everyone, even after running it multiple times in a row. The system most definitely has many types of caching, so if it takes that long, there's probably something going very wrong.
There are comments on the thread that suggest that it is an intentional delay to prevent Google itself from being weaponized. Essentially, "powered by" is special because of automatic inclusions like "powered by wordpress" or "powered by nginx."
Well, thankfully you were here to defend one of the largest corporations on the planet from these Twitter meanies who dare discuss a slow responding page.
I don't see any factual objective discussion in the twitter thread. Btw, I'm not a big fan of Google, which I guess is what you were implying. In fact, I think Google is the most dangerous tech company currently in existence and I consider it a severe threat to our society. But that doesn't mean I cannot objectively judge the technical quality of their products and put things into perspective.
It's not what you say, but how you say it. This is just mindless disproportional bashing.
"Come on Google I don't have all day!"
"This is actually one of those crazy "facts from the future" to go back in time and tell people 10 years ago: Google has the dominant browser and mobile operating system, but their search sometimes takes 7 seconds."
It’s really weird. If you add “powered by” to a query it’s really slow, but if you add a minor typo (eg “powere by”) it finds the same results, but faster.
It looks like it is the exact substring that is slow. "asdfpowered byasdf" is slow, too. I bet it's happening at some layer that doesn't even know how to parse the query.
Yea, I was thinking along the same lines. You’ll often see publicly disclosed vulns include google dorks to find vulnerable sites, ex: a google dork to find forums running a vulnerable version of phpbb or something
They're not the same query. They don't return the same results. I seem to recall a dash links words together somehow. So that probably narrows down the search space more.
Google doesn't use stop words, because if they did I wouldn't be able to search accurately for information regarding the group "The The". But since we can find the occasional slow-running query, they might be using "stop phrases" and just weren't aware "powered by" should be one.
I've found that if you go past page 2x on most of these results, Google just stops presenting pages and will tell you that it's omitted the results.
The number of results may represent number of hits in a database, but it's not actually accessible to the end user. After you get into the last few pages, it'll also just cap the result count to the number of pages they've decided to show you.
Yeah, definitely seen Google take a long time. Also seen weirder things. Just a couple days ago I managed to type a query that neither told me there are no results, nor did it actually return any results. Hadn't seen that before.
> Just a couple days ago I managed to type a query that neither told me there are no results, nor did it actually return any results. Hadn't seen that before.
I've had that a few times lately. I thought it was my internet being shoddy but the header and footer load, just no results in the middle nor any error about not finding anything... just blank space.
tldr; querying the term "powered by" causes the query to take an inordinate amount of time (still less than 10 seconds, so a far cry from "all day").
The reason doesn't seem clear, but one comment [0] claims that "powered by" is a common query used by black hats and spammers to find targets. Not sure why any "antispam" behind the scenes would cause the query to delay though. Hardcoded delays?
Finally found a comment that makes sense actually [0]:
"Powered by X" is on millions of web pages because it gets autogen'd by popular web-facing CMS (Joomla, Wordpress etc). So when a search query includes "Powered by" google must determine which among the billion pages with this
phrase is most relevant"
A single query can hit hundreds of CPUs. There are over 60k Google search queries per second, according to some random source found on the internet. That would require 600k CPUs. If it's 1000 machines being hit, it would require 6M CPUs. Also, internet traffic is bursty; there are maybe 2-3x more queries at peak times.
My experience with working at a FAANG: I remember a service that was easily dealing with >10M QPS (albeit much simpler queries than a Google search), or some other services continuously processing (reading/writing to disks) > 100 GB/s.
There's A LOT you don't see in FAANG frontends. In particular, the Twitter thread owner suggesting https://developers.google.com/speed/ has no idea where the slowness is coming from.
That being said I'd love to be a Google employee to investigate those queries. Perfect bug reports.
Each query uses much less than a second's worth of CPU, so you can't just multiply searches per second by the CPUs involved in a query to get the total CPUs needed.
Apparently they don't, look at the posted example :). Seriously though it's probably like half a second or maybe a quarter of a second, that's what Google often reports.
And yes this is an oversimplified model. No idea how much CPU is actually used.
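For what it's worth, the back-of-envelope version of that estimate looks like this; every number below is a guess taken from this thread, not a real Google figure:

    qps = 60_000                  # claimed queries per second
    cpu_seconds_per_query = 0.25  # the "quarter of a second" guess above
    peak_factor = 3               # traffic is bursty

    cpus_busy_at_peak = qps * cpu_seconds_per_query * peak_factor
    print(cpus_busy_at_peak)      # 45000.0 CPUs kept busy, under these guesses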
Slightly off topic - has anyone else noticed that in the last couple years Google results have just become....crap? Like, as a programmer I'm used to googling all sorts of things, but lately it's almost useless. Like, my recent fail was searching for "<class name> C# programming" and......the entire first page had both the class name and C# crossed out and was showing me generic results for programming. Top result was some website offering courses in programming. What the shit Google. I personally think it's the rise of devices like Google Home that's to blame - Google tries to reduce every search query into something that can yield a short snippet that can be read back to you - so it's very aggressive towards highly technical queries that it used to be so good at.
They seem to have abandoned the technical user fanbase around 2012 and are now just openly hostile towards them.
Especially on mobile, where it's challenging to edit URLs in their latest Chrome browser. UX consistency has gone out the window for weird things like swapping around the tab ordering for searches (check out chicago, chicagos and chicago's here: https://pbs.twimg.com/media/Cm0C1o8VYAgLv2O?format=png)
Other things, like not providing custom date ranges on mobile search, are just utterly baffling. Booleans are gone, ranges are gone, and it ignores most of the words I put in. I honestly think AltaVista results circa 1997 were better than what I'm getting these days. In fact, they most certainly were.
They've also abandoned their namesake "googol" ... they don't seem to care about http, newsgroups or many other things. They should probably rename themselves to "Around300orso"
Google ceased providing functional tools for technical people a while ago. There's some decent open space for another firm to come along (like DDG or perhaps the Microsoft Renaissance) and just snatch the technical user, create compelling products for just that, and own it.
I'd even pay probably $250/yr or so for such a no-bullshit resource stack - a high quality searchable answers-oriented quick-to-use, quick-to-read technical reference without a bunch of wrong information or concrete answers buried in pages of theory. Heck, I'd probably consider dropping $2,000, that sounds amazing.
I'm seeing there are some niche cases where it's worse (especially with some extreme quotation-mark-heavy searches), but I switched a few months ago and never looked back.
The thing is: Most searches aren't even that complicated. Whatever "magic" (read: extra processing power) Google uses is mostly helping for extreme niche cases. The rest they seem to do, nowadays, is "editorializing" results, pushing popular websites before more relevant ones, displaying results from some internal database, etc.
Not the parent, but a stupid reason why I can't use DDG is that it doesn't support IPv6. It's extremely unhelpful when you're doing some IPv6-only testing and then can't search for answers.
As far as I can tell, 6 years ago the reason was that AWS didn't support IPv6[0]. It has for 2.5 years by now though.
Switched to DDG recently and was pleasantly surprised to find how well it excerpts the top StackOverflow answer for technical questions. Just for this I would say it's actually a better option for developers, at least for 95% of your queries.
I should do it more. Maybe exclusively for a week and see how it goes.
I've actually run into Gabriel a few times online and had a bit of conversation with him. Nice guy. Can't really say I've chatted it up with Eric Schmidt...
Once I internalized "just add !g if the results suck" using ddg became practical for me. And I don't know if it's me or ddg actually improves over time but I feel like I'm doing it less and less.
Sometimes Google seems to be better at guessing the context especially if you search for C and some other term that has a bunch of different meanings.
The turning point for me was five years ago when they dropped the "Discussions" filter that would search mailing lists and forums. It was amazing for obscure technical and programming issues. Here's an article about it from the time: https://www.seroundtable.com/google-search-filters-gone-1799...
I am an amateur computer historian, and I can confirm it went from good to shit. For instance, I run an archive at ps-2.kev009.com and there is an oldskool cgi search on my site (Xapian Omega). If you search IBM P/Ns or FRUs on G, versus on my site (for stuff my archive covers), you can see it first hand. My search is exhaustive and it is readily apparent G is not. I don't recall that always being the case, G started to penalize old content, http content, and all kinds of other stuff to the point of oblivion.
This makes me sad, because I will often have very exact queries like a part number and there is content out there on the internet that I cannot locate with search.
I'd love Google to separate "fresh" and "old" into two different search engines. When using a new framework or a version of a language or when searching for news I don't want old results. However, when searching for a part number, old documentation, information about a person I do want old information. Right now it just seems google purges the old instead of building two indexes.
One of the reasons I go to Google is I'm trying to answer a question where I know the subject area but not the answer to the specific question. So I'm asking for a combination of general concepts and a specific thing.
And seemingly every time, the specific thing is crossed out. At that point Google search is worse than useless.
I have a few hypotheses:
1) They are trying to turn it into a general directory in their continued war on DNS.
2) The web has got so big that their technology just isn't good enough anymore.
3) The users they care about are no longer people (such as technical or, per sibling comment, academic) who are trying to find specific information. They prefer to focus on mass-market.
Our strongest defence as consumers is to resist the further slide into monopoly.
> 3) The users they care about are no longer people (such as technical or, per sibling comment, academic) who are trying to find specific information. They prefer to focus on mass-market.
This is the only explanation that makes sense. The vast majority of people probably enter questions and queries in natural language so a purely keyword based search is out the window. Why a list of stop words wouldn't cover that I don't know, but let's just assume it doesn't. The second point is that most people also probably hardly ever look for precise information but ask ambiguous stuff that's just hard to properly answer with traditional approaches.
So my guess is that at some point they realized that most queries are of that kind, then shortly discussed keeping two search engines in parallel and finally went with "screw those stupid devs, we're freaking Google!"
It has been becoming crap for a long time now. In the past few years I just need to include, well, everything in quotes just to get somewhat relevant results instead of very generic ones. For me it began with their removal of the + search operator (to boost Google+ organic traffic from Google). Also the removal of boolean operators. Also very, very aggressive typo fixing (I meant "Brose Wollis", not "Bruce Willis", god damn it). I think that Google Home is just the latest thing that furthers Google's exclusive catering to the lowest common denominator - the people who use Google as DNS and fail to do even that. I guess there's not that much ad revenue from power users.
> In most cases, Google’s algorithms make things better for our users - but in some rare cases, we don’t find what you were looking for. In the past, we provided users with the “+” operator to help you search for specific terms. However, we found that users typed the “+” operator in less than half a percent of all searches, and two thirds of the time, it was used incorrectly. A couple of weeks ago we removed the “+” operator, encouraging the use of the double quotes, which are more likely to be used correctly.
I'd be interested to know how often the "quotes" are used correctly.
0.5% of the queries used "+" so they removed it? Feels like a dogmatic "data driven" UX change for the sake of change. The interesting number would be how much it was used after the first queries failed and gave too many results.
I also find that annoying. I wonder if it started because Google favours "fresh" content, so people keep changing the content of their pages and Google keeps returning that page even when Y has been removed. Here's Matt Cutts from 10 years ago saying this is sometimes useful:
> Your "Red Room" query is hard in a couple ways. First, it looks like that root page used to have the words on the page: "The Red Room Doors open 6pm $18 Pre-Booked" And it's also tough because it looks like the name changed to the "2nd Degree Bar & Grill" at some point. The fact that you can type [red room] and get a suggestion for [red room st. lucia] is actually pretty helpful in my book because it leads you to the answer that the name changed.
Can't believe how dumb it is. Imagine a surgeon asking an assistant for a specific type of scalpel, and that assistant bringing an assortment of scalpels.. "I specifically specified X scalpel, god damn it!".
I switched to DDG a while ago, and sometimes hit Google, usually with poor results. Aside from getting no results, I'm getting results that have little or nothing to do with what I was looking for.
At first, I thought my search skills had evaporated, but I realized Google has become little more than a digital flea market. Even Yandex is better in many regards than Google.
Have you tried using verbatim mode? I have that as my default search mode in Firefox, and it always searches for your exact phrase, spelling mistakes and all.
Still has its problems, but better than the default search mode.
To me, it looks like they introduced Bloom filters to pre-filter their indexing. Basically, they run an offline job to create a list of the top 1 million search keywords and hash them into a compact bitfield. That bitfield is sent to all the crawlers, which will then only index the keywords whose hashes match it. It's a standard way to reduce the amount of data that you need to process, and since search keyword revenue is likely exponentially distributed, they would lose almost no revenue by cutting off the 1,000,001st keyword, but greatly reduce their costs.
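A minimal sketch of that pre-filtering idea, assuming a standard Bloom filter (the sizes and hash construction below are arbitrary):

    import hashlib

    M = 1 << 20   # bits in the filter
    K = 3         # hash functions per keyword

    def bit_positions(word):
        for i in range(K):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % M

    def build_filter(top_keywords):
        # Offline job: hash the top keywords into a compact bitfield.
        bits = bytearray(M // 8)
        for word in top_keywords:
            for pos in bit_positions(word):
                bits[pos // 8] |= 1 << (pos % 8)
        return bits

    def maybe_indexed(bits, word):
        # Crawler-side check: false positives possible, false negatives not.
        return all(bits[pos // 8] & (1 << (pos % 8)) for pos in bit_positions(word))

    bits = build_filter(["house", "hotel", "insurance"])
    print(maybe_indexed(bits, "insurance"))           # True
    print(maybe_indexed(bits, "ispatialaudioclient")) # almost surely False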
As for the sentence search, it appears to me that they are using stemmed shingles. For that, you reduce words to their base form (houses -> house) and then chop the page down into every window of 3 words and hash those. The hashed number for 3 words is then called one shingle. Again, that's a common technique for content similarity detection, so Google kind of has to do that anyway to spot verbatim copycats. One can then also use this for sentence search by searching for all the shingles in the search query, but that introduces ambiguity. For example, "cats singing in front of the house" would likely have the same shingles as "cat sang about front before houses", as only "cat sing front house" is being indexed. The issue here is that "front" can be used both in a positional way and for a military formation.
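A rough sketch of that, with a crude suffix-chopping stemmer standing in for real stemming (and phrases chosen so the collision actually happens):

    STOP = {"in", "of", "the", "a", "an", "to", "about", "before"}

    def stem(word):
        # Toy stemmer: chop a few common suffixes.
        for suffix in ("ing", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def shingles(text, n=3):
        # Drop stop words, stem, then hash every window of n remaining words.
        words = [stem(w) for w in text.lower().split() if w not in STOP]
        return {hash(tuple(words[i:i + n])) for i in range(len(words) - n + 1)}

    a = shingles("cats singing in front of the house")
    b = shingles("cat sings to the front of a house")
    print(a == b)  # True - both reduce to the stems "cat sing front house"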
To me, both of these look like side-effects of them reducing operational costs. I mean, who would bid marketing dollars on "ISpatialAudioClient::GetMaxDynamicObjectCount" ? Given that you will likely tolerate 2-3 complete search failures before trying Bing, the business value of completing such a query correctly is very close to 0.
I have to put everything in quotation marks now or it gives me a result loosely related to the specific thing I wanted. It takes me like four searches to actually get the result I wanted. Drives me nuts.
That's the other thing - I've seen it being completely useless for me and then my coworker does the same search and it's fine for him, or vice versa. I imagine Google builds a profile of what it thinks you'd like to see and modifies your search query accordingly.
For this example, in incognito mode without me logged in, results are similar, no problems. It is normal that it changes results depending on the user profile. Do you have other examples?
I get this a lot when searching for exception messages, or for information on how to implement something. It is getting increasingly harder to find the signal in the noise.
Actually this is a well known change to significant term relevancy. You need to double quote words to force inclusion in results. I'm not saying it was a good change.
Also off topic: Google seems to favour "essays" over news, and most of the time you need to read the history of mankind before you get to the relevant content in a news piece. Also it has no means to detect clickbait, which is seriously problematic.
Pagerank uses a bunch of things that would favor older pages - more inbound links, more clicks in the search results, and so on. They have "news search" if you just want news: https://news.google.com/
The "Google News" search can not be seen as an objective search for "news" as it is biased both by the selection and weighting of sources by Google (staff and algorithms) as well as by the search history profile. If you still want to use Google News make sure to access it using a clean browser profile (not logged in, no Google-related cookies) to avoid the search profile contamination which otherwise colour the result. The source selection and weighting bias can not be avoided as that is part and parcel of the way the thing works.
Why not? Google Search started out unbiased; PageRank (the algorithm) did not care about any opinions or attitudes or leanings, and the results ended up being representative of what was out there on the 'net. The same approach could have been used for Google News by pulling in any news source found by Google's crawlers which happened to report on a specific issue, weighing the rank of those sources the same way that regular search results used to be weighed (i.e. those sources which are linked to more often, after filtering out link farms, end up higher than fly-by-night operations which nobody links to). This is not what Google News does though; instead it seems to either use a whitelist or apply a blacklist to produce 'reputable news sources'. That is where the bias starts: in the selection of which sources to link to.