I have that beat, as I once made Google perform a 30-second query. I know exactly how I did it, too. I wasn't sure whether or not it would find it...
So how does Google do a query? Well, fundamentally it has lists of sites that contain a keyword, so it gets the list for each of your keywords, finds the intersection of those lists, then sorts the result by the PageRank of the resulting URLs. That's how it used to work, anyway.
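Roughly, as a minimal sketch of that old model (the tiny index and the PageRank scores below are made up for illustration):

    # Toy inverted index: keyword -> set of URLs containing it.
    index = {
        "shakespeare": {"url1", "url2", "url3"},
        "sonnet":      {"url2", "url3", "url4"},
    }
    pagerank = {"url1": 0.9, "url2": 0.7, "url3": 0.4, "url4": 0.2}

    def search(query_terms):
        # Fetch the posting set for each keyword...
        postings = [index.get(t, set()) for t in query_terms]
        # ...intersect them...
        candidates = set.intersection(*postings) if postings else set()
        # ...and sort the survivors by their PageRank.
        return sorted(candidates, key=lambda u: pagerank.get(u, 0), reverse=True)

    print(search(["shakespeare", "sonnet"]))  # ['url2', 'url3']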
So back in the day I decided to run a test. I took the Gutenberg text of Shakespeare's complete works and wrote a program to go through it and find the longest string consisting entirely of "stop" words. Stop words are super-common words like "the" that appear on millions or billions of pages (a huge list of URLs), so Google normally ignores them, but back then you could still force it to include them.
So I coded up my little algorithm that goes word by word through the text, keeping track of the longest string found so far that consisted entirely of stop words.
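Roughly like this, as a reconstruction (the stop-word list below is a small made-up sample, not whatever list was actually used):

    # Walk word by word, tracking the longest run made up entirely of stop words.
    STOP_WORDS = {"the", "a", "an", "to", "of", "in", "is", "it", "from",
                  "what", "and", "that", "for", "on", "with", "as", "at", "by"}

    def longest_stopword_run(text):
        best, current = [], []
        for word in text.lower().split():
            word = word.strip(".,;:!?'\"()-")
            if word in STOP_WORDS:
                current.append(word)
                if len(current) > len(best):
                    best = current[:]
            else:
                current = []
        return " ".join(best)

    print(longest_stopword_run("transform honesty from what it is to a bawd"))
    # -> "from what it is to a"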
The longest string it found was: "from what it is to a".
Just now I did that query again, and on the whole Internet it found just 4 matches (soon it will find this comment too, and any archives of this comment), all being the exact phrase from Shakespeare:
https://imgur.com/a/lvU1QSs
Impressively, that query now takes 0.84 seconds (I haven't done that query in several years, it's possibly cached but I doubt it.)
However, when I first performed it, it took 30+ seconds. I didn't take a screenshot but I was super impressed. I brought Google to a crawl for 30 seconds, in the exact way I was intending. Moohahahahaha.
"Holy cow. I just made Google's databases join six lists each with millions of pages on them, find the intersection, and then go through all of them for which ones had my phrase in literal order. And then it found it."
Pretty mind-blowing that today it can do that in < 1 second.
The key for people who don't get why it was so much work for Google is this: where I wrote "then go through all of them for which ones had my phrase in literal order", I meant "then go through the cached contents of every single web page on the results list" -- since the joined list itself would contain every web site that has those 6 words, which is pretty much every English-language page on the web of more than a few thousand words. So it needed to scan through the cached contents of all of them to find which ones had the phrase in order.
For example I just looked at the top post on Reddit right now, it's about 20 page-downs of comments, and has the words this many times:
from: 17
what: 22
it: 223
is: 169
to: 195
a: more than 1,000
Pretty much every English-language web page on the whole Internet will have those words, unless it is very short.
> So it needed to scan through the cached contents of all of them to find which ones had the phrase in order.
If you have a positional index [1] you don't have to go through the full cached content, you only have to check the index to see whether there's an occurrence of the words in the query with the correct distance. E.g. for that Reddit post, you'd check the 17 occurrences of "from" and notice quickly that your query string doesn't match at that position.
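A sketch of that check, with made-up position data:

    # Positional index: doc -> term -> sorted list of word positions.
    positions = {
        "doc1": {"from": [4, 90], "what": [5], "it": [6, 40],
                 "is": [7], "to": [8], "a": [9, 55]},
    }

    def phrase_in_doc(doc, terms):
        idx = positions[doc]
        if any(t not in idx for t in terms):
            return False
        # The phrase occurs iff some start position p has term i at position p + i.
        return any(all(p + i in idx[t] for i, t in enumerate(terms))
                   for p in idx[terms[0]])

    print(phrase_in_doc("doc1", ["from", "what", "it", "is", "to", "a"]))  # True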
"The sun never sets on the British Empire" it was up there historically with Rome and Alexander's conquests (although the latter may not have constituted a bona fide empire)
> on the whole Internet it just found 4 matches, all being the exact phrase from Shakespeare
But why just 4 occurrences? Every utterance of Shakespeare has been printed on thousands of web sites and Google has indexed thousands of those sites. Another line from the same play, "Nay, answer me: stand, and unfold yourself", gets 10,300 hits. I tried another line from deep within the play: "I know him well: he is the brooch indeed." Google found 3,890 results. Even though the search is slow, why aren't we getting thousands of hits for "from what it is to a"?
The other odd thing is that if you search for the exact phrase "transform honesty from what it is to a bawd", which is what that particular all-stopword string is taken from, you get a lot more results. Clearly Google isn't actually finding most of the pages in its index that match that string.
> why aren't we getting thousands of hits for "from what it is to a"?
Because Google isn't an exact search engine. Each of the terms in the phrase above appears on billions of pages. My guess is that for common terms Google doesn't store all postings in its inverted index but truncates the posting lists after a couple of million entries.
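A toy illustration of that guess, with made-up page IDs and a silly cap of 3:

    # If the posting list for a very common term is capped, a page that only
    # appears past the cap can never show up in the intersection, even though
    # it really contains the phrase.
    CAP = 3
    full_postings = {
        "a":    ["p1", "p2", "p3", "p4", "p5"],  # billions of pages in reality
        "bawd": ["p5"],
    }
    capped = {term: docs[:CAP] for term, docs in full_postings.items()}

    print(set(full_postings["a"]) & set(full_postings["bawd"]))  # {'p5'} - found
    print(set(capped["a"]) & set(capped["bawd"]))                # set()  - lost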
If you're interested in slow queries, try the one below (via [1]):
the OR google OR a OR "supercalifragilisticexpialidocious" -the -google -a
You can replace the "supercalifragilisticexpialidocious" term with something unique if the results end up cached. This takes upwards of 5 seconds to return.
One question is, will such complicated queries bring all of Google to a crawl, or will they e.g. be executed with lower priority or on different servers, leaving the rest of the queries unaffected?
Of course they don't affect other queries; these would be non-locking queries on replicas. Or, to be even more accurate, they'd be non-blocking queries on indices cached in memory on edge servers, and probably several levels of optimization beyond that. There's no reason for this type of query to lock a database.
Perhaps it also keeps an index of pairs of adjacent words.
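For instance, a word-pair (bigram) index could look something like this (documents made up); a quoted phrase can then be narrowed down using the much rarer pair postings instead of the individual stop words:

    from collections import defaultdict

    def bigram_index(docs):
        # Index every consecutive word pair -> set of documents containing it.
        idx = defaultdict(set)
        for doc_id, text in docs.items():
            words = text.lower().split()
            for w1, w2 in zip(words, words[1:]):
                idx[(w1, w2)].add(doc_id)
        return idx

    idx = bigram_index({"d1": "from what it is to a bawd",
                        "d2": "what is it to a cat"})
    print(idx[("what", "it")])  # {'d1'} - far more selective than "what" or "it" alone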
Another thing is that your query is essentially a literal "substring" search, which Google probably handles differently. See e.g. the Burrows Wheeler transform for an idea how it could be implemented.
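A full BWT/FM-index is a bit involved, but a simpler relative of the same idea, a suffix array, already shows how exact substring lookup can work without rescanning the cached page every time (a toy sketch, not necessarily how Google does it):

    def build_suffix_array(text):
        # Indices of all suffixes, sorted lexicographically.
        return sorted(range(len(text)), key=lambda i: text[i:])

    def contains(text, suffix_array, pattern):
        # Binary search for the first suffix >= pattern, then check its prefix.
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            if text[suffix_array[mid]:] < pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(suffix_array) and text[suffix_array[lo]:].startswith(pattern)

    doc = "transform honesty from what it is to a bawd"
    sa = build_suffix_array(doc)
    print(contains(doc, sa, "from what it is to a"))  # True
    print(contains(doc, sa, "from what it was"))      # False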
If you want to know how Google search got fast, the book "Managing Gigabytes" covers many of the basic optimizations to speed up search. I found the book very enjoyable to read.
For everyone who is suggesting Google search quality has taken a hit, I would suggest a quick test to see for yourself that Google is still WAY better than the competitors when it comes to long tail queries.
Example: someone compiled a list of things they learned from indiehackers.com interviews.
There are a lot of quotes on that page, and unfortunately none of them are linked back to the original interview. Some quotes are very interesting, so I wanted to find the original interview.
I took some of those quotes, put them in double quotation marks, and searched on Google, DuckDuckGo and Bing. By the way, you can only replicate these results by adding the double quotation marks.
Results:
Google always shows toomas.net in the top results, and almost always finds the relevant interview article on the indiehackers website.
DDG (usually) finds the article written on toomas.net, but not the indiehackers interview.
Bing often fails to list toomas as the top result, and doesn't find the indiehackers interview at all.
It's worth mentioning that indiehackers.com is a JavaScript-only website and doesn't work with JS turned off. Google's spider can execute JS, while the others' cannot. This is why indiehackers.com doesn't show up, not because of the search engines themselves.
I have a similar issue with a side project of mine. JS-only websites solve this with server-side rendering or static generation.
When I think of a search engine I think of it as two things.
1. A web crawler
2. A search index
Or, as a good boss used to say, "garbage in, garbage out"... the quality of the engine is as much a function of the index's ability to rank as the quality of the input from the crawler... the fact that none of the other engines crawl with JS enabled is a huge competitive advantage for Google.
A search engine is trying to predict which pages will be helpful to you in response to your query. This means it should take "how does the page look to users" as input, which today means executing JavaScript.
On the other hand, in the same way that Google is (claiming to be) trying to make the web fast with AMP, they could make the web fast by refusing to index pages that don't provide a non-javascript experience. (Yes, you can make a slow page without Javascript, but it sure is easier with it)
Yes, they would. They'd just use NextJS or similar to push down a static skeleton that minimally covers the crawler. (My current company has to provide data to the Facebook crawler and that's what we do.)
Nothing prevents a site from doing that and just loading JS afterwards. Or from still being basically read-only without JS. Which might be an improvement...except within epsilon of zero people care in the first place and Google is building software for a rather larger proportion of the market than that, so it's an improvement for what's barely an audience.
Nobody is really "enabling" it except for a browser. And that's not changing. The shirt-rending is becoming tiresome.
I'd be curious to know why your company specifically targets large-market crawlers like that, instead of just making things crawlable. It feels like leaving money on the table when you overfit your stack to one market.
It's a shame you find it "tiresome" when people insist on retaining their own opinions, but your exhaustion is not particularly relevant.
We need to build shareable landing pages. We share via Instagram and Facebook (for now). We use NextJS because they do provide static rendering that covers what we care about and are likely to care about anytime soon.
And you can have whatever opinion you want; nobody’s saying otherwise. But the constant whining about nothing — and it is nothing — is exhausting.
> For everyone who is suggesting Google search quality has taken a hit, I would suggest a quick test to see for yourself that Google is still WAY better than the competitors when it comes to long tail queries.
I think these are both true. Google quality has certainly fallen in my opinion (perhaps as a result of having to constantly counter SEO tricks). Other search engines still have quite a bit of catching up to do as well.
Depending on your searches, YMMV. I finally switched to DDG at work because Google refused to show me relevant results for some (admittedly obscure) searches related to lensing in F#, preferring instead to give me political opinion pieces(!) that were completely irrelevant to my query, along with various results about video games. DDG had no such problems.
This is one example of many, but it was the last straw for me. I guess techies aren't Google's core audience anymore, I get that, but it feels like we've lost a valuable tool.
> DDG (usually) finds the article written on toomas.net, but not the indiehackers interview. Bing often fails to list toomas as the top result, and doesn't find the indiehackers interview at all.
Are you able to explain this? Bing and DuckDuckGo are the same thing. DDG uses Bing's API (for non bang searches).
I cannot explain it, hopefully someone from DDG actually chimes in :-) If they just use the Bing API, how can their results be better than Bing? I suppose they also use other sources.
Slight tangent, but it might be interesting to some, on search results and how Google's search results can sway users.
Google's been able to use ephemeral experiences (like autocomplete and answer boxes) to influence users, especially undecided voters. The research, which was reproduced by a German team, is believed to have influenced 2+ million people in the US election alone --
That testimony is highly misleading. Taking the highest estimates possible, which are based on essentially a conspiracy theory, the number of influenced voters is much lower than 2 million.
He claims they moved 2.5 million votes, not voters. This relies on an assumption of straight ticket voters and about 20 votes per voter. He doesn't correct people when they make the votes/voters mistake.
And even that new number, 100k voters swayed, is not really well founded. He makes a couple of overlapping claims; one of them I looked into deeply and found that, using his own papers, a reasonable upper bound on what he claimed might be affected was 8-10k additional votes for Clinton.
And the number was likely lower (and was only that high because more people identify as democrat than republican in the US).
(I work at Google, but take interest in this mostly because it's just terrible abuse of mathematics)
> For everyone who is suggesting Google search quality has taken a hit, I would suggest a quick test to see for yourself that Google is still WAY better than the competitors when it comes to long tail queries.
Maybe, but it's pretty clear to me that Google no longer indexes entire websites. As far as I can tell, this was not a concern in the past.
Saying other search providers are worse does not refute the statement that Google Search has been reduced in quality.
The reduction in Google Search quality comes from their need to be "helpful": their desire to use my past searches and the other "big data" they gobble up about me to "improve" my specific results, when in fact it makes them worse. That is just the baseline; then you have the hundreds of other things they have done to search over the years for political, regulatory and other reasons.
Google search gets objectively worse every year than the previous year.
Bing seems to simply ignore the quotes in your query after the first result (for query "This idea that you need to water down your pricing or offer a free beta period is bogus") and then just displays random results. Even SoundCloud is listed there..
Would Google have returned results from a JS-rendered site back then? The ability to index client-side JS rendering puts Google ahead of their competition; as more sites become JS-rendered, they reinforce Google's monopoly.
More like 2003-2005 for me. By 2007, the quality was sliding due to them targeting mainstream folks and deprioritizing the tech-fringe stuff that I care about.
Exactly. My screenshot was a tongue-in-cheek comment on the black bars in the original post. Unless we can reproduce/test it ourselves, then it's as good as a devtools edit.
Honestly, the mentality in this twitter thread is a joke. Google handles insane amounts of data beyond what 99.99% of engineers have ever dealt with. The fact that their search works at all is a miracle. But I guess people like to ignore that and much rather complain on Twitter that it takes a few seconds to query a database of trillions of records.
Google is famous for raising the bar on speed and yeah, their performance is usually impeccable. That’s why the existence of a consistently slow Google search is newsworthy.
It very clearly has to be a bug. I also got ~2.5s. It makes no sense for it to take that long for everyone, even after running it multiple times in a row. The system most definitely has many types of caching, so if it takes that long, there's probably something going very wrong.
There are comments on the thread that suggest that it is an intentional delay to prevent Google itself from being weaponized. Essentially, "powered by" is special because of automatic inclusions like "powered by wordpress" or "powered by nginx."
Well, thankfully you were here to defend one of the largest corporations on the planet from these Twitter meanies who dare discuss a slow responding page.
I don't see any factual objective discussion in the twitter thread. Btw, I'm not a big fan of Google, which I guess is what you were implying. In fact, I think Google is the most dangerous tech company currently in existence and I consider it a severe threat to our society. But that doesn't mean I cannot objectively judge the technical quality of their products and put things into perspective.
It's not what you say, but how you say it. This is just mindless disproportional bashing.
"Come on Google I don't have all day!"
"This is actually one of those crazy "facts from the future" to go back in time and tell people 10 years ago: Google has the dominant browser and mobile operating system, but their search sometimes takes 7 seconds."
It’s really weird. If you add “powered by” to a query it’s really slow, but if you add a minor typo (eg “powere by”) it finds the same results, but faster.
It looks like it is the exact substring that is slow. "asdfpowered byasdf" is slow, too. I bet it's happening at some layer that doesn't even know how to parse the query.
Yea, I was thinking along the same lines. You’ll often see publicly disclosed vulns include google dorks to find vulnerable sites, ex: a google dork to find forums running a vulnerable version of phpbb or something
They're not the same query. They don't return the same results. I seem to recall a dash links words together somehow. So that probably narrows down the search space more.
Google doesn't use stop words, because if they did I wouldn't be able to search accurately for information regarding the group "The The". But since we can find the occasional slow-running query, they might be using "stop phrases" and just weren't aware "powered by" should be one.
I've found that if you go past page 2x on most of these results, Google just stops presenting pages and will tell you that it's omitted the results.
The number of results may represent number of hits in a database, but it's not actually accessible to the end user. After you get into the last few pages, it'll also just cap the result count to the number of pages they've decided to show you.
Yeah, definitely seen Google take a long time. Also seen weirder things. Just a couple days ago I managed to type a query that neither told me there are no results, nor did it actually return any results. Hadn't seen that before.
> Just a couple days ago I managed to type a query that neither told me there are no results, nor did it actually return any results. Hadn't seen that before.
I've had that a few times lately. I thought it was my internet being shoddy but the header and footer load, just no results in the middle nor any error about not finding anything... just blank space.
tldr; querying the term "powered by" causes the query to take an inordinate amount of time (still less than 10 seconds, so a far cry from "all day").
The reason doesn't seem clear, but one comment [0] claims that "powered by" is a common query used by black hats and spammers to find targets. Not sure why any "antispam" behind the scenes would cause the query to delay though. Hardcoded delays?
Finally found a comment that makes sense actually [0]:
"Powered by X" is on millions of web pages because it gets autogen'd by popular web-facing CMS (Joomla, Wordpress etc). So when a search query includes "Powered by" google must determine which among the billion pages with this
phrase is most relevant"
A single query can hit hundreds of CPUs. There are over 60k Google search queries per second, according to some random source found on the internet. That would require 600k CPUs. If it's 1000 machines being hit, it would require 6M CPUs. Also, internet traffic is bursty; there are maybe 2-3x more queries at peak times.
My experience with working at a FAANG: I remember a service that was easily dealing with >10M QPS (albeit much simpler queries than a Google search), or some other services continuously processing (reading/writing to disks) > 100 GB/s.
There's A LOT you don't see in FAANG frontends. In particular, the Twitter thread owner suggesting https://developers.google.com/speed/ has no idea where the slowness is coming from.
That being said I'd love to be a Google employee to investigate those queries. Perfect bug reports.
Each query uses much less than a second's worth of CPU, so you can't just multiply searches per second by the CPUs involved in a query to get the total CPUs needed.
Apparently they don't, look at the posted example :). Seriously though it's probably like half a second or maybe a quarter of a second, that's what Google often reports.
And yes this is an oversimplified model. No idea how much CPU is actually used.
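For what it's worth, the back-of-envelope version of that estimate looks like this; every number below is a guess taken from this thread, not a real Google figure:

    qps = 60_000                  # claimed queries per second
    cpu_seconds_per_query = 0.25  # the "quarter of a second" guess above
    peak_factor = 3               # traffic is bursty

    cpus_busy_at_peak = qps * cpu_seconds_per_query * peak_factor
    print(cpus_busy_at_peak)      # 45000.0 CPUs kept busy, under these guesses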
Slightly off topic - has anyone else noticed that in the last couple years Google results have just become....crap? Like, as a programmer I'm used to googling all sorts of things, but lately it's almost useless. Like, my recent fail was searching for "<class name> C# programming" and......the entire first page had both the class name and C# crossed out and was showing me generic results for programming. Top result was some website offering courses in programming. What the shit Google. I personally think it's the rise of devices like Google Home that's to blame - Google tries to reduce every search query into something that can yield a short snippet that can be read back to you - so it's very aggressive towards highly technical queries that it used to be so good at.
They seem to have abandoned the technical user fanbase around 2012 and are now just openly hostile towards them.
Especially on mobile, where it's challenging to edit URLs in their latest Chrome browser. UX consistency has gone out the window for weird things like swapping around the tab ordering for searches (check out chicago, chicagos and chicago's here: https://pbs.twimg.com/media/Cm0C1o8VYAgLv2O?format=png)
Other things, like not providing custom date ranges on mobile search, are just utterly baffling. Booleans are gone, ranges are gone, and it ignores most of the words I put in. I honestly think AltaVista results circa 1997 were better than what I'm getting these days. In fact, they most certainly were.
They've also abandoned their namesake "googol" ... they don't seem to care about http, newsgroups or many other things. They should probably rename themselves to "Around300orso"
Google ceased providing functional tools for technical people a while ago. There's some decent open space for another firm to come along (like DDG or perhaps the Microsoft Renaissance) and just snatch the technical user, create compelling products for just that, and own it.
I'd even pay probably $250/yr or so for such a no-bullshit resource stack - a high quality searchable answers-oriented quick-to-use, quick-to-read technical reference without a bunch of wrong information or concrete answers buried in pages of theory. Heck, I'd probably consider dropping $2,000, that sounds amazing.
I'm seeing there are some niche cases where it's worse (especially with some extreme quotation-mark-heavy searches), but I switched a few months ago and never looked back.
The thing is: Most searches aren't even that complicated. Whatever "magic" (read: extra processing power) Google uses is mostly helping for extreme niche cases. The rest they seem to do, nowadays, is "editorializing" results, pushing popular websites before more relevant ones, displaying results from some internal database, etc.
Not the parent, but a stupid reason why I can't use DDG is that it doesn't support IPv6. It's extremely unhelpful when you're doing some IPv6-only testing and then can't search for answers.
As far as I can tell, 6 years ago the reason was that AWS didn't support IPv6[0]. It has for 2.5 years by now though.
Switched to DDG recently and was pleasantly surprised to find how well it excerpts the top StackOverflow answer for technical questions. Just for this I would say it's actually a better option for developers, at least for 95% of your queries.
I should do it more. Maybe exclusively for a week and see how it goes.
I've actually run into Gabriel a few times online and had a bit of conversation with him. Nice guy. Can't really say I've chatted it up with Eric Schmidt...
Once I internalized "just add !g if the results suck" using ddg became practical for me. And I don't know if it's me or ddg actually improves over time but I feel like I'm doing it less and less.
Sometimes Google seems to be better at guessing the context especially if you search for C and some other term that has a bunch of different meanings.
The turning point for me was five years ago when they dropped the "Discussions" filter that would search mailing lists and forums. It was amazing for obscure technical and programming issues. Here's an article about it from the time: https://www.seroundtable.com/google-search-filters-gone-1799...
I am an amateur computer historian, and I can confirm it went from good to shit. For instance, I run an archive at ps-2.kev009.com and there is an oldskool cgi search on my site (Xapian Omega). If you search IBM P/Ns or FRUs on G, versus on my site (for stuff my archive covers), you can see it first hand. My search is exhaustive and it is readily apparent G is not. I don't recall that always being the case, G started to penalize old content, http content, and all kinds of other stuff to the point of oblivion.
This makes me sad, because I will often have very exact queries like a part number and there is content out there on the internet that I cannot locate with search.
I'd love Google to separate "fresh" and "old" into two different search engines. When using a new framework or a version of a language or when searching for news I don't want old results. However, when searching for a part number, old documentation, information about a person I do want old information. Right now it just seems google purges the old instead of building two indexes.
One of the reasons I go to Google is I'm trying to answer a question where I know the subject area but not the answer to the specific question. So I'm asking for a combination of general concepts and a specific thing.
And seemingly every time, the specific thing is crossed out. At that point Google search is worse than useless.
I have a few hypotheses:
1) They are trying to turn it into a general directory in their continued war on DNS.
2) The web has got so big that their technology just isn't good enough anymore.
3) The users they care about are no longer people (such as technical or, per sibling comment, academic) who are trying to find specific information. They prefer to focus on mass-market.
Our strongest defence as consumers is to resist the further slide into monopoly.
> 3) The users they care about are no longer people (such as technical or, per sibling comment, academic) who are trying to find specific information. They prefer to focus on mass-market.
This is the only explanation that makes sense. The vast majority of people probably enter questions and queries in natural language so a purely keyword based search is out the window. Why a list of stop words wouldn't cover that I don't know, but let's just assume it doesn't. The second point is that most people also probably hardly ever look for precise information but ask ambiguous stuff that's just hard to properly answer with traditional approaches.
So my guess is that at some point they realized that most queries are of that kind, then shortly discussed keeping two search engines in parallel and finally went with "screw those stupid devs, we're freaking Google!"
It has been becoming crap for a long time now. In the past few years I just need to include, well, everything in quotes just to get somewhat relevant results instead of very generic ones. For me it began with their removal of the + search operator (to boost Google+ organic traffic from Google). Also the removal of boolean operators. Also very, very aggressive typo fixing (I meant "Brose Wollis", not "Bruce Willis", god damn it). I think that Google Home is just the latest thing that furthers Google's exclusive catering to the lowest common denominator - the people who use Google as DNS and fail to do even that. I guess there's not that much ad revenue from power users.
> In most cases, Google’s algorithms make things better for our users - but in some rare cases, we don’t find what you were looking for. In the past, we provided users with the “+” operator to help you search for specific terms. However, we found that users typed the “+” operator in less than half a percent of all searches, and two thirds of the time, it was used incorrectly. A couple of weeks ago we removed the “+” operator, encouraging the use of the double quotes, which are more likely to be used correctly.
I'd be interested to know how often the "quotes" are used correctly.
0.5% of the queries used "+" so they removed it? Feels like a dogmatic "data driven" UX change for the sake of change. The interesting number would be how much it was used after the first queries failed and gave too many results.
I also find that annoying. I wonder if it started because Google favours "fresh" content, so people keep changing the content of their pages and Google keeps returning that page even when Y has been removed. Here's Matt Cutts from 10 years ago saying this is sometimes useful:
> Your "Red Room" query is hard in a couple ways. First, it looks like that root page used to have the words on the page: "The Red Room Doors open 6pm $18 Pre-Booked" And it's also tough because it looks like the name changed to the "2nd Degree Bar & Grill" at some point. The fact that you can type [red room] and get a suggestion for [red room st. lucia] is actually pretty helpful in my book because it leads you to the answer that the name changed.
Can't believe how dumb it is. Imagine a surgeon asking an assistant for a specific type of scalpel, and that assistant bringing an assortment of scalpels.. "I specifically specified X scalpel, god damn it!".
I switched to DDG a while ago, and sometimes hit Google, usually with poor results. Aside from getting no results, I'm getting results that have little or nothing to do with what I was looking for.
At first, I thought my search skills had evaporated, but I realized Google has become little more than a digital flea market. Even Yandex is better in many regards than Google.
Have you tried using verbatim mode? I have that as my default search mode in Firefox, and it always searches for your exact phrase, spelling mistakes and all.
Still has its problems, but better than the default search mode.
To me, it looks like they introduced Bloom filters to pre-filter their indexing. Basically, they run an offline job to create a list of the top 1 million search keywords and hash them into a compact bitfield. That bitfield is sent to all the crawlers, which will then only index the keywords whose hashes match it. It's a standard way to reduce the amount of data that you need to process, and since search keyword revenue is likely exponentially distributed, they would lose almost no revenue by cutting off the 1,000,001st keyword, but greatly reduce their costs.
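A minimal sketch of that pre-filtering idea, assuming a standard Bloom filter (the sizes and hash construction below are arbitrary):

    import hashlib

    M = 1 << 20   # bits in the filter
    K = 3         # hash functions per keyword

    def bit_positions(word):
        for i in range(K):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % M

    def build_filter(top_keywords):
        # Offline job: hash the top keywords into a compact bitfield.
        bits = bytearray(M // 8)
        for word in top_keywords:
            for pos in bit_positions(word):
                bits[pos // 8] |= 1 << (pos % 8)
        return bits

    def maybe_indexed(bits, word):
        # Crawler-side check: false positives possible, false negatives not.
        return all(bits[pos // 8] & (1 << (pos % 8)) for pos in bit_positions(word))

    bits = build_filter(["house", "hotel", "insurance"])
    print(maybe_indexed(bits, "insurance"))           # True
    print(maybe_indexed(bits, "ispatialaudioclient")) # almost surely False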
As for the sentence search, it appears to me that they are using stemmed shingles. For that, you reduce words to their base form (houses -> house) and then chop the page down into every window of 3 words and hash those. The hashed number for 3 words is then called one shingle. Again, that's a common technique for content similarity detection, so Google kind of has to do that anyway to spot verbatim copycats. One can then also use this for sentence search by searching for all the shingles in the search query, but that introduces ambiguity. For example, "cats singing in front of the house" would likely have the same shingles as "cat sang about front before houses", as only "cat sing front house" is being indexed. The issue here is that "front" can be used both in a positional way and for a military formation.
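A rough sketch of that, with a crude suffix-chopping stemmer standing in for real stemming (and phrases chosen so the collision actually happens):

    STOP = {"in", "of", "the", "a", "an", "to", "about", "before"}

    def stem(word):
        # Toy stemmer: chop a few common suffixes.
        for suffix in ("ing", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def shingles(text, n=3):
        # Drop stop words, stem, then hash every window of n remaining words.
        words = [stem(w) for w in text.lower().split() if w not in STOP]
        return {hash(tuple(words[i:i + n])) for i in range(len(words) - n + 1)}

    a = shingles("cats singing in front of the house")
    b = shingles("cat sings to the front of a house")
    print(a == b)  # True - both reduce to the stems "cat sing front house"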
To me, both of these look like side-effects of them reducing operational costs. I mean, who would bid marketing dollars on "ISpatialAudioClient::GetMaxDynamicObjectCount" ? Given that you will likely tolerate 2-3 complete search failures before trying Bing, the business value of completing such a query correctly is very close to 0.
I have to put everything in quotation marks now or it gives me a result loosely related to the specific thing I wanted. It takes me like four searches to actually get the result I wanted. Drives me nuts.
That's the other thing - I've seen it being completely useless for me and then my coworker does the same search and it's fine for him, or vice versa. I imagine Google builds a profile of what it thinks you'd like to see and modifies your search query accordingly.
For this example, in incognito mode without me logged in, results are similar, no problems. It is normal that it changes results depending on the user profile. Do you have other examples?
I get this a lot when searching for exception messages, or for information on how to implement something. It is getting increasingly harder to find the signal in the noise.
Actually this is a well known change to significant term relevancy. You need to double quote words to force inclusion in results. I'm not saying it was a good change.
Also off topic: Google seems to favour "essays" over news, and most of the time you need to read the history of mankind before you get to the relevant content in a news piece. Also it has no means to detect clickbait, which is seriously problematic.
Pagerank uses a bunch of things that would favor older pages - more inbound links, more clicks in the search results, and so on. They have "news search" if you just want news: https://news.google.com/
The "Google News" search can not be seen as an objective search for "news" as it is biased both by the selection and weighting of sources by Google (staff and algorithms) as well as by the search history profile. If you still want to use Google News make sure to access it using a clean browser profile (not logged in, no Google-related cookies) to avoid the search profile contamination which otherwise colour the result. The source selection and weighting bias can not be avoided as that is part and parcel of the way the thing works.
Why not? Google Search started out unbiased; PageRank (the algorithm) did not care about any opinions or attitudes or leanings, and the results ended up being representative of what was out there on the 'net. The same approach could have been used for Google News by pulling in any news source found by Google's crawlers which happened to report on a specific issue, weighing the rank of those sources the same way that regular search results used to be weighed (i.e. those sources which are linked to more often, after filtering out link farms, end up higher than fly-by-night operations which nobody links to). This is not what Google News does though; instead it seems to either use a whitelist or apply a blacklist to produce 'reputable news sources'. That is where the bias starts: in the selection of which sources to link to.