So PageRank was an algorithm, and RankBrain is an AI? I'd love to understand a bit more of what makes them different from each other. I don't feel as though I've seen the search results become any better. In fact I've been frustrated by how many words it leaves out without telling me. Or how it says "here are the results to your search" when in all honesty it had 0 results to my search.
PageRank and RankBrain both seem to be (complicated) features to the actual core search algorithm, which is some unspecified machine learned model [1].
PageRank, for example, can't be the whole search algorithm since it's not even query-dependent. It would just put the same most authoritative document at the top for every query.
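To make the query-independence concrete, here is a minimal power-iteration sketch of PageRank (toy link graph and damping factor are illustrative, not Google's actual implementation). Notice the query never appears anywhere: the scores depend only on the link structure, so the ranking would be identical for every search.

```python
# Toy power-iteration PageRank: scores depend only on the link graph,
# never on a query, so the same "most authoritative" page wins every time.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

# Hypothetical three-page web: A and C both link to B, B links to C.
scores = pagerank({"A": ["B"], "B": ["C"], "C": ["B"]})
```

Here B accumulates the most authority regardless of what anyone searches for, which is exactly why PageRank can only ever be one signal among many.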
Similarly, RankBrain doesn't sound like it could be the whole algorithm. It sounds like it is just a text understanding model (which wouldn't know anything about e.g. the global reputation of the document, the popularity of the document, etc.). In fact, the article explicitly confirms this:
>RankBrain has become the third-most important signal contributing to the result of a search query, he said.
I'd guess some kind of composite content quality signal and some kind of composite popularity signal sit at #1 and #2.
Hiya, author here - yes, you are correct. RankBrain is one of hundreds of distinct signals that go into the results page. It just happens to be one with a great deal of influence, which I think demonstrates the surprising generality/adaptability of this type of approach for natural language processing and interpretation.
Well, except the results of Google got worse and worse over the past years (since around 2013 I could pinpoint the issue), to the point that by now often only one or two entries on the results page are relevant.
Even if I search for a very specific query, Google will just only give me results ignoring several parts of my query, and at best one or two about my actual query.
Even duckduckgo manages to give better results often at this point.
EDIT: Just found verbatim search, at least that gives almost equal results as duckduckgo. Still not as good, but it’s okay.
Even Verbatim search will happily lop off search words.
Tonight I was searching for "raf ballykelly vulcan" and after a few puzzling results realised that Ballykelly had been discarded. Since it was the airfield in question the results were therefore useless.
Either they've fixed it or I'm not seeing the problem - I get a bunch of pages that include ballykelly and vulcan. Forcing "ballykelly" didn't seem to make any difference.
I don’t have one specific query – it is EVERY query where Google will ignore 90%+ of the words I entered, and show me around 5 related results, and otherwise completely bullshit results, with no way to see more.
Use [Search Tools] => [All Results] => [Verbatim]. This is a known issue for people who have been using Google Search for years, and there are many discussions about it, like: https://www.webmasterworld.com/google/4744658.htm
Often the issue occurred when searching for technical things. For example, I'd copy-paste a Python exception, and Google would tell me where to download Python, which Python books to get, and that a python is a type of snake.
It’s incredibly frustrating, but if I, by accident, use Google again and see the issue, I’ll tell you.
Thank you for this. I felt like my search terms were being "conveniently" omitted for the longest time just to show me more results instead of more refined results.
Perform a search then change the setting to Verbatim. Right click the search bar and "Add as Keyword" or "Add as Search Engine" (depending if you are using Firefox or Chrome)
Then give it a keyword, I use "vg" for "Verbatim Google".
Then in the Navbar I can type "vg foo bar", which will search "foo bar" verbatim. Closest thing to permanent once you get used to using keywords ( which are awesome by the way :D )
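If your browser lets you edit the keyword's URL template directly, the same shortcut can be built by hand. The `tbs=li:1` parameter is what Google appends to the URL when you switch a results page to Verbatim (observed from the address bar, so treat it as an undocumented detail that could change):

```
https://www.google.com/search?q=%s&tbs=li:1
```

`%s` is the placeholder that Firefox and Chrome substitute with whatever you type after the keyword.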
RankBrain is at the front, query-interpretation end. PageRank is at the back end, for picking pages which reasonably match the query.
Does RankBrain have an intermediate form which shows its interpretation of the query? Wolfram Alpha does, and will show an explanation of how it interpreted the query. (It has to, because it may give you a numeric answer.) It would be useful for Google to tell you what question they think you are asking.
Marketing. AI is the new word for algorithm. I think most people probably feel that search quality is going down; unsurprising given Google's monopoly position.
That's going too far. AI always uses algorithms. What constituted AI has varied quite a bit, but we usually allowed the term if it involved machine learning or decision-making based on heuristics, especially if it was adaptive over time. The AIs were also usually more resource-intensive (slower) than regular algorithms, which kept them out of use in many places until the field caught up with the requirements.
PageRank was a simple, stupid algorithm that produced incredibly smart results. The exact kind of thing that sees widespread deployment with a startup. The description of this AI sounds more like an AI tool in general. It would've been much harder for Google to have started with this. The computers alone would've been prohibitive. So, we can call it an AI.
My guess is RankBrain is the personalization piece that operates on user data (location, history, etc.) while PageRank is the search index piece that operates on web data (web-pages, trends, etc.).
Hiya. For those interested, the RankBrain approach of converting words and phrases into vectors ties directly to Geoff Hinton's more ambitious ideas about AI. He speaks about it a bit from 32 mins in, in this video from the Royal Society in London earlier this year.
Geoff Hinton - "If we can convert a sentence into a vector that captures the meaning of the sentence, then google can do much better searches, they can search based on what is being said in a document. Also, if you can convert each sentence in a document into a vector, you can then take that sequence of vectors and try and model why you get this vector after you get these vectors, that's called reasoning, that's natural reasoning, and that was kind of the core of good old fashioned AI and something they could never do because natural reasoning is a complicated business, and logic isn't a very good model of it, here we can say, well, look, if we can read every English document on the web, and turn each sentence into a thought vector, we've got plenty of data for training a system that can reason like people do. Now, you might not want to reason like people do on the web, but at least we can see what they would think."
Google users feed Google with training data every day, just by clicking on links and refining search queries. They basically tell Google what they were looking for. I have no idea how Google works, but I'd guess the user data plays a big role in ranking the search results.
The article uses this query as a motivating example:
"What’s the title of the consumer at the highest level of a food chain?"
But the results page (for me) does not contain the words 'apex predator'. The top result is the wikipedia page for "Consumer (food chain)", which does contain that term.
It would have been very cool if the AI could have identified the concept described by the query. But it didn't. It just found a very relevant page for three strings in the query.
The journalist doesn't report on the results of this example. Who came up with it and why?
This is presented as if AI is a new thing to Google. The truth is that PageRank is based on a classic neural network. The pages are the nodes, the links are the weights, and we are the feedback. It has been in training since at least 1996 ;)
I don't think that anybody considers PageRank to be a classic neural network. It's a recursively defined centrality algorithm. It has a graph structure; beyond that it's not really a neural network.
It's not that it's a secret. Consider this quote from early 2000: Reporter: "Why would we need another search engine? Alta Vista is quite good enough." - Larry Page: "We're not building a search engine. We're building an A.I."
By inference, this looks to be "just" integrating a deep semantic embedding (presumably neural network based) of the individual webpages as a signal into their existing ranking framework.
Tried to get at this in the article (author here), but it's using vectors (think word2vec and seq2seq) to distill meaning and embed words and phrases into a single space that the computer can then use to reason about. From my understanding this is all done on the query end of things, so it's basically letting them do better natural language processing. It also ties into Hinton's work on "Thought Vectors".
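The "single space" idea can be illustrated with a toy example (this is not Google's actual model; the vectors below are hand-picked and three-dimensional, whereas real word2vec embeddings have hundreds of dimensions learned from text). The point is just that once words are vectors, "meaning" comparisons reduce to geometry:

```python
# Toy illustration of the vector-space idea behind word2vec-style models:
# words become dense vectors, and semantic similarity becomes cosine
# similarity between those vectors.
import math

# Hypothetical embeddings, hand-picked so related words point the same way.
vectors = {
    "ocean":    [0.9, 0.1, 0.0],
    "sea":      [0.8, 0.2, 0.1],
    "keyboard": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

sim_related = cosine(vectors["ocean"], vectors["sea"])
sim_unrelated = cosine(vectors["ocean"], vectors["keyboard"])
```

Semantically close words end up near each other in the space, which is what lets a query about "sea" match a document about the "ocean" without either string appearing in both.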
As some feedback to the author, the following sentence doesn't make sense: "Artificial intelligence sits at the extreme end of machine learning..." Machine Learning is a subfield of AI.
Thanks for the feedback. As a general news organization we struggle with definitions/scoping for stuff within AI as it's such a new area and we try to write for a broad, albeit informed, readership. I'll keep this in mind for future articles where we classify the two.
Natural language processing, inference, and machine learning are all AI-related techniques. Calling a system that uses all of them effectively an AI is a fair label.
Hiya. They wouldn't explicitly confirm that it is word2vec, but everything we discussed indicated it's likely doing something roughly equivalent to word2vec, and is also doing similar conversions for sequences which is likely connected to Sequence to Sequence learning (PDF: http://papers.nips.cc/paper/5346-sequence-to-sequence-learni...). It also links to Geoff Hinton's stuff on Thought Vectors which implicitly involves word2vec.
Google seems to be getting worse with technical queries. I spend many queries just trying to craft a query that gets the results I am looking for. This is especially true for keywords where case is important. Google seems to just neglect case as a signal.
"" always works for me, but the annoying part is how often I have to use it these days, because Google so often second-guesses my queries.
I suppose for the average person this "fuzzy searching" is an improvement, but I wish I had the ability to flip a switch somewhere that says: "Please only use exactly the words I gave you, always."
I've noticed this as well, and I think it's because 1) technical queries have a different optimal algorithm than non-technical queries and 2) as Google's audience grows, the proportion of technical queries shrinks.
For a technical query, you essentially want something like PageRank-weighted grep, which is, of course, what you used to get. All of the fancy NLP/fuzzy-matching stuff that Google has been adding recently, while helpful for all sorts of other things, is going to be a detriment for technical queries.
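The "PageRank-weighted grep" behaviour can be sketched in a few lines: keep only documents containing every query term verbatim, then order the survivors by a precomputed authority score. (The documents and scores below are made up for illustration.)

```python
# Sketch of "PageRank-weighted grep": exact-match filter first,
# authority-ordered ranking second. Documents and scores are hypothetical.
docs = {
    "stackoverflow.example/q1": ("NullPointerException at com.foo.Bar", 0.9),
    "blog.example/jvm":         ("JVM tuning and NullPointerException tips", 0.4),
    "news.example/python":      ("Python is a type of snake", 0.7),
}

def verbatim_search(query, docs):
    terms = query.split()
    # grep step: every term must literally appear in the document text
    hits = [(url, score) for url, (text, score) in docs.items()
            if all(t in text for t in terms)]
    # rank step: order matching documents by their authority score
    return [url for url, score in sorted(hits, key=lambda h: -h[1])]

results = verbatim_search("NullPointerException", docs)
```

No synonyms, no fuzzy matching, no dropped terms; the fancy NLP layer is exactly what this model leaves out.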
When you're doing something like googling an error message or a code snippet, you're basically querying machine-generated speech, and much of Google's recent work has been on improving queries of human-generated speech.
It seems like it should be simple to implement a little "technical query" checkbox...
Have you tried "Verbatim" mode? It's under the Search Tools drop-down along the top menu bar (where News, Images, etc. are), then under "All results". It basically minimises the amount of clever business Google does with your search (synonyms, "did you mean?", etc.) and so can be quite useful for technical searches.
If I am logged in, I get so-so results for the first few queries on a topic, and then really good after that once Google realizes what I'm actually looking for (e.g. it will learn that when I am looking for "Unity" it knows I mean "Unity3D", not "Unity Ubuntu").
Of course, being logged in all the time makes me uncomfortable...
Sometimes this feature works against you. E.g. I need to do X in java, maybe someone already has an implementation I can look at so I search for "how to do X in java". Turns out there's not really any good solutions so I broaden the search figuring I could learn from an implementation in any language, but now "how to do X" is just filled with the same useless java focused results as my first search.
Totally agree with this, I've come up against this problem in the past week. I've found that I get the same list of results regardless of how I phrase a search term involving two JS frameworks. It's like it only matches those keywords and ignores the rest of the words.
I don't think it's just technical queries. I have to look up a bunch of tech and nsfw stuff, and when google tries to think for me, it ends up going to shit. I usually end up just "searching" "like" "this". Even then, it tends to ignore words in quotes |:(
None of the results are relevant because Google thinks I made a typo.
If I then search instead for "does ctime always change" the top result is:
"Does directory mtime always change when a new file ..."
Google has fuzzed ctime to match mtime, which is not the same thing.
My intention was to see whether ctime would always be updated if there was any data or metadata change to a file or folder, or if there would be some edge cases where it would not change.
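For what it's worth, the distinction the query was about is easy to demonstrate under POSIX semantics: a chmod is a metadata-only change, so it updates ctime but leaves mtime untouched. (This assumes a POSIX filesystem; on Windows, st_ctime means creation time and behaves differently.)

```python
# Show that a metadata-only change (chmod) bumps ctime but not mtime --
# the very distinction that fuzzing ctime into mtime erases.
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
os.close(fd)

before = os.stat(path)
time.sleep(1.1)            # ensure a visible timestamp difference
os.chmod(path, 0o600)      # metadata-only change: no data written
after = os.stat(path)
os.remove(path)
```

After this, `after.st_ctime` has advanced while `after.st_mtime` equals `before.st_mtime`, so ctime and mtime genuinely answer different questions.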
The first 3 results are all acknowledged by Google to be "Missing: ALPN". I get this often with queries, where Google returns results which are missing the keywords I am most interested in.
My intention was to see if there was any progress with WebSocket handshakes over ALPN (to save a roundtrip).
From the descriptions of the results it is also not clear if any of the top 10 matches are relevant.
Case and (as someone mentions below) punctuation have never been part of google search. You could search with those in Code Search, but sadly that's long gone :(