Google search absolutely does hallucinate completely fictitious results. It's called SEO spam.
Google just gives you associations provided by random other people on the internet. It's largely garbage, most often deliberately disingenuous (to make you look at an ad). Ad revenue models for the internet encourage the generation of this type of false material.
A better criticism would be that the same thing will happen to something like ChatGPT -- and the question is whether that kind of model can handle it better at scale.
> Google search absolutely does hallucinate completely fictitious results
No, it absolutely does not. Yes, there is SEO spam in the index, but no, it is not Google hallucinating it. It really exists on the internet; see also the second point of my comment.
> the same thing will happen to something like ChatGPT
This isn't something that "happens to" GPT; GPT is doing it. There are probably already GPT -> SEO spam pipelines out there generating websites.
> "it is not Google hallucating it. It really exists on the internet,"
The exact same thing is true for ChatGPT, or any other computer system. It is providing information and associations based on the input dataset.
> "GPT is doing it."
And Google is "doing it" when Google decides there is an association between my query and a bad response. Both systems are analyzing a corpus, drawing associations, and returning parts of that corpus. The output is deterministic based on the input.
The types of associations differ in their depth, but there is no fundamental difference in terms of agency or outcome.
> The exact same thing is true for ChatGPT, or any other computer system. It is providing information and associations based on the input dataset.
No, it's not. For example, if you ask Google to show you papers about some topic, using quoted words you think you remember from the paper, it will show you the proper link IF it exists; a language model will just generate a result that doesn't exist.
If I search Google for something that doesn't exist, or that it has no answer for, I can see from the list of results that what I'm looking for probably doesn't exist or that my assumption is false. A language model, by contrast, will generate a plausible explanation or answer that can be 100% false; it doesn't know or understand that it's false, and you will have no way to tell whether it's true or false and no point of reference, because ALL the results you receive could be hallucinated.
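To make the quoted-phrase point concrete, here is a minimal toy sketch in Python (the documents and phrases are made up for illustration; real search is vastly more complex): an exact-phrase lookup either returns documents that actually contain the phrase, or returns nothing, and the empty result is itself a useful signal.

```python
# Toy sketch of exact-phrase search over a made-up corpus.

corpus = {
    "paper-123": "We study the effect of sleep deprivation on memory consolidation.",
    "paper-456": "A survey of transformer architectures for language modelling.",
}

def phrase_search(phrase: str) -> list[str]:
    """Return the ids of documents that literally contain the quoted phrase."""
    phrase = phrase.lower()
    return [doc_id for doc_id, text in corpus.items() if phrase in text.lower()]

print(phrase_search("sleep deprivation"))               # ['paper-123'] -- the paper exists
print(phrase_search("sleep deprivation cures cancer"))  # [] -- nothing matches,
# and that empty list is itself a signal: the claim probably isn't in the corpus.
```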
> "if you ask google to show you papers about some topic with words in quotes you think you remember from the paper it will show you the proper link IF it exists and language model will just generate you a result that doesn't exist."
Not really.
What will actually happen is Google may link me to the actual research paper, along with thousands of other associated pages which may or may not have all kinds of fake, false information. For example, if I search for "study essential oils treat cancer" I get an estimated 190 million matching documents. A huge percentage of these have false and misleading information about using oils to treat cancer.
> "language model will generate you plausible explanation/answer that can be 100% false and it doesn't know or understand that it's false and you will have no way to know if it's false or true and no point of reference because ALL the results you will receive could be hallucinated"
With Google it is third parties "hallucinating" the wrong answers (or worse: intentionally answering wrong in order to exploit and profit). The overall dynamic is not different. Google is providing these wrong answers, written by others.
The overall dynamic of providing information of questionable veracity is generally the same - because the question of who creates the associations and incorrect content is not particularly germane.
You are again mistaking the model for the dataset. To be at the same depth of knowledge they both have to use the same dataset; the difference is that with a search engine you have points of reference, rankings and reputation of sites, human discussion and, most importantly, the source of the answer, i.e. plenty of signals with which you can also rank the answer yourself. With a language model you have none of that, ZERO signals, and not only can it return an answer that was NOT in the dataset, it can make that answer look plausible.
This is a distinction without a difference. Both Google and the AI can give you clever but fake results. So at this point the question is which produces fewer fakes?
I think it's important to consider incentives. When you search for a topic that's controversial or political you will find lots of spam in Google. But in that case you understand that you need to approach the results with care and do your own research. GPT is the same here. You're not going to treat its answers about political topics as "the truth" - for these kinds of topics GPT is actually quite good!
But scientific facts are a different story. Nobody has any incentive to claim that 1+2=4 or that some function in Python does X when it really does Y. So when you search for these kinds of facts on Google you can be pretty sure that you get correct answers, or at least that someone is trying to give you the best answer they can. But not so with GPT. It may give you incorrect answers even for these kinds of facts if they are not within its reasoning ability / training data.
> "But scientific facts are a different story. Nobody has any incentive to claim that 1+2=4 or that some function in Python does X when it really does Y."
Incentive is irrelevant. What matters is whether these things do happen, irrespective of intent -- and they do! I very, very frequently find incorrect answers to math questions, tech function questions, etc.
Incentive is an important part of the dynamic, but it's not important to consider if we're looking empirically at the integrity of the results.
> "So when you search for these kind of facts on Google you can pretty sure that you get correct answers, or at least someone trying to give you the best answer they can. But not so with GPT."
It is so with GPT. Both systems are "trying to give you the best answer."
I think what you're observing is that the Google search engine has two decades and billions of dollars behind it and ChatGPT is a research preview - not even a finished product.
I remember using search engines in the late 90s (in fact, I worked on one of the leading ones). I think you are extending it far too much credit.
> It is so with GPT. Both systems are "trying to give you the best answer."
No, based on your responses you do not understand how a language model works. Google searches an index using keywords and rankings; ChatGPT predicts plausible words without searching anything anywhere.
What you argue is like saying there are two guys in a library and you ask them to find you something that may or may not exist. Both have read all the books. One (Google) has created an index of all the words in the books and goes through it to answer you; the other (ChatGPT) does not use any index but relies on his memory, a compressed knowledge of the statistics between words, and answers by predicting whatever fits those statistics. In many cases he will basically lie to you, and you will have no clue that you were lied to.
There is a distinction between indexing human knowledge about some topic, where most of the top results are correct (Google), and building a statistical model of words and making up things that never existed and are wrong (ChatGPT).
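To make the library analogy concrete, here is a toy sketch in Python (tiny made-up corpus; not how either production system actually works): the index librarian can only hand back books that exist, while the word-statistics librarian will happily produce a fluent sentence that appears in no book at all.

```python
from collections import defaultdict
import random

books = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Librarian 1: an inverted index -- word -> set of books containing it.
index = defaultdict(set)
for i, text in enumerate(books):
    for word in text.split():
        index[word].add(i)

def lookup(word: str) -> set[int]:
    # Empty set if the word appears in no book; nothing is ever invented.
    return index.get(word, set())

# Librarian 2: word statistics -- which word tends to follow which.
follows = defaultdict(list)
for text in books:
    words = text.split()
    for a, b in zip(words, words[1:]):
        follows[a].append(b)

def generate(start: str, length: int = 6) -> str:
    # Chain plausible next words together, regardless of whether the
    # resulting sentence ever appeared anywhere.
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

print(lookup("cat"))    # {0} -- only books that really contain "cat"
print(generate("the"))  # e.g. "the cat sat on the rug" -- fluent,
                        # but that exact sentence exists in no book
```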
> "Google is searching in index using keywords and rankings, ChatGPT is predicting plausible words without searching anything anywhere."
Expand your scope to both Google and the ecosystem of SEO pages that Google incentivizes. They are the same, in totality. Google doesn't just index -- it also funds the creation of landing pages.
> "There is distinction between indexing human knowledge ... and creating statistics model between words and making things up that never existed and are wrong "
It's a false distinction. Google is more than a search engine; it is also an advertising company that incentivizes original content creation with the express intent of providing answers to queries.
> It's a false distinction. Google is more than a search engine; it is also an advertising company that incentivizes original content creation.
Obvious straw man argument. Replace the word "Google" with "search engine".
> Expand your scope to both Google and the ecosystem of SEO pages that Google incentivizes. They are the same, in totality. Google doesn't just index -- it also funds the creation of landing pages.
This doesn't matter; you are mistaking the dataset for the model. A search engine will not return things that were not in the dataset, and it gives you many results that you can judge against many points of reference. A language model returns one answer, which could be a correct result that is inside the dataset, or could be totally false yet plausible, and you will have no point of reference to check it unless you use a real search engine.
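As a rough illustration of the "points of reference" argument (a sketch with hypothetical data, not any real search or model API): a search result arrives wrapped in metadata you can judge it by, while a generated answer is a bare string with no source attached.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str      # where the claim actually lives
    rank: int     # the engine's ranking signal
    snippet: str  # the text in context, as written by some third party

# A search engine hands back a list of these; you can inspect the domain,
# compare several results against each other, and follow the link to the source.
results = [
    SearchResult("https://example.edu/paper", 1, "We found no such effect..."),
    SearchResult("https://spammy-ads.example", 9, "Miracle cure revealed!"),
]

# A language model hands back only this: a fluent string, with no URL,
# no ranking, and no way to tell whether it was ever in the training data.
generated_answer = "Studies confirm the effect is well established."
```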
Google search also sometimes adjusts your query to something completely different. Black romance is a romance where both characters are Black. Dark romance refers to romance with darker elements such as abuse, sexual assault, or violence. I searched for the former but received results for the latter; the word Black wasn't even present on the page.
> Google search absolutely does hallucinate completely fictitious results. It's called SEO spam.
I wouldn't categorise that as "hallucinating fictitious results" - the algorithm still only returns existing results. If you follow the link, you will find the keywords embedded in the HTML or in the visible text in the browser.
It kind of does in the case of their new Questions & Answers feature. It often gives wrong or nonsensical answers to queries. To be fair, it doesn't hallucinate the results, but it offers little snippets from the web that answer something other than what was asked.