
The way around that is for LLM-based tools to run a regular search engine query in the background and feed the results in alongside the prompt. (Usually a two-step process: the LLM formulates the query, then makes another pass over the results.)

The links for the results that were actually used can then either be appended to the final output separately, which guarantees they are correct, or added to the prompt with the LLM told to include them, which does retain a risk of hallucination, yes.

Common to both approaches is the failure mode that the LLM can still hallucinate whilst "summarizing" the results, meaning you still have no guarantee that the claims it makes actually appear in those results.
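
In rough Python, the flow looks something like the sketch below. llm_complete() and web_search() are hypothetical stand-ins for whatever model API and search API a real tool would wire in; the point is only the shape of the two-step process and the separately-appended links.

  # Rough sketch of the two-step "search, then answer" flow described above.
  # llm_complete() and web_search() are hypothetical placeholders, not real APIs.
  def answer_with_search(question, llm_complete, web_search):
      # Step 1: have the LLM formulate a search query from the question.
      query = llm_complete(f"Write a short web search query for: {question}")

      # Step 2: run a regular search and feed the results in alongside the prompt.
      results = web_search(query, max_results=3)
      context = "\n\n".join(r["snippet"] for r in results)
      answer = llm_complete(
          "Answer the question using only the sources below.\n\n"
          f"Sources:\n{context}\n\nQuestion: {question}"
      )

      # Append the links outside the generation step, so the URLs themselves
      # can't be hallucinated (the summary of them still can be).
      links = "\n".join(r["url"] for r in results)
      return f"{answer}\n\nSources:\n{links}"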



> The way around that is for LLM-based tools to run a regular search engine query in the background and feed the results in alongside the prompt. (Usually a two-step process: the LLM formulates the query, then makes another pass over the results.)

Would the LLM-based tool be able to determine that the top results are just SEO-spam sites and move lower in the list, or just accept the spam results as gospel?


This is an extremely tricky question.

The practical, readily-observable-from-output answer is "no": they cannot meaningfully identify spam or misinformation, and do indeed just accept the results as gospel. Google's AI summaries work this way and are repeatedly wrong in exactly this way; Google has even had them be wrong in its own ad copy.

The theoretical mechanism is that the attention mechanism in LLMs can select which parts of the search results get fed forward into the response. This is how the model finds the parts of the text that are "relevant". The problem is that relevance alone just isn't enough to robustly identify spam or incorrect information.

However, we can isolate this "find the relevant bit" functionality from the rest of the LLM and use it to enhance regular search engines. It's hard to say how useful this is; Google has intentionally damaged its search engine, and it may simply not be worth the GPU cycles compared to traditional approaches, but it's an idea being widely explored right now.
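
One common way to do that in isolation is to rerank candidate passages with a small cross-encoder rather than a full LLM. A minimal sketch, assuming the sentence-transformers package and its public ms-marco reranker checkpoint (my example, not something from the comment above):

  from sentence_transformers import CrossEncoder

  def rerank(query, passages, top_k=3):
      # Score each (query, passage) pair for relevance with a small reranker
      # model, then keep the top_k highest-scoring passages.
      model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
      scores = model.predict([(query, p) for p in passages])
      ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
      return [p for _, p in ranked[:top_k]]

Note that this only ranks by relevance: an SEO-spam page that happens to be on-topic will still score well, which is exactly the limitation described above.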


The only thing that can solve the misinformation from a bad LLM is the misinformation from a good LLM... with a gun.


>The way around that is for LLM-based tools to run a regular search engine query in the background and feed the results in alongside the prompt.

Hardly better, since soon those "search engine results" will be AI slop themselves, including actual published papers (phoned in using AI, and "peer reviewed" using AI by indifferent reviewers).



