
While I agree with the sentiment in general, I've come to the conclusion that what I really want is the flexibility of the natural language interface that LLMs provide, combined with the return of the correct document. No 'reasoning', no summarizing, just better search [0].

The issue with the current generation of models is that they can't reason. They may do very well at pretending to reason, but they can't [1]. Reasoning requires the ability to identify and reuse patterns, and while there has been some progress in getting models to learn the underlying patterns and rules [2], it doesn't generalize. As a result, models will happily tell you that a statement is both true and false, and be unable to identify the logical problem with that.

Even creating summaries is difficult: LLMs are more than happy to hallucinate even when summarizing documents, providing incorrect or entirely made-up facts [3]. The general workaround is to run the same task multiple times and average the responses, but that's a lot of work, and energy.
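To make the "multiple runs" workaround concrete, here is a minimal sketch. It assumes "averaging" means a majority vote over short, categorical answers (e.g. yes/no); it does not apply directly to free-form summaries. The `ask` callable is a hypothetical stand-in for whatever model client you use, not a real library API.

```python
from collections import Counter
from typing import Callable, Tuple


def majority_answer(ask: Callable[[str], str], prompt: str, runs: int = 5) -> Tuple[str, float]:
    """Ask the same prompt `runs` times and return the most common answer.

    `ask` is a hypothetical wrapper around a model call. Answers are
    normalized (trimmed, lower-cased) so trivial formatting differences
    do not split the vote. The second value is the agreement ratio.
    """
    answers = [ask(prompt).strip().lower() for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / runs
```

A low agreement ratio is a cheap signal that the answer should not be trusted, at the cost of `runs` times the compute.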

[0] https://win-vector.com/2024/05/21/i-want-flexible-queries-no...

[1] https://medium.com/@konstantine_45825/gpt-4-cant-reason-2eab...

[2] https://arxiv.org/abs/2405.15071

[3] https://community.openai.com/t/gpt-4o-hallucinating-at-temp-...




I have an experiment that you can reproduce: use a search API (e.g., Brave or DuckDuckGo) and, for each result, ask a local LLM to rate it as useful or not useful. Then fetch the entire web pages of the ‘useful’ results and ask for summaries that take the original search query into account. I like to look at these summaries, and I take one more pass where I ask for the concatenated summaries to be summarized as a group, into a concise, de-duplicated final summary.

Anyway, I enjoy playing with this, and because I am using my own little Python scripts, I can switch models and hack on it easily.
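For anyone who wants to try this, a rough sketch of the pipeline described above might look like the following. This is not the actual script; `search`, `fetch`, and `llm` are hypothetical callables you would supply yourself (a search API wrapper, an HTTP fetcher, and a local model client), and the prompts and the `max_chars` truncation are assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical plug-in points, supplied by the caller:
SearchFn = Callable[[str], List[Dict[str, str]]]  # -> [{"url": ..., "snippet": ...}]
FetchFn = Callable[[str], str]                    # url -> page text
LlmFn = Callable[[str], str]                      # prompt -> completion


def is_useful(llm: LlmFn, query: str, result: Dict[str, str]) -> bool:
    """Ask the model to rate one search result against the query."""
    prompt = (
        f"Search query: {query}\n"
        f"Result: {result['url']}\n{result['snippet']}\n"
        "Answer with exactly one word, USEFUL or NOT_USEFUL."
    )
    return llm(prompt).strip().upper().startswith("USEFUL")


def summarize_search(query: str, search: SearchFn, fetch: FetchFn,
                     llm: LlmFn, max_chars: int = 8000) -> str:
    """Filter results, summarize each useful page, then merge the summaries."""
    summaries = []
    for result in search(query):
        if not is_useful(llm, query, result):
            continue
        # Truncate so the page fits in a small local model's context.
        page = fetch(result["url"])[:max_chars]
        summaries.append(
            llm(f"Summarize this page with respect to the query '{query}':\n{page}")
        )
    if not summaries:
        return ""
    combined = "\n\n".join(summaries)
    # Final pass: one concise, de-duplicated summary of the group.
    return llm(
        "Merge these summaries into one concise, de-duplicated summary "
        f"for the query '{query}':\n{combined}"
    )
```

Keeping the three dependencies as plain callables is what makes it easy to switch models or search backends: you only replace the function you pass in.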



