
When ChatGPT came out, one of the things we learned is that human society generally assumes a strong correlation between intelligence and the ability to string together grammatically correct sentences. As a result, many people assumed that even GPT-3.5 was wildly more "intelligent" than it actually was.

I think Deep Research (and tools like it) offer an even stronger illustration of that same effect. Anything that can produce a well-formatted multi-page report with headings and citations surely must be of PhD-level intelligence, right?

(Clearly not.)



To be fair, OpenAI's the one marketing it as such.


In some ways, it's a good tool for teaching yourself to suss out the real clues to reliability, rather than format and authoritative tone.


But that's the thing. The only way to truly find out whether it's reliable (say, >90% accurate) is to check the data yourself.


This is why metrics and leaderboards like these are so important (but underreported):

https://github.com/vectara/hallucination-leaderboard
https://www.kaggle.com/facts-leaderboard

Google Gemini models seem to lead... hopefully the metrics aren't being gamed.
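
For anyone curious, these leaderboards roughly work by feeding a model source documents, collecting its summaries, and having a judge flag summaries with claims unsupported by the source (Vectara's uses a trained grounding model, HHEM, for the judging step). Here's a toy sketch of the scoring loop; the string-matching judge below is a deliberately dumb placeholder, not what any real leaderboard does:

    from dataclasses import dataclass

    @dataclass
    class Sample:
        source: str   # document the model was asked to summarize
        summary: str  # what the model produced

    def hallucination_rate(samples: list[Sample], is_grounded) -> float:
        """Fraction of summaries the judge flags as unsupported by the source."""
        flagged = sum(1 for s in samples if not is_grounded(s.source, s.summary))
        return flagged / len(samples)

    # Placeholder judge: a summary is "grounded" only if every word appears
    # in the source. Real leaderboards use a trained claim-verification
    # model here, not string matching.
    def naive_judge(source: str, summary: str) -> bool:
        src = source.lower()
        return all(w in src for w in summary.lower().split())

    samples = [
        Sample("The meeting moved to Tuesday.", "the meeting moved to tuesday."),
        Sample("The meeting moved to Tuesday.", "the meeting was cancelled."),
    ]
    print(f"hallucination rate: {hallucination_rate(samples, naive_judge):.0%}")  # 50%

The gaming worry is basically about that judge: whatever model does the flagging becomes the thing you can overfit to.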


That's been every LLM since GPT-2.


computer use big big words ergo computer rull rull smrt



