I thought summarizing papers/stories/emails/meetings was one of the touted use cases of LLMs?

What are the use cases where the expected performance is high?


I didn't notice that example. I doubt top-tier models have issues with that. I was referring more to Sabine's mention of hallucinated citations and papers, an issue I also ran into two years ago but which is probably solved by Deep Research at this point. She just has massive skill issues and doesn't know what she's doing.

>What are the use cases where the expected performance is high?

https://openai.com/index/introducing-chatgpt-pro/

o1-pro is probably at top-tier human performance on most small coding tasks and definitely on STEM questions. o3 is even better, but it hasn't been released except as the model powering Deep Research.

o3 is in the top 200 on Codeforces, for example: https://codeforces.com/blog/entry/137543


> This is just not a use case where the expected performance on these tasks is high.

Yet the hucksters hyping AI are falling all over themselves saying AI can do all this stuff. This is where the hundred-billion-dollar valuations are coming from. It's been years, and these super-hyped AIs still suck at basic tasks.

When pre-AI shit Google gave wrong answers, it at least linked to the source of those wrong answers. LLMs just output something that looks like a link and call it a day.


To be fair, the newest tools like Deep Research are actually quite good, and hallucination is essentially not a real problem for them.

https://marginalrevolution.com/marginalrevolution/2025/02/de...


> After glowing reviews, I spent $200 to try it out for my research. It hallucinated 8 of 10 references on a couple of different engineering topics. For topics that are well established (literature search), it is useful, although o3-mini-high with web search worked even better for me. For truly frontier stuff, it is still a waste of time.

> I've had the hallucination problem too, which renders it less than useful on any complex research project as far as I'm concerned.

These quotes are from the link you posted. There are a lot more.


I think Sabine is just wrong in this case. I don't think Deep Research can hallucinate links in this way at all.



