AI Search Has a Citation Problem (cjr.org)
25 points by nobody9999 4 months ago | hide | past | favorite | 9 comments


Microsoft Copilot falls down completely when it comes to citing things: even when the answer is correct, the cited documents frequently don't say what it says they do.


If GenAI search results or summarizations can't be verified by citing their essential sources, how can they ever be deemed trustworthy and free of legal liability for fabricating falsehoods? Somehow this Achilles heel must be eliminated or they won't be usable in a great many domains where validation is required (like medical diagnosis, citing legal evidence, investment advice, etc).

In fact, it's hard to imagine all that many uses for GenAI that won't _eventually_ require some sort of calibratable measure for accuracy, and the capacity for validation thereof, before they can be widely adopted.


Even though it’s in project instructions to use verifiable sources, Claude still makes things up for me.

So after it spits something out I paste: “Are you 100% sure that every story, quote, fact, and source is accurate and verifiable?”

It then fesses up and asks if I want to rewrite with only verifiable things. I say yes, it does, and so far that seems to work!
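That two-pass flow can be sketched as a small wrapper. Everything here is hypothetical: `ask` stands in for whatever chat call you use, and the verification question is the one quoted above.

```python
VERIFY = ("Are you 100% sure that every story, quote, fact, "
          "and source is accurate and verifiable?")

def verified_answer(ask, prompt):
    """Two-pass flow: draft, challenge, then ask for a rewrite
    restricted to verifiable material. `ask(history)` is a
    hypothetical chat call taking a list of (role, text) turns
    and returning the assistant's reply as a string."""
    history = [("user", prompt)]
    history.append(("assistant", ask(history)))   # first draft
    history.append(("user", VERIFY))              # challenge it
    history.append(("assistant", ask(history)))   # model "fesses up"
    history.append(("user", "Yes, rewrite it using only "
                            "verifiable sources."))
    return ask(history)                           # final, cleaned pass
```

No guarantee the model's self-audit is accurate, of course; this just automates the manual paste-and-retry loop described above.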


The only time I've seen extensive citing from an AI is when I use Deep Research with the ChatGPT Pro plan. Otherwise, citations are few and far between. Perplexity was one of the early services I used that seemed to make an effort to get this right, but I haven't used it in quite some time.


100% my greatest issue with these tools; I really need a "source".


I was going to say Perplexity usually seems good, but reading the article it was interesting to learn they're being naughty: using forbidden sources while pretending not to.


This is a feature enabling our content-derived species to enter the age of full-spectrum, story-driven history fiction, as proposed by parkerblubtan at the third matzmum tecdot convention in 2017.


Problem with LLMs is that you'd need a bucket/hash list for each and every 8/16/32-bit token.

And each URL is quite lengthy.

And a good number of citations can be found hanging off a single token.
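If I'm reading this right, the idea is an index mapping each token to the citations that support it. A minimal sketch (all names and URLs here are mine, purely illustrative) of why that balloons:

```python
from collections import defaultdict

# Hypothetical per-token citation index: token id -> set of URLs
# claimed to support that token.
citation_index = defaultdict(set)

def attach(token_ids, url):
    """Record that `url` backs every token in this span."""
    for t in token_ids:
        citation_index[t].add(url)

attach([101, 202, 303], "https://example.com/some/quite/long/article-path")
attach([202, 303], "https://example.com/another/lengthy/source")

# Each bucket can accumulate many lengthy URLs, and an 8/16/32-bit
# token space means 256 to ~4 billion possible buckets, so the
# index grows quickly.
```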


I've caught AI outright making up sources (like, fabricating URLs and names) to satisfy me in the past.



