They're consistent from the model's point of view, particularly if you ask the model to rationalize its rating. You'll get plenty of hallucinated answers that the model recognizes as hallucinations and gives a low rating to, in the same response.
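
Not OP's setup, but roughly what that single-pass "answer plus self-rating with rationale" looks like, assuming the OpenAI Python SDK; the model name and prompt wording are just placeholders:

    # Sketch: ask for an answer and a self-rating with rationale in one response.
    # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    question = "Which paper introduced the transformer architecture?"
    prompt = (
        f"Question: {question}\n\n"
        "First give your best answer. Then, on a new line starting with "
        "'Rating:', rate your confidence that the answer is factual from 1-10 "
        "and briefly explain the rating."
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)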



If the model can properly and consistently recognize hallucinations, why does it return said hallucinations in the first place?


Models can get caught by what they start to say early. So if the model goes down a path that seems like a likely answer early on, and that turns out to be a false lead or a dead end, it will end up making up something plausible-sounding to finish that line of thought even if it's wrong. This is why chain of thought and other "pre-answer" techniques improve results.
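
As a rough sketch of what such a "pre-answer" prompt looks like in practice (again assuming the OpenAI Python SDK; model name and wording are illustrative only):

    # Chain-of-thought-style prompt: the model reasons before committing to a
    # final answer, so an early wrong lead can be dropped before it becomes the answer.
    from openai import OpenAI

    client = OpenAI()

    question = "Is 3599 a prime number?"
    cot_prompt = (
        f"{question}\n\n"
        "Work through the problem step by step first. Only after the reasoning, "
        "write a final line starting with 'Answer:'."
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": cot_prompt}],
    )
    print(resp.choices[0].message.content)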

Because of the way transformers work (each new token can attend to everything already generated), they have very good hindsight: they can recognize that they've just said something incorrect much more reliably than they can avoid saying it in the first place.
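
One way to exploit that hindsight is a second review pass over the model's own draft. A minimal sketch, assuming the same SDK, with model name and prompts as placeholders:

    # Generate a draft, then feed it back for a pass that flags likely-wrong claims.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # placeholder model

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    draft = ask("Who won the 1998 Fields Medal, and for what work?")
    review = ask(
        "Here is a draft answer:\n\n"
        f"{draft}\n\n"
        "List any statements in it that are likely incorrect or made up, "
        "and say which ones would need to be verified."
    )
    print(review)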



