They're consistent as far as the model is concerned, particularly if you ask it to rationalize its rating. You'll get plenty of hallucinated answers that the model can recognize as hallucinations and rate low in the same response.
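Concretely, the pattern is just asking for the answer and the rating in one go. A minimal sketch (Python, assuming the openai client; the prompt wording, the 1-10 scale, and the model name are illustrative choices, not anything canonical):

```python
# Sketch of the "rate and rationalize your own answer" pattern.
# Assumes the openai Python client; model name and prompt wording
# are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

def answer_and_self_rate(question: str) -> str:
    prompt = (
        "Answer the question below. Then, on a new line starting with "
        "'Rating:', rate from 1-10 how confident you are that the answer "
        "is factually correct, and briefly explain the rating.\n\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

The answer itself may still be hallucinated, but the rating and rationale that follow it often flag that it's shaky.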
Models can get caught out by what they start to say early. If the model goes down a path early on that looks like a likely answer but turns out to be a false lead or dead end, it will make up something plausible-sounding to finish that line of thought, even if it's wrong. This is why chain of thought and other "pre-answer" techniques improve results.
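A "pre-answer" prompt just moves the reasoning ahead of the commitment, so the model isn't locked into whatever it blurted out first. Rough sketch, same assumptions as above:

```python
# Sketch of a chain-of-thought / "reason before you answer" prompt.
# Same assumptions: openai client, illustrative model name and wording.
from openai import OpenAI

client = OpenAI()

def answer_with_reasoning(question: str) -> str:
    prompt = (
        "Think through the question step by step first. Only after your "
        "reasoning, write a final line starting with 'Answer:' that "
        "commits to an answer (or says you are unsure).\n\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```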
Because of the way transformers work, they have very good hindsight: they can recognize that something they've just said is incorrect far more reliably than they can avoid saying it in the first place.
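That asymmetry is also why a simple second pass over the model's own output tends to catch mistakes the first pass let through. A sketch of that generate-then-review loop, under the same assumptions as the earlier snippets:

```python
# Sketch of a generate-then-review loop that exploits the model's
# hindsight: it reads back its own draft and flags or fixes errors.
# Same assumptions: openai client, illustrative model name and prompts.
from openai import OpenAI

client = OpenAI()

def _chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_with_review(question: str) -> str:
    draft = _chat(f"Answer the question:\n\n{question}")
    return _chat(
        "Here is a question and a draft answer. In hindsight, point out "
        "anything in the draft that is incorrect or made up, then give a "
        f"corrected answer.\n\nQuestion: {question}\n\nDraft: {draft}"
    )
```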