> When asked to explain rationales, these LLMs are observed to lie frequently.
It's not that they "lie"; they can't know. An LLM lives in the movie Dark City: a frozen mind formed from other people's (written) memories. :P The LLM doesn't know itself; it's never even seen itself.
The best it can do is cook up retroactive justifications, like the ones you might cook up for the actions of a third party. It can be fun to demonstrate: edit the LLM's own chat output to make it say something dumb, ask why it said that, and watch it gaslight you. My favorite is when it says it was making a joke to check whether I was paying attention. It certainly won't say "because you edited my output".
Because of the internal complexity, I can't say that what an LLM does and its justifications are entirely uncorrelated. But they're not far from uncorrelated.
The cool thing you can do with an LLM is probe it with counterfactuals. With a human interviewer, you can't rerun the exact same interview without the garlic breath. That's kind of cool, but also probably a huge liability: it may well be that for any close comparison there is a series of innocuous changes that flips the outcome, including ones that suggest exclusion over protected reasons.
Seems like litigation bait to me, even if we assume the LLM worked extremely fairly and accurately.
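To make the counterfactual idea concrete, here's a rough sketch of what I mean by probing. Everything in it is made up for illustration: `toy_decision` is a hypothetical stand-in for whatever LLM screening call you'd actually make (with a spurious sensitivity planted on purpose), and `find_flips` and the field names are my own invention, not any real API.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, str]


def toy_decision(candidate: Candidate) -> bool:
    """Hypothetical stand-in for an LLM screening call; True means 'advance'."""
    score = 0
    if "python" in candidate["skills"].lower():
        score += 2
    if int(candidate["years"]) >= 5:
        score += 1
    # Deliberately planted spurious sensitivity so the probe has something to find.
    if candidate["hobby"] == "cooking":
        score -= 1
    return score >= 3


def find_flips(
    decide: Callable[[Candidate], bool],
    base: Candidate,
    edits: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Apply each single-field edit to the base input and return the edits
    that change the decision relative to the unedited baseline."""
    baseline = decide(base)
    flips = []
    for field, value in edits:
        variant = {**base, field: value}  # copy with exactly one field changed
        if decide(variant) != baseline:
            flips.append((field, value))
    return flips


base = {"skills": "Python, SQL", "years": "6", "hobby": "chess"}
edits = [("hobby", "cooking"), ("hobby", "running"), ("years", "7")]
flips = find_flips(toy_decision, base, edits)
print(flips)
```

With a real model you'd swap `toy_decision` for the actual call and probably rerun each variant several times, since sampling noise alone can flip a borderline case. The point is that the list of flipping edits is evidence about what the decision depends on, independent of whatever explanation the model offers.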