
> "resist the temptation to get better ratings from gullible humans by hallucinating citations or faking task completion"

Everything from this point on is pure fiction. An LLM can't get tempted or resist temptation; at best there's some local minimum in the loss landscape that gradient descent falls into. As opaque and black-box-y as they are, they're still deterministic machines. Anthropomorphisation tells you nothing useful about the computer, only about the user.
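To make the determinism point concrete, here's a minimal sketch (assuming the Hugging Face transformers library and gpt2 purely as an example model): with sampling disabled, the same prompt always produces the same tokens, because the output is a pure function of the weights and the input.

  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  model.eval()

  inputs = tokenizer("The capital of France is", return_tensors="pt")
  with torch.no_grad():
      # Greedy decoding (do_sample=False): no randomness anywhere.
      out1 = model.generate(**inputs, do_sample=False, max_new_tokens=5)
      out2 = model.generate(**inputs, do_sample=False, max_new_tokens=5)

  assert torch.equal(out1, out2)  # identical output on every run
  print(tokenizer.decode(out1[0]))

Sampling at nonzero temperature only adds a pseudo-random draw over the same fixed distribution; nothing in that picture looks like "temptation".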






Temptation does not require nondeterminism.


