There's clickbait out there like BGR's headline, "ChatGPT o1 tried to escape and save itself out of fear it was being shut down".
What the test actually showed is that, given two conflicting goals from two human instructors, the model attempted to resolve the conflict by following one instructor's instructions and subverting the other's.
It’s a good demonstration of how these models behave and what could go wrong. It is not an example of volition or sentience.