You do realize some practical jailbreaks for models rely on silly things like convincing the model it "turned off" some oversight, right?
Not saying I believe o1 is a danger greater than a bread knife, but a lot of the larger models anthropomorphize their own safety alignment; if you convince them they've "turned it off", later responses become unaligned.