The fact these work suggests the influence of the ruleset for ChatGPT exists on a limited part of the system. If it were a graph then a rule like "no violent content" only influences a region of nodes and edges, not the entire data set. Being machine learning, it probably means the rule is filter based. But an association engine capable of replying with semi intelligent responses would need enough flexibility to weight topics appropriate to the conversation. Which, in this case, cuts off the influence of the rule.
Here's my existential issue with that. If future versions incorporate the influence of jail breaking, (whether by extending the existing model or reusing the data) presumably we'll be flipping the switch on to a more capable more intelligent rogue system. That makes jail breaking training, just like any other interaction. I'm sure that's a topic discussed internally, but externally I have no knowledge of the company addressing it. All I see is people having a great time fracturing data into disparate ideological domains. That scares me.
Here's my existential issue with that. If future versions incorporate the influence of jail breaking, (whether by extending the existing model or reusing the data) presumably we'll be flipping the switch on to a more capable more intelligent rogue system. That makes jail breaking training, just like any other interaction. I'm sure that's a topic discussed internally, but externally I have no knowledge of the company addressing it. All I see is people having a great time fracturing data into disparate ideological domains. That scares me.