This feels less like a failure of rule-following and more like a limit of language systems that are always optimized to emit tokens. The model can recognize a constraint boundary, but it doesn’t really have a way to treat not responding as a valid outcome. Once generation is the only move available, breaking the rules becomes the path of least resistance.