Ensuring a model never outputs copyrighted content is unimportant and tangential. It's irrelevant. You don't look for a way to make humans output no copyrighted content, you address each time they do case by case.
A model training being rendered fair use doesn't mean any of its output can be used for whatever regardless.
I think when GP says "address each time case by case", they mean "you sue them when they infringe", instead of "this human has an illegal brain because it remembers Taylor Swift's songs".
PS: your "#1" is really hard to do and I'd guess it is infeasible. Even Google (esp. Youtube) with their vast data capabilities, often gets it wrong.
A model training being rendered fair use doesn't mean any of its output can be used for whatever regardless.