In the Sony v. Universal case, Sony was the producer of a tool that the consumer uses to "time-shift" a broadcast they are legally allowed to view. Similarly, you can rip your own CDs or photocopy your own books. That case never made reselling that content legal. OpenAI does not train ChatGPT on content you own; it trains on some undisclosed amount of data that you may or may not have a legal right to access, and then (as has been shown) reproduces it nearly verbatim, and may even charge you for the pleasure.
So presumably, when they fix that issue (which should be trivially easy if the text matches exactly), you would accept that as a sufficient remedy?
Copyright infringement is not avoided by changing some text so it isn’t an exact clone of the source.
Determining whether a work violates a copyright requires holistic consideration of the similarity of the work to the copyrighted material, the purpose of the work, and the work’s impact on the copyright holder.
There is no algorithm for this; cases are decided by people.
There are algorithms that could detect obvious violations of copyright, such as the one you suggest, which looks for exact matches to copyrighted material. However, there are many potential outputs, or patterns of output, that would be copyright violations and would not be caught by this trivial test.
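To make that concrete, here's a minimal sketch (the strings are illustrative, and the similarity measure is just Python's built-in difflib, not anything any vendor actually uses) of how an exact-match filter passes a light paraphrase that even a crude similarity score still flags:

    from difflib import SequenceMatcher

    copyrighted = "The quick brown fox jumps over the lazy dog."
    # A lightly paraphrased model output: not a verbatim copy,
    # yet clearly derived from the source text.
    output = "A quick brown fox leaps over a lazy dog."

    # The trivial test: does the output contain the source verbatim?
    print("Exact match:", copyrighted in output)   # False -- evades the filter

    # A character-level similarity ratio still flags the near-duplicate.
    ratio = SequenceMatcher(None, copyrighted, output).ratio()
    print(f"Similarity ratio: {ratio:.2f}")         # roughly 0.8 despite no exact match

And even a fuzzy score like this only measures string similarity; it says nothing about purpose or market impact, which is exactly why no single threshold substitutes for the holistic judgment courts apply.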
I certainly don't think it's impossible, but I think it is a hard problem that won't be solved in the immediate future, and creators of data used for training are right to seek to stop wide availability of LLMs that regurgitate information they worked hard to obtain.
I think it will be a bit easier than you believe. It hasn't been done yet because there hasn't been a compelling economic reason to do so.
Train a model on NYT text that outputs a summary of facts that it learned: OMG literally murder.