Adding an extra constraint of no verbatim copying from a very large and relevant corpus would be hard to guarantee without enormous databases of copyrighted content (which might not even be legal to hold), and it would add yet another objective to a system that already juggles many, often contradictory, goals. I don't think that's a technically sound solution, or one in anyone's interest. It makes much more sense to license content from as many newspapers as possible, recognize when references are relevant, and either quote them explicitly verbatim when that's the best answer or adapt them (translate, simplify, add context) when appropriate.
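To make the scale problem concrete, here's a minimal sketch of what such a "no verbatim copying" filter could look like: hash every n-gram of the protected corpus and reject any generated text that overlaps it. The function names, n-gram length, and usage are my own illustration, not anything OpenAI has described; the point is that the index itself has to cover all the copyrighted text you want to protect.

```python
def ngrams(text: str, n: int = 8):
    """Yield word-level n-grams from a text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def build_index(corpus_articles, n: int = 8):
    """Hash every n-gram of the protected corpus.
    For millions of articles this index alone becomes enormous,
    which is the practical objection above."""
    index = set()
    for article in corpus_articles:
        index.update(hash(g) for g in ngrams(article, n))
    return index

def violates_verbatim_constraint(generated: str, index, n: int = 8) -> bool:
    """True if any n-gram of the generated text appears in the corpus index."""
    return any(hash(g) in index for g in ngrams(generated, n))

# Hypothetical usage:
# index = build_index(all_articles_to_protect)
# if violates_verbatim_constraint(model_output, index):
#     ...regenerate or refuse...
```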
I feel like the NYTimes is asking for deletion as a negotiation tactic to force OpenAI to give them enough money to pay for their journalism (I am not sure who would subscribe to the NYTimes if you can get just as much through OpenAI, but I am open to paying extra to support their work).
What if OpenAI were to first summarize or transform the content before training on it? Then the LLM would never actually have seen the copyrighted text and couldn't produce an exact copy.
You are assuming the transformation is lossy. The stylistic guidelines and personal habits of beat journalists suggest it might not be, depending on how detailed the transformed text is. The complaint includes many quotes that are long verbatim passages.
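Those verbatim passages are also easy to demonstrate. As a rough sketch (the inputs here are placeholders, not the actual complaint exhibits), one could just measure the longest contiguous run of words shared between an article and a model's output:

```python
from difflib import SequenceMatcher

def longest_verbatim_run(article: str, output: str) -> str:
    """Return the longest contiguous run of words shared by both texts."""
    a, b = article.split(), output.split()
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return " ".join(a[match.a:match.a + match.size])

# Hypothetical usage: a run of dozens or hundreds of words is the kind of
# near-verbatim reproduction quoted in the complaint.
# print(longest_verbatim_run(nyt_article_text, chatgpt_output))
```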