AI generated content can be either derivative or transformative. For example if I use AI to paraphrase books or articles, that would be a derivative use.
But an AI that searches the web, news, and scientific papers for references and then outputs a Wikipedia-style article on a given topic would be a transformative use because it does a lot of work synthesizing multiple sources into a coherent piece, and only uses limited factual references.
Or we can do something more advanced. We solve a task with a student model and in parallel we solve the same task with a teacher model empowered with RAG, web search and code execution tools. Then you have two answers. You use an examination prompt to extract what the student model got wrong as a new training example.
That would be transformative, and targeted. No need to risk collecting content that is already known by the student model. It would be more like "machine studying" or "machine teaching" because it creates targeted lessons for the student.
But an AI that searches the web, news, and scientific papers for references and then outputs a Wikipedia-style article on a given topic would be a transformative use because it does a lot of work synthesizing multiple sources into a coherent piece, and only uses limited factual references.
Or we can do something more advanced. We solve a task with a student model and in parallel we solve the same task with a teacher model empowered with RAG, web search and code execution tools. Then you have two answers. You use an examination prompt to extract what the student model got wrong as a new training example.
That would be transformative, and targeted. No need to risk collecting content that is already known by the student model. It would be more like "machine studying" or "machine teaching" because it creates targeted lessons for the student.