
>if a work is purely derivative of a source work

This is the weakest part of the case(s) against OpenAI. "Derivative work" is a legal term of art meaning a direct adaptation, like writing a screenplay of a book or translating a book into another language.

NYT has a stronger case than Sarah Silverman here because they can show actual 'memorized' text rather than just summarization, but given that those memorizations are a) an unintended failure mode of the training process, and b) from an older version of the model that has been updated to no longer regurgitate memorized text, it's not clear how GPT in its current form could possibly be considered a derivative work.



"Transformative" seems to fit a lot better than "Derivative".

On the other hand, it's understandable why NYT is worried. OpenAI itself says that occupations like Writers and Authors, Web and Digital Interface Designers, News Analysts, Reporters, and Journalists, and Proofreaders and Copy Markers are "90-100% exposed" to what OpenAI is building.


We should all be worried about that. If journalism is replaced with AI, truth is replaced with the AI hallucination du jour.


Most modern news is already hallucination and post-truth.



It’s already done. It’s not the future.


Yup. News has been rampant with speculation, hearsay, and propaganda all my life. Content mills and astroturfers already bury the truth or relevant stories with noise.


I don't buy into all these "dangers". The advent of cars did not decrease the number of drivers; it introduced various new jobs that were not previously available to a lot of people. And the rise of computers did not make the workforce smaller, but instead opened many more opportunities for a lot of people.


I think focusing on lost jobs is the wrong angle to take here. It's about how the content being used is being compensated. OpenAI isn't paying writers to train its engine.

I don't care that the car replaced the horse carriage, because it didn't need to compensate horses or handlers to do so. AI, the newest iteration of scraping data from artists, writers, etc. to profit millions off of, is directly using the "horse handler's" work. If these LLMs threw NYT a royalty to use their articles as training material, there wouldn't be a lawsuit.


A question is whether the new model still intrinsically embeds the source text, which is merely filtered out at the output, or whether it no longer embeds the text at all.

The latter is more defensible.


I would think an existing model could bootstrap a copyright-free training corpus by completely rewriting/paraphrasing copyrighted material with semantic fidelity, then training the next model on it to completely eliminate memorization of copyrighted works. That might pose an interesting obstacle to copyright challenges: bootstrapping your way into a clean room. Although tweaking the architecture to either eliminate memorization or eliminate high-fidelity reproduction of verbatim training data seems far more expedient and less costly.
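A minimal sketch of that bootstrapping loop, assuming you have some existing model to call: paraphrase each source document, reject any rewrite that still reproduces the original, and only let the paraphrases into the next model's training set. The `paraphrase` function here is a hypothetical stand-in (a trivial word substitution), not a real LLM call.

```python
def paraphrase(text: str) -> str:
    """Stand-in for asking an existing model to rewrite a passage with
    the same meaning but none of the original wording. Faked here with
    a trivial word-level substitution for illustration."""
    substitutions = {"quick": "fast", "lazy": "idle", "jumps": "leaps"}
    return " ".join(substitutions.get(w, w) for w in text.split())

def build_clean_corpus(documents: list[str]) -> list[str]:
    """Build the 'clean room' corpus: only rewrites that differ from
    the source text are allowed into the next model's training data."""
    corpus = []
    for doc in documents:
        rewritten = paraphrase(doc)
        # Reject rewrites that still match the source verbatim, so no
        # memorizable original text reaches the next training run.
        if rewritten != doc:
            corpus.append(rewritten)
    return corpus

docs = ["the quick brown fox jumps over the lazy dog"]
print(build_clean_corpus(docs))
```

In practice the verbatim check would need to be fuzzier (n-gram overlap against the source, not exact equality), since near-verbatim reproduction is the actual legal exposure.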



