
>if a work is purely derivative of a source work

This is the weakest part of the case(s) against OpenAI. "Derivative work" is a legal term of art meaning a direct adaptation, like writing a screenplay of a book or translating a book into another language.

NYT has a stronger case than Sarah Silverman here because they can show actual 'memorized' text rather than just summarization, but given that those memorizations are a) an unintended failure mode of the training process, and b) from an older version of the model that has been updated to no longer regurgitate memorized text, it's not clear how GPT in its current form could possibly be considered a derivative work.



"Transformative" seems to fit a lot better than "Derivative".

On the other hand, it's understandable why NYT is worried. OpenAI itself says that occupations like Writers and Authors, Web and Digital Interface Designers, News Analysts, Reporters, and Journalists, and Proofreaders and Copy Markers are "90-100% exposed" to what OpenAI is building.


We should all be worried about that. If journalism is replaced with AI, truth is replaced with the AI hallucination du jour.


Most modern news is already hallucination and post-truth.



It’s already done. It’s not the future.


Yup. News has been rampant with speculation, hearsay, and propaganda all my life. Content mills and astroturfers already bury the truth or relevant stories with noise.


I don't buy into all these "dangers". The advent of cars did not decrease the number of drivers; it introduced various new jobs that were not previously available to a lot of people. And the rise of computers did not make the workforce smaller, but instead opened many more opportunities for a lot of people.


I think focusing on lost jobs is the wrong angle to take here. It's about how the content being used is being compensated. OpenAI isn't paying writers to train its engine.

I don't care that the car replaced the horse carriage, because it didn't need to compensate horses or handlers to do so. AI, the newest iteration of scraping data from artists, writers, etc. to profit millions off of, is directly using the "horse handler's" work. If these LLMs threw NYT a royalty to use their articles as training material, there wouldn't be a lawsuit.


A question is whether the new model still intrinsically embeds the source text, which is merely filtered out at the output, or whether it no longer embeds the text at all.

The latter is more defensible.


I would think an existing model could bootstrap a copyright-free training corpus by completely rewriting/paraphrasing copyrighted material with semantic fidelity, then training the next model on it to completely eliminate memorization of copyrighted works. That might pose an interesting obstacle to copyright challenges: bootstrapping your way into a clean room. Although tweaking the architecture to either eliminate memorization or eliminate high-fidelity reproduction of verbatim training data seems far more expedient and less costly.
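A minimal sketch of that bootstrapping loop, assuming you have some existing model to call: paraphrase each source document, reject any rewrite that still reproduces the original, and only let the paraphrases into the next model's training set. The `paraphrase` function here is a hypothetical stand-in (a trivial word substitution), not a real LLM call.

```python
def paraphrase(text: str) -> str:
    """Stand-in for asking an existing model to rewrite a passage with
    the same meaning but none of the original wording. Faked here with
    a trivial word-level substitution for illustration."""
    substitutions = {"quick": "fast", "lazy": "idle", "jumps": "leaps"}
    return " ".join(substitutions.get(w, w) for w in text.split())

def build_clean_corpus(documents: list[str]) -> list[str]:
    """Build the 'clean room' corpus: only rewrites that differ from
    the source text are allowed into the next model's training data."""
    corpus = []
    for doc in documents:
        rewritten = paraphrase(doc)
        # Reject rewrites that still match the source verbatim, so no
        # memorizable original text reaches the next training run.
        if rewritten != doc:
            corpus.append(rewritten)
    return corpus

docs = ["the quick brown fox jumps over the lazy dog"]
print(build_clean_corpus(docs))
```

In practice the verbatim check would need to be fuzzier (n-gram overlap against the source, not exact equality), since near-verbatim reproduction is the actual legal exposure.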



