
How do AI startups get away with copyright violations? To train an AI model, they need to download copyrighted works (images, videos, music) into their AI cloud, thus creating a "copy" under US copyright law. Isn't this outright illegal?

There are even datasets, collections of URLs such as Common Crawl. You cannot download and use them without breaking the law.

They cannot rely on "fair use" because they are harming creators' income by building generative AI from their works.

But since there is big money involved, I guess the government will make an exception for them.




This is still an open matter of law.

The core issue is the transformative test in fair use. Is the model sufficiently transformative?

The question of what, if any, impact OpenAI has on the copyright holder's income is yet to be demonstrated in court.

So far, the lawsuits launched by the authors haven't gotten very far. https://www.theguardian.com/books/2024/feb/14/two-openai-boo... https://casetext.com/case/tremblay-v-openai-inc-6

> However, the UCL claim does not lack factual allegations; it lacks a tenable legal theory. See Brown v. Van's Int'l Foods, Inc., No. 22-CV-00001-WHO, 2022 WL 1471454, at *6 (N.D. Cal. May 10, 2022) (“As the defect lies in the legal theory, not the factual allegations, the dismissal is without leave to amend.”). The Court dismisses the UCL claim without leave to amend as amendment would be futile.

> Tremblay v. OpenAI, Inc., 23-cv-03223-AMO, 5 (N.D. Cal. Jul. 30, 2024)

This is only partially dismissed - the unfair competition claim is still open, has not been ruled on, and remains active - https://www.courtlistener.com/docket/67538258/tremblay-v-ope...


But when they download images, music, or videos from the Internet, they are making a non-transformative copy.


I would suggest you argue with the judge that downloading something is making a non-transformative copy of the material.

I believe that, however, is irrelevant to the transformative nature of the end product - the model itself.

I would also encourage you to read Perfect 10 v. Google and the appeal ( https://arstechnica.com/tech-policy/2007/05/google-v-perfect... https://www.eff.org/cases/perfect-10-v-google https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com.... )

From Wikipedia:

> The Ninth Circuit did, however, overturn the district court's decision that Google's thumbnail images were unauthorized and infringing copies of Perfect 10's original images. Google claimed that these images constituted fair use, and the circuit court agreed. This was because they were "highly transformative." The court did not define what size a thumbnail should be but the examples the court cited were only 3% of the size of the original images. Most other major sites use a size not longer than 150 pixels on the long side. Specifically, the court ruled that Google transformed the images from a use of entertainment and artistic expression to one of retrieving information, citing the precedent Kelly v. Arriba Soft Corporation. The court reached this conclusion despite the fact that Perfect 10 was attempting to market thumbnail images for cell phones, with the court quipping that the "potential harm to Perfect 10's market remains hypothetical."

> The court pointed out that Google made available to the public the new and highly beneficial function of "improving access to [pictorial] information on the Internet." This had the effect of recognizing that "search engine technology provides an astoundingly valuable public benefit, which should not be jeopardized just because it might be used in a way that could affect somebody's sales."

If resizing an image to a fraction of its original size is sufficiently transformative, and using it for a different purpose (image search rather than selling thumbnails for cellphone porn) is considered fair use, then direct parallels could be drawn from that ruling to OpenAI's use of copyrighted material: it may be sufficiently transformative regardless of whether someone is selling summaries of a copyrighted work.

---

If you believe that OpenAI and other LLM developers are infringing and not covered by fair use, it would be helpful if you could write a bit on how they fail the four factors of fair use described in https://fairuse.stanford.edu/overview/fair-use/four-factors/


The common belief is that copyright covers redistribution, not literal copying. Otherwise, computers and the Internet would not work.


Because they are not distributing the work, and distribution is what the law currently covers. And in case you’re not aware, many lawsuits have been filed that make this exact claim: that models are not transformative and thus break copyright law.


It doesn't matter; before training, they download the content from the Internet and thus make a "copy", which might be illegal.


This would break all websites. You can't check a license before loading the page.

Generally speaking, when you request a site, barring a compelling reason to believe otherwise, the assumption must be that accessing the content is legal.



