How do AI startups get away with copyright violations? To train an AI model, they need to download copyrighted works (images, videos, music) to their cloud servers, thus creating a "copy" under US copyright law. Isn't this outright illegal?
There are even datasets that are collections of URLs, like Common Crawl. You cannot download and use them without breaking the law.
They cannot get away with "fair use" because they are harming creators' income by building generative AI from their works.
But since there is big money involved, I guess the government will make some exception for them.
> However, the UCL claim does not lack factual allegations; it lacks a tenable legal theory. See Brown v. Van's Int'l Foods, Inc., No. 22-CV-00001-WHO, 2022 WL 1471454, at *6 (N.D. Cal. May 10, 2022) (“As the defect lies in the legal theory, not the factual allegations, the dismissal is without leave to amend.”). The Court dismisses the UCL claim without leave to amend as amendment would be futile.
> The Ninth Circuit did, however, overturn the district court's decision that Google's thumbnail images were unauthorized and infringing copies of Perfect 10's original images. Google claimed that these images constituted fair use, and the circuit court agreed. This was because they were "highly transformative." The court did not define what size a thumbnail should be, but the examples the court cited were only 3% of the size of the original images. Most other major sites use a size no longer than 150 pixels on the long side. Specifically, the court ruled that Google transformed the images from a use of entertainment and artistic expression to one of retrieving information, citing the precedent Kelly v. Arriba Soft Corporation. The court reached this conclusion despite the fact that Perfect 10 was attempting to market thumbnail images for cell phones, with the court quipping that the "potential harm to Perfect 10's market remains hypothetical."
> The court pointed out that Google made available to the public the new and highly beneficial function of "improving access to [pictorial] information on the Internet." This had the effect of recognizing that "search engine technology provides an astoundingly valuable public benefit, which should not be jeopardized just because it might be used in a way that could affect somebody's sales."
If resizing an image to a fraction of its original size and using it for a different purpose (image search rather than selling thumbnails for cellphone porn) is transformative enough to be considered fair use, then direct parallels can be drawn from that ruling to OpenAI's use of copyrighted material: the use is sufficiently transformative, irrespective of whether someone else is selling summaries of a copyrighted work.
---
If you believe that OpenAI and other LLM developers are infringing and are not covered by fair use, it would be helpful if you could write a bit on how they fail the four factors of fair use described in https://fairuse.stanford.edu/overview/fair-use/four-factors/
Because they are not distributing the work, and distribution is what the law currently turns on. And in case you’re not aware, there are many lawsuits filed that make this exact claim: that models are not transformative and are thus breaking copyright law.
This would break all websites. You can't check a license before loading the page.
Generally speaking, when you request a site, barring a compelling reason to believe otherwise, the assumption must be that accessing the content is legal.