Well yeah, copying a work and using it for its original expressive purpose isn’t fair use, no? You have to use it for a transformative purpose.
Suppose I’m selling subscriptions to the New Jersey Times, a site which simply downloads New York Times articles and passes them through an autoencoder with some random noise. It serves the exact same purpose as the New York Times website, except I make the money. Is that fair use?
If they could find a single person who in natural use (i.e. not while trying to gather data for this lawsuit) has ever actually used ChatGPT as a direct substitute for a NYT subscription, I'd support this lawsuit.
But nobody would do that, because ChatGPT is a really shitty way to read NYT articles (it's stale, it can't reliably reproduce them, etc.). All that is valuable about it is the way that it transforms and operates on that data in conjunction with all the other data that it has.
The real-world use of ChatGPT is very transformative, even if you can trick it into behaving in ways that are not. If the courts act intelligently, they should at least weigh that as part of their decision.
It’s more of a thought experiment. Here’s another with more commercial applications:
Suppose I start a service called “EastlawAI” by downloading the Westlaw database and hiring a team of comedians to write very funny lawyer jokes.
I take Westlaw cases and lawyer jokes and feed them to my autoencoder. I also learn a mapping from user queries to decoder inputs.
I sell an API and advertise it to startups as capable of answering any legal question in a funny way. Another company comes along with an API to make the output less funny.
Have I created a competitor to Westlaw by copying Westlaw’s works for their original expressive purpose and exposing them through an intermediary? Or have I simply trained the world’s most informative lawyer joke generator that some of my customers happen to use for legal analysis by layering other tools atop my output?
Did I need to download Westlaw cases to make my lawyer joke generator? Are the jokes a fair-use smokescreen for repackaging commercially valuable copyrighted data? Does my joke generator impact Westlaw in the market? Depends, right?
That’s nonsense piracy. I never intend to own a truck, so when I need to haul a little something I go to Home Depot and steal a Ford off the lot for an hour? What if I stole all your commits, plucked the hard lines out of the ceremony, and then launched an equivalent feature the same week as you did, but for a competing software company? Would you or your employer deserve to get paid for my use of the slice of your work that was specifically useful for me? Yeah, and then some extra for theft.
> Well yeah, copying a work and using it for its original expressive purpose isn’t fair use, no? You have to use it for a transformative purpose.
To be clear, whether the use of the original work is transformative is one key consideration within one of the four prongs of fair use. The "purpose and character of the use" prong can be satisfied in other ways [1]. For example, using the original work in a classroom for educational purposes is not transformative, but can still satisfy that prong. Whether the use is for profit, and to what extent, are further considerations within it. A profit purpose doesn't automatically fail the purpose prong, and a non-profit purpose doesn't automatically pass it.
> Well yeah, copying a work and using it for its original expressive purpose isn’t fair use, no? You have to use it for a transformative purpose.
They transformed the weights.
Just like reading the article transforms yours.
As for verbatim reproduction, I'm pretty sure brains are capable of reproducing song lyrics, musical melodies, common symbols ("cool S"), and lots of other things verbatim too.
Those quotes from Dr. King's speech that you remember are copyrighted, you know?
This comment is just blatant anthropomorphizing of ML models. You have no idea if reading an article “transforms weights” in a human mind, and regardless, they aren’t legally the same thing anyway.
Why? A human being isn’t infinitely scalable; they’re just different. It’s the same thing as going to a movie theatre to watch a movie vs. recording it with a camera.
A human churning butter, spinning cotton, or acting as a bank teller isn't infinitely scalable either. This is orthogonal to the point.
Times change. We're industrializing information creation and consumption (the latter is mostly here already), and we can't be stuck in the old copyright regime. It'll be useless in very short order.
All this road bump will do is give the giant megacorps time to ink deals, solidify their lead, and trounce open source. Twenty years on, the pace of content creation will be as rapid as thought itself and we'll kick ourselves for cementing their lead.
This is a transitional period between two wildly different worlds.
> Suppose I’m selling subscriptions to the New Jersey Times, a site which simply downloads New York Times articles and passes them through an autoencoder with some random noise. It serves the exact same purpose as the New York Times website, except I make the money. Is that fair use?