The only entities that will win with these lawsuits are the likes of Disney, large legacy news media companies, Reddit, Stack Overflow (who are selling content generated by their users), etc.
Who will also win: Google, OpenAI and other corporations that enter exclusive deals, that can more and more rely on synthetic data, that can build anti-recitation systems, etc.
And of course the lawyers. The lawyers always win.
Who will not win:
Millions of independent bloggers (whose content will be used)
Millions of open source software engineers (whose content will be used against the licenses, and used to displace their livelihood), etc.
The likes of Google and OpenAI entered the space by building on top of the work of the above two groups. Now they want to pull up the ladder. We shouldn't allow that to happen.
Honestly, the most depressing thing about this entire affair is seeing a sizable chunk (not all, certainly) of the software development community line up behind OpenAI and company's blatant, industrial-scale theft of the mental products of probably billions of people (not least other software developers!), without the slightest hint of concern for what that means for the world, simply because afterwards they got a new toy to play with. Squidward was apparently 100% correct: on balance, few care about the fate of labor as long as they get their instant gratification.
Do you consider it theft because of the scale? If I read something you wrote and use most of a phrase you coined or an idea for the basis of a plotline in a book I write, as many authors do, currently it's counted as being all my own work.
I feel like the argument is akin to some countries considering rubbish (the things you throw away) to still be owned by you, i.e. treating "dumpster diving" as theft.
If a company had scraped public posts on the Internet and used it to compile art by colourising chunks of the text, is it theft? If an individual does it, is it theft?
This argument has been stated and re-stated many times, this notion that use of information should always be free, but it fails to account for the fact that OpenAI is not consuming this written resource as a source of information, but as raw material for training LLMs, which, as it has been open about from the beginning, it intends to sell access to as a subscription service. These are fundamentally not the same. ChatGPT/Copilot do not understand Python; they are not minds that read a bunch of Python books and learned Python skills they can now utilize. They are language models that internalized metric tons of weighted averages of Python code and can now (kind of) write their own, by minimizing "error" relative to the code samples they ingest. Because of this, Copilot has never written, and will never write, code it hasn't seen before, and by extension it must see a whole lot of code in order to function as well as it does.
If you as a developer look at how one declares a function in Python and review a few examples, you now know how to do that. Copilot can't say the same. It needs to see dozens, hundreds, perhaps thousands of examples before it can be reliably counted on to accomplish that task; that's just how the tech works. Ergo, data sets at that scale now have value, when the people doing the training work for high-valuation startups whose objective is selling access to code-generating robots.
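The dependency on seen examples can be made concrete with a toy sketch. This is not how real LLMs work (they use gradient descent over neural networks, not lookup tables), but a deliberately crude count-based next-token model shows the same property the comment describes: it can only ever emit continuations that appeared somewhere in its training data.

```python
from collections import Counter, defaultdict

def train(corpus_lines):
    """Count which token follows which; a crude stand-in for the
    statistical averaging over ingested code the comment describes."""
    follows = defaultdict(Counter)
    for line in corpus_lines:
        tokens = line.split()
        for a, b in zip(tokens, tokens[1:]):
            follows[a][b] += 1
    return follows

def predict(follows, token):
    """Return the most frequently observed next token, if any."""
    if token not in follows:
        return None  # never seen in training: the model has nothing to emit
    return follows[token].most_common(1)[0][0]

# A "model" trained on a single example can only parrot that example;
# it cannot produce a construct it has never seen.
model = train(["def add ( a , b ) :"])
print(predict(model, "def"))    # "add" - the only continuation it knows
print(predict(model, "class"))  # None - unseen token, no output possible
```

Scaling the corpus up is what makes the predictions useful, which is exactly why the training data itself carries the value.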
That's not necessarily my position. I think laws can evolve, but they need to be applied fairly. In this case, it's heading in a direction where only the blessed will be able to compete.
>blatant theft on an industrial scale of the mental products
They haven't been stolen; the creators still have them. They've just been copied. It's amazing how much the ethos on this site has shifted over the past decade, away from the hacker idea that "intellectual property" isn't real property, just a means of growing corporate power, and information wants to be free.
> It's amazing how much the ethos on this site has shifted over the past decade
It hasn't. The hacker ethos is about openness, individuality, decentralization (among others).
OpenAI is open in what it consumes, not what it outputs.
It makes sense to have protections in place when your other values are threatened.
If "information wants to be free" leads to OpenAI centralizing control over the most advanced AI, will it be worth it?
A solution here would be similar to the GPL: even megacorps can use GPL software, but they have to contribute back. If OpenAI and the rest would be forced to make everything public (if it's trained on open data) then that would be an acceptable compromise.
> The hacker ethos is about openness, individuality, decentralization (among others).
Yes, the greatest things on the internet have been decentralized: Git, Linux, Wikipedia, open scientific publications, even some forums. We used to passively consume content, and the internet allowed interaction. We don't want to return to the old days. AI falls into the decentralized camp; the primary beneficiaries are not the providers but the users. We get help with the things we need, OpenAI gets a few cents per million tokens, and they don't even break even.
I'm sorry, the world's knowledge now largely accessible to laypeople via LLMs controlled by at most five companies is decentralized? If that statement is true, then the word "decentralized" is truly devoid of meaning at this point.
1. Decentralized technologies you can operate privately, freely, and adapt to your needs: computers, the old internet, Linux, git, Firefox, a local Wikipedia dump, old standalone games.
2. Centralized technologies that invade privacy and lead to loss of control and manipulation: web search, social networks, mobile phones, Chrome, the recent internet, networked games.
LLMs fall into the decentralized camp.
You can download an LLM, run it locally, and fine-tune it. It is interactive, the most interactive decentralized tech since standalone games.
If you object that LLMs are mostly centralized today (the upfront cost of pre-training, OpenAI's popularity), I'd say they are still not a monopoly: there are many more LLM providers than search engines or social networks, and the next round of phones and laptops will be capable of local gen-AI. The experience will be seamless, probably easier to adopt than touchscreens were in 2007.
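The "run it locally" claim is easy to try. A minimal sketch using the Hugging Face `transformers` library (my choice of tooling; the comment names none), which downloads a small open model once and then generates text entirely on your own machine:

```python
# Run a small open language model locally with Hugging Face transformers.
# gpt2 is used here only because it is tiny; any open model works the same way.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The hacker ethos is about", max_new_tokens=20)
print(result[0]["generated_text"])
```

No API key, no network call after the initial download; fine-tuning the same model is a separate (heavier) step, but it starts from exactly these weights.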
Disagree. There should be no distinction between the two. Those kind of distinctions are what cause unfair advantages. If the information is available to consume, there should be no constraint on who uses it.
Sure, you might not like OpenAI, but maybe some other company comes along and builds the next magical product using information that is freely available.
Treating corporations as "people" for policy's sake is a legal decision which has essentially killed the premise of the US democratic republic. We are now, for all intents and purposes, a corporatocracy. Perhaps an even better description would simply be oligarchy, but since our oligarchs' wealth is almost all tied up in corporate stocks, it's a very incestuous relationship.
The idea of knowledge as a source of understanding and personal growth is completely oppositional to its conception as a scarce resource, which is what it is to OpenAI and whoever else wants to train LLMs. OpenAI did not read everything in the library because it wanted to know everything; it read everything at the library so it could teach a machine to be a statistical-average written-word generator, which it can then sell access to. These are fundamentally different concepts, and if you don't see that, I would say it is because you don't want to see it.
I don't care if employees at OpenAI read books from their local library on python. More power to them. I don't even care if they copy the book for reference at work, still fine. But utilizing language at scale as a scarce resource to train models is not that and is not in any way analogous to it.
I am sorry you are too blinded by your own ideology and disagreement with OpenAI to see others' points of view. In my view, I do not want to constrain any person or entity in their access to knowledge, regardless of the output product. I do have issues with entities or people consuming knowledge and then preventing others from doing so. I am not describing a scenario of a scarce resource but of an open one.
Public information should be free for anyone to consume and use how they want.
> I am sorry you are too blinded by your own ideology and disagreement with OpenAI to see others points of views.
A truly hilarious sentiment coming from someone making zero effort to actually engage with what I'm saying in favor of parroting back empty platitudes.