> "if your business value is your codebase, it's hard to build a business whilst literally giving it away".
perhaps then it comes as no surprise that some very outspoken open-source proponents do not open-source their core business components. I can understand they do it in order to exist as companies, as businesses, but I don't understand why they have to endure being shamed for staying closed-source while the rest of their stack is open. many such companies exist.
and let's add to this the fact that everything released in 2025 as open-source gets slurped by LLMs for training. so, essentially, you feed bro's models breakfast with your open-source, like it or not. in the very near future we'll perhaps be able to prompt a 'retell' of Redis which is the same software in behavior, but written so that, legally, it is not.
in essence there now seems to be little reason to open-source anything at all, particularly if it is core business logic. open-source only if you can outpace everybody else (with tech), or if you don't care about the revenue.
A sufficiently capable LLM might be good enough to do cleanroom design on its own, with little to no human assistance. That would destroy the entire idea of copyright as it exists for software.
You need one agent that can write a complete specification of any piece of software, either just by using it and inferring how it works, or by reverse engineering if not prohibited by the license. You then have a lawyer in the middle (human or LLM) review it, removing any bits that are copyrighted. You then need another agent that can re-implement that spec. You just made a perfectly legal clone.
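The three stages described here can be sketched as a pipeline. This is a toy illustration only: the agent functions, names, and example behavior below are hypothetical stand-ins (a real setup would call LLM APIs at each stage), but it shows the key property that the re-implementing agent only ever sees the cleaned spec, never the original:

```python
# Hypothetical clean-room pipeline: spec agent -> legal review -> reimplementation agent.

def spec_agent(observed_behavior: dict) -> list[str]:
    """Stage 1: write a behavioral spec purely from observed inputs/outputs."""
    return [f"given {inp!r}, the system returns {out!r}"
            for inp, out in observed_behavior.items()]

def legal_review(spec: list[str], tainted_phrases: set[str]) -> list[str]:
    """Stage 2: the 'lawyer in the middle' strips anything copyrighted."""
    return [line for line in spec
            if not any(p in line for p in tainted_phrases)]

def reimplementation_agent(clean_spec: list[str]) -> str:
    """Stage 3: a second agent that only ever sees the cleaned spec."""
    return "\n".join(f"# implements: {line}" for line in clean_spec)

# Toy example: behavior inferred by using the system, not by reading its code.
behavior = {"GET /ping": "pong", "GET /banner": "CopyrightedGreeting v1"}
spec = spec_agent(behavior)
clean_spec = legal_review(spec, tainted_phrases={"CopyrightedGreeting"})
program = reimplementation_agent(clean_spec)
```

The information barrier between stages 1 and 3 is what the human version of clean-room design relies on; the open question is whether an LLM whose weights already contain the original can ever honestly sit on the stage-3 side of that wall.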
Cleanroom design is a well-established precedent in the US, and has been used before, just with teams of humans instead of LLMs.
I think some companies will be completely unaffected by this, either because the behavior of their code can't easily be inferred just from API calls, or because their value actually lies in users / data / business relationships, not the code directly. Stripe would be my go-to example: you can't just reverse-engineer all their optimizations and antifraud models by getting a developer API key and calling their API endpoints. They also have a lot of relationships with banks and other institutions, which are arguably just as important to their business as their code. Instagram, Uber or Amazon also fall into this bucket somewhat.
Because, unlike humans, LLMs reliably reproduce exact excerpts from their training data. It's very easy to get image generation models to spit out screenshots from movies.
> Can we start at "humans are not computers", maybe?
Sure. So it stands to reason that "computers" are not bound by human laws. So an LLM that finds a piece of copyright data out there on the internet, downloads it, and republishes it has not broken any law? It certainly can't be prosecuted.
My original point was that copyright protections are about (amongst other things) protecting distribution and derivative works rights. I'm not seeing a coherent argument that feeding a copyrighted work (that you obtained legally) into a machine is breaching anyone's copyright.
> So an LLM that finds a piece of copyright data out there on the internet, downloads it, and republishes it has not broken any law?
Are you even trying? A gun that kills a person has not broken any law? It certainly can't be prosecuted.
> I'm not seeing a coherent argument that feeding a copyrighted work (that you obtained legally) into a machine is breaching anyone's copyright.
So you don't see how having an automated blackbox that takes copyrighted material as an input and provides a competing alternative that can't be proven to come from the input goes against the idea of copyright protections?
> So you don't see how having an automated blackbox that takes copyrighted material as an input and provides a competing alternative that can't be proven to come from the input goes against the idea of copyright protections?
Semantically, this is the same as a human reading all of Tom Clancy and then writing a fast-paced action/war/tension novel.
Is that in breach of copyright?
Copyright protects the expression of an idea. Not the idea.
> Copyright protects the expression of an idea. Not the idea.
Copyright laws were written before LLMs. The fact that a new technology can completely bypass the law doesn't mean that it is okay.
If I write a novel, I deserve credit for it and I deserve the right to sell it and to prevent somebody else from selling it in their name. If I was allowed to just copy any book and sell it, I could sell it for much cheaper because I didn't spend a year writing it. And the author would be screwed because people would buy my version (cheaper) and would possibly never even hear of the original author (say if my process of copying everything is good enough and I make a "Netflix of stolen books").
Now if I take the book, have it automatically translated by a program and sell it in my name, that's also illegal, right? Even though it may be harder to detect: say I translate a Spanish book to Mandarin, someone would need to realise that I "stole" the Spanish book. But we wouldn't want this to be legal, would we?
An LLM does that in a way that is much harder to detect. In the era of LLMs, if I write a technical blog, nobody will ever see it because they will get the information from the LLM that trained on my blog. If I open source code, nobody will ever see it if they can just ask their LLM to write an entire program that does the same thing. But chances are that the LLM couldn't have done it without having trained on my code. So the LLM is "stealing" my work.
You could say "the solution is to not open source anything", but that's not enough: art (movies, books, paintings, ...) fundamentally has to be shown and can therefore be trained on. LLMs bring us towards a point where it won't matter whether code is open source, source available or proprietary: if you manage to train your LLM on it (even on proprietary code that was illegally leaked), you'll have essentially stolen it in a way that may be impossible to detect.
How in the world does it sound like it is a desirable future?
Maybe I need to explain it: my point is that the one responsible is the human behind the gun... or behind the LLM. The argument that "an LLM cannot do anything illegal because it is not a human" is nonsense: it is operated by a human.
> I agree with the fact that LLMs are big open-source laundering machines, and that is a problem.
Why do you believe this is a problem? I mean, to believe that you first need to believe that having access to the source code is somehow a problem.
> I mostly see it as a problem for copyleft licences.
Nonsense.
At most, the problem lies in people ignoring what rights a FLOSS license grants to end users, and then feigning surprise when end users use their software just as the FLOSS license intended.
Also a telltale sign is the fact that these blind criticisms single out very precise corporations. Apparently they have absolutely no issue if any other cloud provider sells managed services. They single out AWS but completely ignore the fact that the organization behind ValKey includes the likes of Google, Ericsson, and even Oracle of all things. Somehow only AWS is the problem.
> I mean, to believe that you first need to believe that having access to the source code is somehow a problem.
How in the world did you get there from what I said? Open source code has a licence that says what the copyright owner allows or not. LLMs are laundering machines in the sense that they allow anybody to just ignore licences and copyright in all code (even proprietary code: if you manage to train on the code of Windows without getting caught, you're good).
> At most, the problem lies in people ignoring what rights a FLOSS license grants to end users
Once it's been used to train an LLM, there is no right anymore. The licence, copyright, all that is worthless.
> Also a telltale sign is the fact that these blind criticisms [...]
> LLMs are laundering machines in the sense that they allow anybody to just ignore licences and copyright in all code (...)
No. Having access to the code does that. You only need a single determined engineer to do that. I mean, do you believe that until the inception of LLMs the world was completely unaware of the whole concept of reverse engineering stuff?
> Once it's been used to train an LLM, there is no right anymore.
Nonsense. You do not lose your rights to your work just because someone used a glorified template engine to write something similar. In fact, your whole comment conveys a complete lack of experience using LLMs in coding applications, because all major coding assistant services do enforce copyright filters even when asking questions.
> do you believe that until the inception of LLMs the world was completely unaware of the whole concept of reverse engineering stuff?
The scale makes all the difference! A single determined engineer, in their whole life, cannot remotely read all the code that goes into the training phase. How in the world can you believe it is the same thing?
> Nonsense. You do not lose your rights to your work just because [...]
It is only nonsense if you don't try to understand what I'm saying. What I am saying is that if it is impossible to prove that the LLM was trained with copyrighted material, then the copyright doesn't matter.
But maybe your single determined engineer can reverse engineer any trained LLM and extract the copyright code that was used in the training?