
It is mentioned later in the article, but I think it's important to clearly distinguish between two cases: a) the "offender" is using the licensed work within the letter of the license but not the spirit; b) the "offender" has broken both the letter and the spirit of the license.

I've licensed multiple repositories under MIT, written under CC-BY, and published games under ORC. All of those licenses require attribution, a requirement that AI, for example, explicitly ignores. In those situations "Wait, no, not like that" isn't "I didn't expect you'd use it this way"; it's "you weren't authorized to use it this way."
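
For concreteness, MIT's attribution condition is a single sentence: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." A minimal sketch of what compliance looks like when reusing MIT code (the file and names here are hypothetical):

    # vendored/fastparse.py -- derived from a hypothetical MIT-licensed project
    #
    # Copyright (c) 2021 Original Author
    #
    # Permission is hereby granted, free of charge, ...
    # (the full MIT notice is retained verbatim here, as the license requires)

    def parse(raw: str) -> list[str]:
        # Modified downstream; the notice above still travels with the code.
        return [token for token in raw.split() if token]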



Listing all of the creators from whose attribution-licensed works an LLM (potentially) derived an output would seem to satisfy the letter of such licenses, but it is not clear it would satisfy the spirit (which seems to assume a stronger causal link and a far smaller number of attributions). If creators can be credited collectively rather than by the names explicitly associated with their works, this could degrade into "this work is derived from the works of humanity"; yet listing every human being individually does not seem _meaningfully_ different, and it would appear to satisfy the attribution requirement of such licenses.
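
Mechanically, nothing in a typical attribution clause bounds the length of the credits list, so a trainer could exercise this loophole by emitting one enormous attribution file straight from its dataset manifest. A toy sketch (the JSON-lines manifest format is invented for illustration):

    import json

    # Hypothetical manifest: one JSON record per training document, each
    # carrying the attribution string its license requires.
    def write_credits(manifest_path: str, out_path: str) -> None:
        with open(manifest_path) as src, open(out_path, "w") as out:
            for line in src:
                record = json.loads(line)
                # Letter of the license: every author is named.
                # Spirit of the license: lost around entry forty million.
                out.write(f"{record['author']} -- {record['work']} ({record['license']})\n")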

From what little I understand of LLMs, the weight network developed by training on a large collection of inputs is similar to human knowledge: some things will be clearly derived (at least in part) from a small number of inputs, while others have no clear contributor density. If I wrote a human-"superiority" science fiction story, I could be fairly confident that Timothy Zahn and Gordon R. Dickson "contributed"; however, that contribution would not be considered enough to violate copyright and require licensing. Some LLM outputs clearly violate copyright (e.g., near-verbatim quotation of significant length), but other outputs seem to be more broadly derived.
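
That "contributor density" intuition is roughly what data-attribution research tries to measure. As a crude illustration only (real methods, such as influence functions, are far more involved), one could rank training items by embedding similarity to an output and look at the shape of the score distribution:

    import numpy as np

    def top_contributors(output_vec: np.ndarray, corpus: np.ndarray, k: int = 5):
        """Rank training items by cosine similarity to an output embedding.

        A few near-1.0 scores suggest dominant sources (the near-verbatim
        end); a flat distribution suggests broad derivation with no clear
        contributor.
        """
        corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        out_norm = output_vec / np.linalg.norm(output_vec)
        scores = corpus_norm @ out_norm
        return np.argsort(scores)[::-1][:k]  # indices of the k most similar items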

If the law treats LLMs like humans ("fairly"), then broad derivation would not seem to violate copyright. This seems to lead toward "AI rights", and I cannot imagine how concepts of just compensation and healthy/safe working conditions would apply to an AI. Can a corporation own a computer system that embodies an AI, or is that slavery?

If the law makes special exceptions for LLMs, e.g., adjusting copyright so that fair use and natural learning apply only to human persons, then licensing would be required for training. However, attribution licenses seem to have the above-mentioned loophole. (That this loophole is not exploited may be laziness, or concern about admitting that following the license is required at all; such an admission would make less openly licensed or unlicensed works poisonous.)

If the purpose of copyright is to "promote the useful arts", then the law should reflect that purpose. Demotivating human creators and the sharing of their works seems destructive to that purpose, yet LLMs are also enabling some creativity. The law should likewise incorporate general principles such as equality under the law, and LLMs seem to have a real potential for power concentration, which is its own concern for just laws.

Perhaps I am insufficiently educated on the tradeoffs to see an obvious solution, but this seems to me like a difficult problem.


According to Holub on Patterns (2004), patterns are discovered, not invented. A pattern's implementation is an idiom, which may or may not be idiomatic within a given community of practice.

If it can be shown that multiple people independently created something, the artifact is not the pattern itself--rather, the pattern is recognized because there are so many similar implementations.

LLMs create (or re-create) and derive idioms, based on weights that are themselves idioms (probabilistic patterns). At most, then, we can say an AI may understand the patterns of color theory, or idiomatic execution (art style)--but that is all.
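
To make the pattern/idiom split concrete, here is one pattern (Observer) in two Python idioms; the pattern is the same discovered shape, and only the idiom changes with the community (the example is mine, not Holub's):

    # Idiom 1: class-based, the GoF-era convention.
    class Subject:
        def __init__(self):
            self._observers = []

        def attach(self, observer):
            self._observers.append(observer)

        def notify(self, event):
            for observer in self._observers:
                observer(event)

    # Idiom 2: bare callbacks, more common in everyday Python.
    subscribers = []

    def notify(event):
        for callback in subscribers:
            callback(event)

    # Same pattern, two idioms:
    s = Subject()
    s.attach(print)
    s.notify("class-based")
    subscribers.append(print)
    notify("callback-based")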

---

1. They are willful, purposeful creatures who possess selves.

2. They interpret their behavior and act on the basis of their interpretations.

3. They interpret their own self-images.

4. They interpret the behavior of others to obtain a view of themselves, others, and objects.

5. They are capable of initiating behavior so as to affect the view others have of them and that they have of themselves.

6. They are capable of initiating behavior to affect the behavior of others toward them.

7. Any meaning that children attach to themselves, others, and objects varies with respect to the physical, social, and temporal settings in which they find themselves.

8. Children can move from one social world to another and act appropriately in each world.

-- Myra Bluebond-Langner, The Private Worlds of Dying Children (1978)


I guess, like humans, LLMs should cite their main sources? But not everything they have ever read.


Big tech has no respect for licenses, or for the law itself, as Uber and the like have shown.

People talking about licenses as if they have some courtroom legitimacy is hilarious. Licenses are like patents: weapons of the large corporation, to be used against other large corporations or against the people.

Of course you can try to litigate for ten years against a big corporation with lawyers on retainer; good luck. You might even get an "Erin Brockovich" movie script out of it, but we're watching the rule of law and the legitimacy of the courts degrade and become increasingly corrupt: once over years, now by the day.


Think of it the positive way: they also don't attribute all-rights-reserved works.



