How do you know GH didn't? Maybe they only included repos with LICENSE.MD files which followed a known permissive licence?
What if a particular piece of code is licensed restrictively, and then (assuming without malice) accidentally included in a piece of software with a permissive license?
What if a particular piece of code is licensed permissively (in a way that allows relicensing, for example), but then included in a software package with a more restrictive licence. How could you tell if the original code is licensed permissively or not?
At what point do Github have to become absolute arbiters of the original authorship of the code in order to determine who is authorised to issue licenses for the code? How would they do so? How could you prove ownership to Github? What consequences could there be if you were unable to prove ownership?
That's before we even get to more nuanced ethical questions like a human learning to code will inevitably learn from reading code, even if the code they read is not permissively licensed. Why then, would an AI learning to code not be allowed to do the same?
The “it’s really hard” argument isn’t a very good argument in my opinion?
If we hold reproductions of a single repository to a certain standard, the same standard should probably apply to mass reproductions. For a single repository, it’s your responsibility to make sure it’s used according to the license.
Are there slightly gray edge cases? Of course, but they’re not -that- grey. If I reproduced part of a book from a source that claimed incorrectly it was released under a permissive license, I would still be liable for that misuse. Especially if I was later made aware of the mistake and didn’t correct it.
If something is prohibitively difficult maybe we should sometimes consider that more work is required to enable the circumstances for it to be a good idea, rather than starting from the position that we should do it and moulding what we consider reasonable around that starting assumption.
If someone uploads something and says 'hey, this is some code, this is the appropriate licence for it', it is their mistake, it is in violation of Github's terms of service, and may even be fraudulent. [0].
I'm also not sure that Copilot is just reproducing code, but that's a separate discussion.
> If I reproduced part of a book from a source that claimed incorrectly it was released under a permissive license, I would still be liable for that misuse. Especially if I was later made aware of the mistake and didn’t correct it.
I don't believe that's correct in the first instance (at least from a criminal perspective). If someone misrepresents to you that they have the right to authorise you to publish something, and it turns out they don't have that right, you did not willingly infringe and are not liable for the infringement from a criminal perspective[1]. From a civil perspective, likely the copyright owner could still claim damages from you if you were unable to reach a settlement. A court would probably determine the damages to award based on real damages (including loss of earnings for the content creator), rather than anything punitive if it's found that
Further, most jurisdictions have exceptions for short extracts of a larger copyrighted work (e.g. quotes from a book), which may apply to Copilot.
This is my own code, I wrote it myself just now. Can I copyright it?
> From a civil perspective, likely the copyright owner could still claim damages from you if you were unable to reach a settlement. A court would probably determine the damages to award based on real damages (including loss of earnings for the content creator), rather than anything punitive if it's found that
There are statutory damages on top of your actual damages. $50k per act of infringement. No reason for the copyright holder to settle for less when it's an open and shut case.
> Further, most jurisdictions have exceptions for short extracts of a larger copyrighted work (e.g. quotes from a book), which may apply to Copilot.
Quotes do not automatically get an exception just because they're taken from a larger work, they might be excepted either because they were de minimis (essentially because they were too short to be copyrightable) or because they were fair use (which is a complex question that takes into account the purpose and context, which Copilot is very unlikely to satisfy because it's not quoting other code for the purpose of saying something about it).
> Where do we draw the line?
Circuit specific; some but not all circuits use the AFC test. It sounds like this code was both long enough and creative/innovative enough to be well on the wrong side of it though.
As I understand it, the complainant may CHOOSE to request the court to levy statutory damages rather than actual damages at any point, but is not entitled to both actual AND statutory (17 U.S. Code § 504)
It also seems to be absolutely capped at 30K per infringement, not 50, and ranges up from $750. It also seems that if the "court finds, that such infringer was not aware and had no reason to believe that his or her acts constituted an infringement of copyright, the court in its discretion may reduce the award of statutory damages to a sum of not less than $200."
I think you are probably right that this specific function is copyrightable though, but taken overall, I think Microsoft's lawyers have probably concluded that they would win any challenge on this. Microsoft have lost court battles before though, so who knows?
Your question can actually be answered legally. I'm not a lawyer so I'm not going to tell you what those answers are, but there are pretty well established mechanisms to determine if a function is trivial enough to warrant being copyrighted (a lot of this was explored in the SCO vs. IBM saga)
If you write some code and release it under the GPL. Then I take your code, integrate it into my project, and release my project with the MIT licence (for example), it may be that Copilot was only trained on my repo (with the MIT licence)
The fault there is not on Github, it's on me. I was the one who incorrectly used your code in a way that does not conform to the terms of your licence to me.
I don't think the fact that Copilot outputs code which seems to be covered under the GPL proves that Github did not only crawl repositories with permissive licences when training Copilot.
What if a particular piece of code is licensed restrictively, and then (assuming without malice) accidentally included in a piece of software with a permissive license?
What if a particular piece of code is licensed permissively (in a way that allows relicensing, for example), but then included in a software package with a more restrictive licence. How could you tell if the original code is licensed permissively or not?
At what point do Github have to become absolute arbiters of the original authorship of the code in order to determine who is authorised to issue licenses for the code? How would they do so? How could you prove ownership to Github? What consequences could there be if you were unable to prove ownership?
That's before we even get to more nuanced ethical questions like a human learning to code will inevitably learn from reading code, even if the code they read is not permissively licensed. Why then, would an AI learning to code not be allowed to do the same?