How do you know GH didn't? Maybe they only included repos with LICENSE.MD files ...

IneffablePigeon · on Oct 17, 2022

The “it’s really hard” argument isn’t a very good argument in my opinion?

If we hold reproductions of a single repository to a certain standard, the same standard should probably apply to mass reproductions. For a single repository, it’s your responsibility to make sure it’s used according to the license.

Are there slightly gray edge cases? Of course, but they’re not -that- grey. If I reproduced part of a book from a source that claimed incorrectly it was released under a permissive license, I would still be liable for that misuse. Especially if I was later made aware of the mistake and didn’t correct it.

If something is prohibitively difficult maybe we should sometimes consider that more work is required to enable the circumstances for it to be a good idea, rather than starting from the position that we should do it and moulding what we consider reasonable around that starting assumption.

d1sxeyes · on Oct 17, 2022

If someone uploads something and says 'hey, this is some code, this is the appropriate licence for it', it is their mistake, it is in violation of Github's terms of service, and may even be fraudulent. [0].

I'm also not sure that Copilot is just reproducing code, but that's a separate discussion.

> If I reproduced part of a book from a source that claimed incorrectly it was released under a permissive license, I would still be liable for that misuse. Especially if I was later made aware of the mistake and didn’t correct it.

I don't believe that's correct in the first instance (at least from a criminal perspective). If someone misrepresents to you that they have the right to authorise you to publish something, and it turns out they don't have that right, you did not willingly infringe and are not liable for the infringement from a criminal perspective[1]. From a civil perspective, likely the copyright owner could still claim damages from you if you were unable to reach a settlement. A court would probably determine the damages to award based on real damages (including loss of earnings for the content creator), rather than anything punitive if it's found that

Further, most jurisdictions have exceptions for short extracts of a larger copyrighted work (e.g. quotes from a book), which may apply to Copilot.

This is my own code, I wrote it myself just now. Can I copyright it?

``` function isOdd (num) { if (num % 2 === 0) { return true; } else { return false; } } ```

What about the following:

``` function isOddAndNotSunday (num) { const date = new Date(); if (num % 2 === 0 && date.getDay() > 0) { return true; } else { return false; } } ```

Where do we draw the line?

[0]: https://docs.github.com/en/site-policy/github-terms/github-t... [1]: https://www.law.cornell.edu/uscode/text/17/506

lmm · on Oct 17, 2022

> From a civil perspective, likely the copyright owner could still claim damages from you if you were unable to reach a settlement. A court would probably determine the damages to award based on real damages (including loss of earnings for the content creator), rather than anything punitive if it's found that

There are statutory damages on top of your actual damages. $50k per act of infringement. No reason for the copyright holder to settle for less when it's an open and shut case.

> Further, most jurisdictions have exceptions for short extracts of a larger copyrighted work (e.g. quotes from a book), which may apply to Copilot.

Quotes do not automatically get an exception just because they're taken from a larger work, they might be excepted either because they were de minimis (essentially because they were too short to be copyrightable) or because they were fair use (which is a complex question that takes into account the purpose and context, which Copilot is very unlikely to satisfy because it's not quoting other code for the purpose of saying something about it).

> Where do we draw the line?

Circuit specific; some but not all circuits use the AFC test. It sounds like this code was both long enough and creative/innovative enough to be well on the wrong side of it though.

d1sxeyes · on Oct 17, 2022

I am not sure about statutory damages.

As I understand it, the complainant may CHOOSE to request the court to levy statutory damages rather than actual damages at any point, but is not entitled to both actual AND statutory (17 U.S. Code § 504)

It also seems to be absolutely capped at 30K per infringement, not 50, and ranges up from $750. It also seems that if the "court finds, that such infringer was not aware and had no reason to believe that his or her acts constituted an infringement of copyright, the court in its discretion may reduce the award of statutory damages to a sum of not less than $200."

I think you are probably right that this specific function is copyrightable though, but taken overall, I think Microsoft's lawyers have probably concluded that they would win any challenge on this. Microsoft have lost court battles before though, so who knows?

lokedhs · on Oct 17, 2022

Your question can actually be answered legally. I'm not a lawyer so I'm not going to tell you what those answers are, but there are pretty well established mechanisms to determine if a function is trivial enough to warrant being copyrighted (a lot of this was explored in the SCO vs. IBM saga)

LtWorf · on Oct 17, 2022

> How do you know GH didn't? Maybe they only included repos with LICENSE.MD files which followed a known permissive licence?

Since copilot famously outputs GPL covered code… no, we have proof they didn't do that.

d1sxeyes · on Oct 17, 2022

I think you've missed my point.

If you write some code and release it under the GPL. Then I take your code, integrate it into my project, and release my project with the MIT licence (for example), it may be that Copilot was only trained on my repo (with the MIT licence)

The fault there is not on Github, it's on me. I was the one who incorrectly used your code in a way that does not conform to the terms of your licence to me.

I don't think the fact that Copilot outputs code which seems to be covered under the GPL proves that Github did not only crawl repositories with permissive licences when training Copilot.