
Probably not. Copyright infringement in the manner we're talking about presumes you already have license to access the code (like how GitHub does). What you don't have license to do is distribute the code, in whole or in part, without meeting certain conditions. You're perfectly free to do whatever naughty things you want with the code in private, short of running it.

The literal act of making modifications isn't infringement until you distribute those modifications -- and we're talking about a situation where you've changed the code enough that it isn't considered a derivative work anymore (apparently) so that's kosher.




First, the case would be dismissed if Copilot had permission to make copies. Clearly they didn’t. Copyright cares about copies; for-profit distribution just makes this worse.

> you already have license to access the code

This isn’t access; that occurs before the AI is trained. It’s access > make copy for training > AI does lossy compression > request unzips that compression, making a new copy > process fuzzes the copy so it’s not so obvious > derivative work sent to users.


Clearly Copilot had permission to make (unmodified) copies, the same way GitHub's webserver had permission to make (unmodified) copies. The lawsuit is about making partial copies without attribution.


GitHub's terms of service (TOS), in my non-lawyerly opinion, clearly state that the license users grant them for uploaded works doesn't cover using the data to train an LLM or any kind of model beyond those used to improve the hosting service:

>You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time

>This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.

https://docs.github.com/en/site-policy/github-terms/github-t...

I think the important questions are (1) whether "the Service" includes Copilot, and (2) whether GitHub is selling users' content with Copilot.

For (1), I'm unhappy to admit Copilot probably does fall under "the Service," which is nebulously defined as "applications, software, products, and services provided by GitHub." But I'll still say that users could not have agreed to this use while GitHub was training the Copilot model but hadn't yet announced it. At that time, a reasonable user would've believed GitHub's services only covered repository hosting, user accounts, and the extra features attached to those (issue trackers, organizations, etc.).

GitHub could defend themselves on point (2) by saying they aren't selling the code, instead selling a product that used the code as input. But does that differ much from selling an online service that relies on running user code? The code is input for their servers, and it doesn't need to be distributed as part of that questionable service. But it's a clear break from the TOS.


GitHub’s web server is not the same thing as Copilot and needs separate permission.

GitHub didn’t just copy open source code; they copied everything without respect to license. As such, attribution, which might have allowed some copying, isn’t generally relevant.

Really, a public repo on GitHub doesn’t even mean the person uploading it owns the code. If they needed to verify ownership before training, they couldn’t have started. Thus, by necessity, they must take the stance that copyright is irrelevant.



