> This is a new area of development, and we’re all learning. I’m personally spending a lot of time chatting with developers, copyright experts, and community stakeholders to understand the most responsible way to leverage LLMs.
Given that there have been major concerns about copyright infringements and license violations since the announcement of Copilot, wouldn't it have been better to do some more of this "learning", and determine what responsibilities may be expected of you by the broader community, before unleashing the product into the wild? For example, why not train it on opt-in repositories for a few years first, and iron out the kinks?
Copyright infringement is not theft in the sense that matters most: theft is normally negative-sum, while copyright infringement is almost always positive-sum.
Exactly, it would actually benefit many C/C++ programmers. Some components of NT are very high quality; why not launder their license, if the aim is to empower programmers and also make some profit?
> And why not train it on Microsoft Windows and Office code?
As a thought experiment: if one were to train a model purely on leaked and/or stolen source code, would passing it through the model effectively "launder" the code and make later partial reuse legitimate?