
> This is a new area of development, and we’re all learning. I’m personally spending a lot of time chatting with developers, copyright experts, and community stakeholders to understand the most responsible way to leverage LLMs.

Given that there have been major concerns about copyright infringements and license violations since the announcement of Copilot, wouldn't it have been better to do some more of this "learning", and determine what responsibilities may be expected of you by the broader community, before unleashing the product into the wild? For example, why not train it on opt-in repositories for a few years first, and iron out the kinks?



> why not train it on opt-in repositories for a few years first, and iron out the kinks?

Ha ha. Because then the product couldn’t be built. Better to steal now and ask forgiveness later, or better yet, deny the theft ever occurred.


If Copilot was designed with any ethics in mind, it would have been an opt-in model.

Instead, they scoured and plagiarized everyone's source code without their consent.


Because the ethical opt-in model builders are still working on putting together their cleanly sourced dataset.


Copyright infringement is not theft in the sense that matters most. Theft is normally negative sum; copyright infringement is almost always positive sum.


Had to find this after a long time

IT Crowd Piracy Warning https://www.youtube.com/watch?v=ALZZx1xmAzg


And why not train it on microsoft windows and office code?


Exactly, it would actually benefit many C/C++ programmers. Some components of NT are very high quality, why not wash their license if the aim is to empower the programmers and also make some profit?


Because then your React code would look like this:

    export default class USERCOMPONENT extends REACTCOMPONENT<IUSER, {}> {
      constructor (oProps: IUSER){
        super(oProps);
      }
      render() {
        return (
          <div>
            <h1>User Component</h1>
            Hello, <b>{This.oProps.sName}</b>
            <br/>
            You are <b>{This.oProps.dwAge} years old</b>
            <br/>
            You live at: <b>{This.oProps.sAddress}</b>
            <br/>
            You were born: <b>{This.oProps.oDoB.ToDateString()}</b>
          </div>
        );
      }
    }


Easy pal, we don't want to multiply that shyte.


> And why not train it on microsoft windows and office code?

As a thought experiment, if one were to train a model purely on leaked and/or stolen source code, would the training step effectively "launder" the code and make later partial reuse legitimate?


Only if it's not microsoft's leaked code, I guess :)


That is a rather good question.



