
Which kind of suggests Microsoft made a really bad move antagonizing the open-source community with GitHub Copilot.

They got a few years of lead time in the "AI codes for you" market, but in exchange permanently soured a significant fraction of their potential userbase who will turn to open-source alternatives soon anyway.

I wonder if they'd have been better served by focusing on selling Azure usage and releasing Copilot as an open-source product.




How did Microsoft sour developers with Copilot? I know dozens of people that pay for it (including myself) and I feel like it is widely regarded as a "no brainer" for the price that it's offered at.

Please help me understand!


The company that tried to kill Linux in the 90s, owned by the world's most famously rich man, is now stealing my code and selling it back to me? Yeah, fuck that.


This isn't stealing at all. I want my open source code to be used like this.


If only there were some kind of contract-like thing you could release your code under so that there was no ambiguity.


Sarcasm doesn't translate well online...

To be clear: there is, and it's pretty difficult to argue that MS is violating even the GPL.


It's not selling you back your code. It's different code, adapted to a different task; your own code is forever free for you, you don't need anyone to give it to you.

Given the cost of running these models, and the utmost dedication needed to train them, I think it is worth it. GPUs cost money, electricity costs money. They can't serve the world for free and offer good latency.


I mean, that's like saying an author steals the open source alphabet and charges you for reading their ordering of letters, as if the ordering of letters isn't where all the value is.


These models are trained on sequences of words, not told the letters and left to get on with it.


It continues to amaze me how people are incapable of following even the most trivial cases of abstract reasoning.


They didn’t. There is a small group of people who are always looking for the latest reason to be outraged and to point at any of the big tech companies and go “aha! They are evil!” Copilot’s AI was trained on GitHub projects, and so these people are taking their turns clutching their pearls inside their little bubble.

I’d bet that more than 95% of devs haven’t even heard of this “controversy” and, even if they had, wouldn’t care.


I'm not so sure.

I do think the controversy is stupid, but inside my own company, we significantly delayed migrating some projects to GitHub because people were concerned that the way Microsoft handled Copilot meant that GitHub wasn't a safe long-term host for an open-source project (and yes, I'm aware of all the reasons that's irrational).

Even if the people angry about Copilot are a minority, it might still have been a bad move. Trust accumulates slowly over years, but mistrust builds up over only a few events. People still remember Microsoft's anticompetitive practices from 20 years ago. The mistakes it makes now might stick for a long time.


Presumably, because they trained Copilot on billions of lines of, often licensed, code (without permission), that Copilot has a tendency to regurgitate verbatim, without said license.


For a specific example, some variation of "fast inverse square root" will usually get you the exact GPL-licensed code from Quake III, comments included.


Do you mean the same code that has its own Wikipedia page where the exact code is written, comments included, and has probably been copy-pasted into hundreds of other projects?

https://en.m.wikipedia.org/wiki/Fast_inverse_square_root


You mean this code?

https://archive.softwareheritage.org/browse/content/sha1_git...

Do you see that notice at the top of the file? It says:

===

This file is part of Quake III Arena source code.

Quake III Arena source code is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

===

but because it's been laundered by Microsoft, you think it's okay to steal free software and make it proprietary?


How is it made proprietary? The Quake III Arena code is no more proprietary now than if it were stored on GitHub's proprietary web servers. Copilot is just a fancy code index that sometimes returns the original code and other times gives you a modified copy.


Because as you say, it provides original or modified code but doesn't provide provenance or license information. It's copyright laundering. After decades of fighting the community in the courts over shit like this, Microsoft just turns around and says well, it's okay when we do it? Foh.


The problem is you have to obey the license of the code even if you just take a snippet and Copilot does not reproduce the correct license.


"The algorithm was often misattributed to John Carmack, but in fact the code is based on an unpublished paper by William Kahan and K.C. Ng circulated in May 1986"


That code didn't originate from Quake.


The point is that it's charging money after having been trained on open-source code. What you're saying agrees with that, but your triumphant tone seems to imply the opposite. Which did you mean?


Yes, that code. I was replying to a comment claiming that

> Copilot has a tendency to regurgitate [code] verbatim, without said license.

and I think that is a pretty good example.


> that Copilot has a tendency to regurgitate verbatim, without said license.

A "tendency" is overstating it. I'm not aware of any example that would have been likely to occur if the author wasn't specifically trying to get the regurgitated code.


Controversy about unlicensed use of source code as training data and lack of attributions in the generated output.


Microsoft makes dozens of "really bad moves" every year. This is nothing.



