There's a difference between feeding massive amounts of copyrighted material, in-house, to a training process that blends it thoroughly and irreversibly, and offering people a service that indexes (and possibly partially rehosts) that material, enabling and encouraging users to directly pirate specific copyrighted works.
Ironically, the low-tech infringing proposal would lead to more reliable results grounded in the raw contents of the data, using less compute and power, and without the confidently incorrect sycophancy we see from LLMs.
Nah. It would just lead to more classical search. Which is okay, as it always has been.
LLMs are not retrieval engines, and thinking of them as such misses most of their value. LLMs are understanding engines. Much like for humans, evaluating and incorporating knowledge is necessary to build understanding; perfect recall, however, is not.
Another, arguably equivalent, way of framing it: the job of an LLM isn't to provide you with facts; its main job is to understand what you mean. The "WIM" in "DWIM". Making it do that requires stupid amounts of data and tons of compute in training. Currently there's no better way, and the only alternative systems with similar capabilities are... humans.
IOW, it's not even an apples to oranges comparison, it's apples to gourmet chef.
There's a famous Russian phrase, born out of a short interview with a woman who was a strong Putin supporter, that is often used as a sarcastic remark to point out someone's double standards and/or hypocrisy.
It can be roughly translated to "you don't understand, it's a completely different situation". That's what's constantly on my mind when I'm reading discussions like this one.
Everybody and their dog torrenting petabytes of data and getting away with it (Meta is the only one that got caught, and even they got away with it)?
The very same data poor American students were forced to commit suicide over? The same data that average American housewives were sued over for millions of dollars of "damages"? The same data that often gets random German plumbers or steelworkers to pay thousands of euros of "fines" to the copyright mafia so they won't get sued and have their lives ruined?
Yet when giant corporations do the exact same thing on a massive scale, it's fine? It's not even the same thing: an American student torrenting books isn't making any money off it, while Meta very much is.
Of course it's not the same, a simple-minded and poorly educated person like me isn't capable of understanding the difference. You keep believing in your moral superiority, the rest of the world has finally woken up.
Is there also a famous Russian phrase that translates to "details are irrelevant, it kinda looks similar to me therefore it's the same"? If not, there definitely should be.
The details are the entire point. Arguing that a corporation can get away with doing something while an individual can't isn't useful, because there are a great many such somethings, and in most cases the difference turns out to be perfectly reasonable once you dig into the details.
>The same data that average American housewives were sued over for millions of dollars of "damages"? The same data that often gets random German plumbers or steelworkers to pay thousands of euros of "fines" to the copyright mafia so they won't get sued and have their lives ruined?
Honestly curious. Could you share any examples of these cases?
There's also the matter of 'aaronsw being one student, not the many "poor American students" GP implies. As far as I know, this was the only case of this type[0][1].
Honestly, I was too tired to point that out in my earlier reply, but that's exactly the kind of argument you get when people are unwilling (or purposefully refuse) to consider details. Intentionally or not, you end up with bogus and highly manipulative statements.
A single case of a student activist who fought for freedom of communication and for citizens' access to public goods, and who ended up breaking under pressure over copyright from public and non-profit institutions (MIT, JSTOR, the FBI), is not the same as what GP implied: many students, regular folks just like you and me, being forced to take their own lives over the legal consequences of pirating books in bulk. Nothing like the latter ever happened anyway.
We can do better than this.
(And even if we can't, I trust the courts can.)
--
[0] - Curiously, while searching just now to make sure I hadn't missed any similar case, I learned that the JSTOR incident wasn't the first for 'aaronsw: apparently he had done the same thing a few years earlier with public court documents[1]; the FBI investigated that too, and concluded he was legally in the clear. It's probably well known to everyone here, but I somehow missed it, so #TodayILearned.
[2] - https://en.wikipedia.org/wiki/Edwin_Howard_Armstrong was the only one I could find that was even remotely related: an engineer and inventor who, in large part because prolonged patent fights consumed all his time and money, suffered a mental breakdown and committed suicide at 63.
Uber blatantly ignored local laws in order to break into markets and quickly defeat local competition. They used their seemingly infinite supply of VC money to interfere with and delay investigations and enforcement, betting that if they moved fast enough, they'd have the general population on their side.
LLM vendors found and exploited[0] a legal uncertainty; correct me if I'm wrong, but AFAIK it still isn't settled whether their actions were actually illegal. Unlike Uber, LLM vendors aren't breaking into markets by ignoring the law to outcompete incumbents and burning stupid amounts of money just to get away with it. On the contrary, they are simply providing an actually useful product, charging a reasonable price for it, and reinvesting the revenue into improving the product. Its effects on other markets aside[1], their business model is just providing actual value in exchange for money. That's much more direct and honest than most of the tech industry.
The product itself is also different. Uber is selling a mirage, a "miracle" improvement that quickly turns out not to be one, and is destined to eventually destroy the markets it disrupted. LLM vendors are developing and serving systems that provide actual value to users, directly and obviously so.
--
[0] - They probably walked into this without initially realizing it. No one complained 5-10 years ago, when the datasets were smaller and the resulting models had no real-world utility. It was only when the models became useful that some people started looking for ways to make them go away.
[1] - That's an unfortunate effect of it being a general AI tool, and would be the same regardless of how it was created.
> > or some other country that doesn't respect international copyright though.
> Like the US? OpenAI et al. don't give a shit.
OpenAI is not a country and therefore cannot make laws that don't respect international (or domestic) copyright. Also the US is a lot bigger than OpenAI and the big tech corps, and the law is very much on the side of copyright holders in the US.
> the law is very much on the side of copyright holders in the US.
Remind me again what the status of the case is with Meta/Facebook using pirated material to train their proprietary LLMs, and even seeding the data back to the community while downloading it?
In progress. Nobody is expecting the original protections afforded by copyright to apply here, but the fact that the material is pirated is less relevant than whether or not an LLM is a transformative use of the material.
We will almost certainly see copyright law weakened by the case, but I do not believe that FB will get off with no penalties.
The money is definitely on the side of big tech vs. book publishers. There may be a nominal settlement to end the matter, perhaps after a decade of litigation.
I'd never heard of it before, and from that intro it makes perfect sense what it is.
On a celestial sphere (planet, star, etc.), the declination is an angle: 0 is at the equator, 90 degrees is the north pole of the sphere, and -90 degrees is the south pole.
You also need another angle, known as the "hour angle", to locate a point on the sphere. The intro doesn't explain what that is, but, as usual on Wikipedia, you can simply click the term to go to a whole page that explains it.
Well, "celestial sphere" turned out to be a whole other topic, and luckily it links to a page that explains that too. Going to that page, I see I was indeed wrong about what it was: it's an abstract sphere, with a radius that can be whatever size you want, centered on the Earth or on the observer.
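To make the two angles concrete, here's a minimal Python sketch (my own illustration, not from the article) that turns a declination and an hour angle into a point on a celestial sphere. The axis convention is an assumption: z points to the north celestial pole, x to where the observer's meridian crosses the celestial equator (hour angle 0), and y to hour angle 6 h; hour angle is given in hours, at 15 degrees per hour.

```python
import math


def equatorial_to_cartesian(
    declination_deg: float, hour_angle_hours: float, radius: float = 1.0
) -> tuple[float, float, float]:
    """Return (x, y, z) for a point on a celestial sphere of the given radius.

    Assumed convention (illustrative only): z -> north celestial pole,
    x -> hour angle 0 on the celestial equator, y -> hour angle 6 h.
    """
    dec = math.radians(declination_deg)          # -90 (south pole) .. +90 (north pole)
    ha = math.radians(hour_angle_hours * 15.0)   # 1 hour of hour angle = 15 degrees
    x = radius * math.cos(dec) * math.cos(ha)
    y = radius * math.cos(dec) * math.sin(ha)
    z = radius * math.sin(dec)
    return (x, y, z)


if __name__ == "__main__":
    print(equatorial_to_cartesian(0, 0))    # on the equator, at hour angle 0
    print(equatorial_to_cartesian(90, 3))   # the pole: hour angle makes no difference
```

Declination alone only picks out a circle parallel to the equator; the hour angle selects the point on that circle. The radius just scales the result, which is why the sphere can be whatever size you want.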
Once again, not so difficult to figure out even if you have no experience in the specific technical field of a Wikipedia article. So I have no idea what /u/casenmgreen's problem is.
Consider the space we're in. In game development you have a lot of developers with a lot of different ideas about how to make a game, all using the same engine. If the engine doesn't come with a feature I need, I'll probably have to code it myself, but since the whole purpose of that feature is my game, it makes sense that I should be able to keep it private/proprietary without having to push it back to the engine, which might not even want it to begin with. This is why GPL is not a good choice for game engines.
GPL doesn't require you to push a feature/change/etc back to the engine devs, it only requires you to make it available to others. You can just keep your changes in a ZIP file alongside your game's data - which is what a bunch of games built on the GPL releases of id Tech already do.
> Game dev at the top tiers is an arms race. Being able to do proprietary things is attractive to big players.
Yeah, so I don't see how helping out the big players and not everyone else is a good thing.
>Multiple projects have gone closed-source from open source. Assurances are a nice thing to have (but certainly no guarantee).
Yeah but the open source ones ARE guaranteed. Even if they later become closed source, the code up till that point will remain open source forever. So it is guaranteed whereas "some assurances" mean nothing.
> Yeah, so I don't see how helping out the big players and not everyone else is a good thing.
If you want your stuff to be private, you have a legal option.
> Yeah but the open source ones ARE guaranteed. Even if they later become closed source, the code up till that point will remain open source forever.
The changes from the Apache 2.0 license are minimal enough that you can _still_ fork it from that point; you just (a) won't be able to use the trademark and (b) won't be able to sell it.
Given the clearly stated goals of the foundation and hence the project, that seems to be providing exactly the guarantees they intend to provide, and while your point about assurances is entirely fair, I think you're underestimating the level of legal guarantees that you do get here.
>> You can make proprietary changes to the engine without releasing them (unlike GPL).
> Why is that a good thing?
Instead of writing an internal project from scratch, you modify an existing project and tightly couple it with your internal process. What's wrong with that?
text-wrap: pretty tells the browser to wrap the text so as to make it look pretty. But the CSS standard doesn't specify what exactly that means; it's up to each individual browser to decide what algorithm yields the prettiest results.
Chromium is the only browser engine whose stable channel currently supports text-wrap: pretty. In this post, WebKit is announcing not only that they've implemented it (though not yet in a stable channel), but that they've done so using an algorithm that's better than Chromium's. Their algorithm adjusts for various things that Chromium's currently does not.
The WebKit implementation is the only one that can handle many pages of text with no noticeable performance hit, while Chrome and Firefox are limited to the last 4 or 6 lines of a paragraph.
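For a sense of what "prettier" wrapping can mean algorithmically, here's a rough Python sketch of one textbook approach: minimum-raggedness line breaking over the whole paragraph, compared with plain greedy wrapping. It is not WebKit's or Chromium's algorithm (the spec leaves that to each browser); it assumes monospaced text, widths measured in characters, and a cost equal to the squared trailing space on every line except the last.

```python
def greedy_wrap(words: list[str], width: int) -> list[str]:
    """Fill each line as far as possible before moving to the next one."""
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        # Accept the word if it fits, or if the line is empty (lone overlong word).
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def balanced_wrap(words: list[str], width: int) -> list[str]:
    """Choose all breaks at once so the sum of squared trailing spaces over
    every line except the last is as small as possible."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of laying out words[i:]
    best[n] = 0.0
    next_break = [n] * (n + 1)       # next_break[i]: index just past the first line of words[i:]
    for i in range(n - 1, -1, -1):
        line_len = -1  # cancels the separating space counted for the first word
        for j in range(i, n):
            line_len += len(words[j]) + 1
            if line_len > width and j > i:
                break  # words[i..j] no longer fit (a lone overlong word is allowed)
            # The paragraph's last line is free; other lines pay for leftover space.
            cost = 0.0 if j == n - 1 else float(max(width - line_len, 0) ** 2)
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                next_break[i] = j + 1
    lines: list[str] = []
    i = 0
    while i < n:
        lines.append(" ".join(words[i:next_break[i]]))
        i = next_break[i]
    return lines


if __name__ == "__main__":
    words = "aaa bb cc ddddd".split()
    print(greedy_wrap(words, 6))
    print(balanced_wrap(words, 6))
```

With `"aaa bb cc ddddd".split()` and a width of 6, greedy wrapping gives `["aaa bb", "cc", "ddddd"]`, stranding a short line, while the whole-paragraph version gives `["aaa", "bb cc", "ddddd"]`. The whole-paragraph search does more work than a single greedy pass, which is the trade-off the performance comparison above is about.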
Yeah, corporations have the resources to make that kind of investment in Linux, which random hobbyists don't.
But why do they do it in the first place, instead of investing in their own, obviously superior, heavily funded OSes? Because Linux IS better, and the whole idea behind it is better than some closed-source crap. By the nature of the GPL, it will snowball and everyone else will be left behind.
It's very easy to hate on him for that very reason. He's just buying a good reputation for a fraction of his wealth so small that it's completely insignificant.
If I could buy that kind of reputation by tossing a few coins into the void, why not? Especially after I've stolen billions from others.
As a non-American, I think this is a horrendous idea. People need to accept that assholes and misinformation exist, and that they will encounter them both in real life and on the internet. You can't expect a nanny state to protect you from every slight discomfort you experience. Learn how to deal with it.