dcwca's comments | Hacker News

It is totally legal to train on this stuff, but illegal to reproduce copyrighted works. Interestingly, Google's business model could have been criticized the same way. They construct a big index of copyrighted works, reproduce them, and monetize it.


They don't generate new content that convinces people it came from the sources they trained on.

The entire business model is "we trained on their stuff, pay us, not them." No way that's fair use.


I mean, if I go to the library and read books, and then get a job where I use that knowledge, the company pays me. Not the authors of the books I read.

So I don't see how their business model is any different from literally every person who learns things and then sells their ability to apply that knowledge.


The people who made the product possible get nothing; that is the difference. The library paid for a copy of the book, and so did millions of others.

In the example you gave, it would be the equivalent of you getting a job, working hard to produce something, and getting nothing in return.


What are you expecting the people who write the books to get?

Do you agree that if an author sold 43,958 copies, then it's fine for OpenAI to purchase one, so that the author sold 43,959? But also fine for OpenAI to ingest scanned used copies that are loaned to it? The same way it's fine for me to read a friend's book, or all of a friend's books, that they loan me, and the author doesn't get anything additional? The same way it's fine for me to go the library and the author doesn't get paid anything extra?

Or are you trying to invent some new principle where OpenAI has to pay some new ongoing fee? And if so, on what basis?

(And no, my example still stands entirely. It's from the perspective of somebody who learned from books, and they are getting paid, the same way people pay OpenAI to use ChatGPT. It's not from the perspective of authors, because again -- they make no additional money when somebody goes to the library to read their book that the library already purchased.)


It's not about what the "author should get for their book". It's that OpenAI benefits unfairly from using everyone's work to make nearly endless money and lobby for regulatory capture.

The author should get access to the model and the weights; it should all be open source, because it partly contains their work. Just like how OpenAI can outright buy a copy of the author's work.

Basically, I think this is where knowledge and money come into an unresolvable conflict: who owns the ideas? Who owns information?

OpenAI seems to be trying to build a monopoly on information, and while it seems to be failing (thankfully), that's really where the issue lies for me.


Where are you getting this "nearly endless money" and "lobby for regulatory capture" and "monopoly on information"?

OpenAI competes with Google competes with a bunch of other companies, and surely this is only the beginning of a ton of competition as better and better models are developed. There's no "nearly endless money" when there's competition and GPU training costs a fortune.

The idea that all models should be open source to everyone or all content creators doesn't make any more sense than the idea that all the work I do should be open sourced to the authors of every book I've read, and every teacher I've ever had.

You ask two questions that have clear answers already:

> who owns the ideas?

Nobody. Legally speaking there's no such thing as ownership of ideas, except in the narrow case of patents (and if you consider trademarks to be ideas).

> who owns information?

You can copyright a particular, exact expression of information. The author of a book owns its text; the studio behind a movie owns the image in each frame.

But once you leave behind an exact expression of information, you're back in the realm of ideas, and there's no such thing as ownership of ideas. Which is why as long as ChatGPT and other models repeat ideas but not paragraphs of exact copyrighted wording, there's no legal issue. Because they're doing the same exact thing every human being does every day.


There are about 3 companies competing for any level of serious business regarding AI. Where are you getting anything from?

There's no "nearly endless money" when there's competition and GPU training costs a fortune.

They cost a fortune for now. That won't be the case forever.


> There are about 3 companies competing for any level of serious business regarding AI. Where are you getting anything from?

Three companies is huge. That's the very definition of competition, the polar opposite of monopoly.

> They cost a fortune for now. That won't be the way forever.

Yes, and as costs come down it becomes easier for more competitors to enter the space to build all sorts of other products. Again, a good thing. It's not like the difference just turns into profit. That's not what happens in a market economy.


When the prices come down, it won't be a monopoly, but right now building something competitive is nearly impossible for 99% of the world.


Writing a book about C isn't the same thing as "write me a mail server in C".

The right analogy here is that you read their book about C and write another one in exactly their style (or close enough) that your book can stand in for theirs, but you sell it for pennies.


The difference is: LLMs are not humans with human needs and human rights. Unless these for-profit AI companies can ensure that they fairly compensate the sources of their training data, they're using IP they have no right to use in order to replace the work of living, breathing humans who need income in order to live in houses and eat food. Why would you place the potential profits of the few (and the massive environmental impact of using LLMs) over the needs and rights of your neighbors and humans all around the world?


Security and privacy are mentioned constantly throughout the marketing materials.


So their MacBooks are insecure because they accept any installed package? That seems like a weak argument and an even weaker security model. Has security through obscurity / black-box review ever worked? Should I avoid browsing the web on non-approved websites too? It seems like they built a weak system, security-wise.


It's because it is actually a somewhat bullshit security model that basically boils down to checking which APIs you are using in your code.

The next generation of runtimes (wasm, for example) has this built in out of the box.
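For a concrete, minimal sketch of what "built in" means: with the standard WebAssembly JS API, a guest module has no ambient authority at all; it can only call whatever host functions you hand it in the import object. The guest.wasm file and the `log`/`run` names below are hypothetical, just to illustrate the shape of it.

```ts
// Minimal sketch of wasm's capability model (assumes an ES module, so
// top-level await is available, and a hypothetical guest.wasm).
const wasmBytes = await fetch("guest.wasm").then((r) => r.arrayBuffer());

const imports = {
  env: {
    // The single capability we grant: logging a number back to the host.
    log: (x: number) => console.log("guest says:", x),
    // No filesystem, no network -- anything not listed here simply
    // doesn't exist from the guest's point of view.
  },
};

const { instance } = await WebAssembly.instantiate(wasmBytes, imports);

// Assumes the guest exports a no-argument entry point named "run".
(instance.exports.run as () => void)();
```

The "which APIs does this code use" review happens structurally at instantiation time, rather than by a human reading your source after the fact.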


With all due respect, Chomsky has no intuition for this kind of technology. It was sad to see him reject it; it felt oddly like sour grapes when, really, LLMs have validated his work on universal grammar.


Right now, Altman may be the person most relevant to the further development of AI, because the way the technology continues to go to market will be largely shaped by the regulatory environments that exist globally, and Sam, leading OpenAI, is by far in the best position to influence and guide that policy. And he has been doing a good job with it.


Worth the subscription fee for sure


Something I don't read enough about that would make autonomous cars a lot more reliable is smart tech in the road infrastructure. Why don't stop lights send a beacon? Roads and lanes should send a signal. Cars should send signals to each other about their intent, e.g. turn indicators should not be purely visual, etc. Are there standards for this stuff being worked on?


Every stop light is actually required to have a beacon to be fit for purpose. It's universal among all manufacturers in every country.

When it's time to stop, they emit 700nm waves, but crucially, only in the direction that they apply. In fact, brake lights are required to emit almost identical waves, but in a different pattern.

Having spent the last 30 hours autonomously driving from Texas to the east coast, I can confidently say it never failed to detect these lights, and the existing standard is both sufficient and not prone to invisible errors, which you seem to want to introduce.


We shrug them off because they are the devil we know: a super well-understood risk. Also because safety is improving, so the trend is toward less risk, not more.


This is such great work.


I watched that documentary and had a good laugh at how poorly made it was. The filmmaker starts out by revealing that he just loves the ocean and wanted to make a film about it but didn't know what the topic should be. The arc of the film is him googling for things, gaining the most basic perspective on an issue, pivoting his film toward it, and then failing to get any good interviews or original footage. So funny. He goes through ocean plastics, dolphins, shark fins, fishing net pollution, and one or two more with about as much depth as you would get from a tweet. Also, why would you name your film Seaspiracy when Conspirasea is right there? 10/10 if taken as a mockumentary.

The issues are real and I appreciate raising awareness but wow. Not good work.


Not good work at all.

Tabrizi's approach is to tackle a bunch of different subjects with the attention span of a child on Xmas morning. Each new subject is treated as 'the' most important one, but he drops it unceremoniously after ten minutes.

Then there's the cherry picking of interviewees and their arguments, which is a disservice to the cause – and completely unethical. Even environmentalists interviewed as 'the good guys' complained about this[0].

If Tabrizi is genuinely a fan of the likes of Cousteau and Attenborough, he should watch and listen a lot more before making another documentary.

[0] https://twitter.com/christinachicks/status/13753901779133726...


Naming - because the same guy also made Cowspiracy.

Quality - I would say it was pretty illuminating on a lot of issues that almost nobody here will get to see firsthand.


There are cases where it’s less important for bands to be paid for their music than it is to not sue children and not cripple consumer technology.


Who in this discussion was advocating suing children?


Metallica

