If the model didn't learn anything important from Picasso, it wouldn't be in the training data.
This whole argument of "ah but it doesnt really need it" doesn't hold up. If the model didn't need it, it wouldn't have used it in the first place.
Same thing in Artstation. It was of course propitious for AI scientists to find such a lovely database of high quality imagery, and all so helpfully tagged into categories.
> If the model didn't learn anything important from Picasso, it wouldn't be in the training data.
> This whole argument of "ah but it doesnt really need it" doesn't hold up. If the model didn't need it, it wouldn't have used it in the first place.
I haven't seen anyone making this argument. There's a pretty clear difference between learning something from an image and memorizing it.
There also isn't anything illegal about memorizing an image and painting a reproduction. What you aren't allowed to do is sell or distribute that reproduction without a license.
I think it makes more sense to restrict what people are allowed to do with ML tools than to restrict what ML tools can do.
Of course it learned, that's the point of training.
You claimed the model can reproduce an image from that training data. That's false, and what the judge dismissed.
“none of the Stable Diffusion output images provided in response to a particular Text Prompt is likely to be a close match for any specific image in the training data.”

“I am not convinced that copyright claims based on a derivative theory can survive absent ‘substantial similarity’ type allegations,” the ruling stated.
Whether using copyrighted data to train a model is fair use or not is a different discussion.
As I've read it, the first lead to the material was in '99.
In 2018 they got funding to research it further. 2020 saw a first attempt at publication in Nature that was retracted; further improvements were made until 2022/23, when two patents were filed. Then, suddenly, 10 days ago, Kwon, one of the co-researchers, jumped the gun and published a paper with the details: on one hand fearing someone else might leak it and publish first, as it was too simple to replicate, and on the other hand excluding everyone else from the paper and listing only himself and Lee/Kim (LK) as authors, since a Nobel Prize can only be shared by three people. 2.5 hours later LK published again, listing five other authors but not him.
Ultimately I'm glad that the research has seen the light, since I'm of the personal persuasion that there is no single scrap of science that should ever be done in the dark. All research, in a perfect world, would be entirely public and freely and easily accessible.
With that being said, leaking a paper and selfishly putting your name on it while excluding others so that you win a Nobel Prize doesn't exactly seem "heroic". Certainly beneficial for mankind, but it seems like a self-serving action.
The end justifies the means. For all we know, Lee and Kim would have sat on this for another year or more. I think that's very understandable, and I can't fault them for wanting to be certain, given all the nasty things people have been saying about them, but the leak has clearly served humanity better than keeping it under wraps.
> but the leak has clearly served humanity better than keeping it under wraps.
Yeah, advancing human knowledge serves humanity, but unfortunately not really those who are advancing it. Those with money will just use your invention and make more money, while you get a pat on the back. I wish it were more balanced; we would have had a lot of inventions sooner and implemented faster.
If it increases entropy as much as many suspect and it only took 1/3 of a couple humans' lives to open that phase space, the Universe has done what it wanted - to hasten heat-death.
As Hawking once explained, “Since events before the Big Bang have no observational consequences, one may as well cut them out of the theory, and say that time began at the Big Bang.” When cosmologists talk about the universe and its age, it seems to me, as a non-cosmologist, that they’re using terms of art related to their models.
Hawking’s explanation deduces that if the observable universe expanded from a singularity, we would be unable to meaningfully theorize what happened before then, since it would be beyond any form of observation to test the theory. Therefore, a scientific model rooted in observation can describe nothing earlier than the Big Bang.
However, not everything unseen is untrue. If a singularity were to form somewhere in Andromeda tomorrow — in all likelihood, one will — we will still have existed today.
Edit: The initial comment was meant as a lighthearted reply to the universe personification, but I ended up sensing a need to explain the reasoning.
It's not "personification", it's the universe tending toward entropy increasing overall. I don't think I've heard anyone claim that heat death "should have happened" as an argument against it, or what it's supposed to mean in reference to the original post.
There is already a singularity in Andromeda, so I don't know why one forming matters.
First, Reddit's monetization is broken by design. It never made any sense to me why they would charge for reddit gold for an ad-free experience on their website and own mobile app but not on the API. Why would they let third party apps serve their own ads and let them charge to remove them? This would be simple to fix, both technically and in the API's ToS, just serve the same ads regardless of the client. People would be upset, but ultimately I feel it would be entirely fair. But no, it doesn't seem to be a solution considered.
Second, the LLM dataset issue is also cited as a reason for the price hike. Again, I think it's fair, if unpopular, to charge a premium for bulk data, and again, there are technical and ToS solutions for this. They could introduce exponential tiers for bulk data, restrictions on allowed usage, or other measures that keep user-facing usage reasonable but make bulk processing expensive. But measuring API usage per client ID rather than per user goes against this point, just making the API extremely expensive for everyone to the point of being unusable.
Third, all points seem to lead to the conclusion that what they really want is to kill third party apps and hope a large part of those users move to their own app. For what? More tracking, tighter grip, better engagement metrics? Not sure. Even the changes to the already extremely hostile mobile site now force some users to download the app. Really, I'd figure they'd understand their userbase better than that: a small fraction of content producers and an even smaller fraction of power users and moderators carry the site, and pissing them off is a really bad idea. But what do I know.
Maybe not the right term... Just that a lot of other libs act like guardrails, i.e. let the model generate what it does (in full form text / GPT output), and try to then parse out what you want, error if output doesn't conform to standard format. As opposed to basically only allowing the model to generate into the already-built JSON form fields. Understandable why this guardrails/parsing approach is so popular though... can't do what this library is doing with OpenAI API. Need to be able to manipulate the token generation; otherwise you're forced to take full text output and try to parse it.
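To make the distinction concrete, here's a minimal Python sketch of the idea (all names hypothetical; a real implementation would mask logits inside the model's sampling loop rather than use a mock sampler). The fixed parts of the JSON skeleton are forced, and the "model" is only ever sampled inside the value slots, so the output is valid JSON by construction instead of something you parse out of free-form text afterwards:

```python
import re

TEMPLATE = '{"name": "<slot>", "age": <slot>}'

def mock_sample(prefix, allowed):
    """Stand-in for a real LM sampling step: picks the first allowed token.
    A real implementation would zero out the logits of disallowed tokens
    and sample from what remains."""
    return allowed[0]

def constrained_generate(template, slot_vocab):
    """Fill each <slot> from a restricted vocabulary. Fixed template parts
    are emitted as-is (forced tokens, never sampled), so the result always
    conforms to the target format."""
    out = []
    slots = iter(slot_vocab)
    for part in re.split(r'(<slot>)', template):
        if part == '<slot>':
            allowed = next(slots)  # tokens the grammar permits at this point
            out.append(mock_sample(''.join(out), allowed))
        else:
            out.append(part)       # forced structural tokens
    return ''.join(out)

result = constrained_generate(TEMPLATE, [["Ada"], ["36"]])
print(result)  # {"name": "Ada", "age": 36}
```

The contrast with the guardrails approach is that there's no parse step that can fail: invalid outputs are unrepresentable, because disallowed continuations are never sampled in the first place.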
It has learned to make pixels a particular color to mimic that style, but that's it.