
I welcome anyone who works at Adobe to simply answer this question and put it to rest. There is absolutely nothing sensitive about the issue, unless it exposes them in a lie.

So no chance. I think it's a big fat lie. They'd have to have made some other scientific breakthrough, which they didn't.

Using information from https://openai.com/research/clip and https://github.com/mlfoundations/open_clip, it's possible to answer this question.

It's certainly not impossible, but it's impracticable. Trained on ~248M images (roughly the size of Adobe Stock), CLIP gets about 37% zero-shot accuracy on ImageNet; trained on the ~2,000M images of LAION-2B, it gets 71-80%. And even at 2,000M images, CLIP's "text comprehension" is substantially worse than the approach Imagen uses, which effectively relies on many billions more images and text tokens.
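If anyone wants to poke at this comparison themselves, the open_clip package ships checkpoints trained on both LAION-400M and LAION-2B, so you can swap the pretrained tag and run zero-shot classification. A minimal sketch (the model name, pretrained tag, image path, and prompt list are just illustrative; a real ImageNet evaluation would use the 1,000 class templates):

    import torch
    from PIL import Image
    import open_clip

    # Load an OpenCLIP checkpoint; "laion2b_s34b_b79k" is one of the LAION-2B
    # weights for ViT-B-32 (swap in e.g. "laion400m_e32" to compare a smaller
    # training set).
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    # "photo.jpg" is a placeholder image path.
    image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
    text = tokenizer(["a photo of a cat", "a photo of a dog", "a photo of a car"])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        # Cosine similarity between the image and each caption, as probabilities.
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print(probs)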



Interesting. I looked through the LAION datasets a bit, and it was astonishing how bad the captions really are: very, very short, when they aren't completely wrong. It's amazing to me that this works at all. I wonder how much better and more efficient CLIP etc. would be if they were trained on properly tagged images rather than just the alt text. Maybe that's why DALL-E 3 is so good at following prompts?
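If you want to eyeball the caption quality yourself, the LAION metadata (URL plus raw alt-text) can be streamed from the Hugging Face Hub without downloading the whole thing. A rough sketch, assuming the "laion/laion2B-en" dataset with "TEXT" and "URL" columns (check the dataset card for the exact identifiers):

    from datasets import load_dataset

    # Stream the LAION metadata so nothing is downloaded up front.
    # "laion/laion2B-en" and the "TEXT"/"URL" column names are assumptions.
    ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

    # Print the first 20 alt-text captions; most are short, noisy, or only
    # loosely related to the image.
    for i, row in enumerate(ds):
        print(repr(row["TEXT"]), "->", row["URL"])
        if i >= 19:
            break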



