I couldn't get GPT to quote an actual NYT article no matter how hard I tried... it just hallucinated text in the general style of a news article.
Presumably, if it could remember at least a paragraph or two of each article, the same would surely be true of any other text it ingested, and the model size would approach the dataset size (probably end up much larger, actually). I don't believe this is the case at all; even searching around, I've not found any good recent examples of it regurgitating copyrighted text verbatim.
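Back-of-envelope, just to put numbers on that. This is a rough sketch: the 175B-parameter and ~300B-token figures are the publicly reported GPT-3 numbers, and the bytes-per-weight and bytes-per-token conversions are my assumptions, not anything official.

    # Rough sketch; GPT-3's publicly reported figures, treated as assumptions.
    params = 175e9              # reported GPT-3 parameter count
    model_bytes = params * 2    # fp16: 2 bytes per weight

    tokens = 300e9              # reported GPT-3 training token count
    data_bytes = tokens * 4     # ~4 bytes of English text per token, rule of thumb

    print(model_bytes / 1e9, "GB of weights")   # 350.0
    print(data_bytes / 1e12, "TB of text")      # 1.2
    print(model_bytes / data_bytes)             # ~0.29

Even stored losslessly, the weights come out several times smaller than the raw training text, so wholesale memorization of the dataset just doesn't fit.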
It's cool to hate AI stuff if you're a creative atm. But gotta love those generative/algorithm-based PS brushes; that's still real art!
"Indeed, the opening paragraph of "A Game of Thrones" by George R.R. Martin, with the chapter titled "Bran," starts as follows:
"The morning had dawned clear and cold, with a crispness that hinted"
And then it cuts off. Whether that's because OAI now has an "oh shit" filter, or because the model only had access to the first page or to publicly available articles quoting the first line, I'm not sure.
I tried other chapters and random sections, and it could get a sentence or two right but then hallucinated. So what's more likely, NYT and GRRM? That your works are being reproduced verbatim? Or that Facebook posts, YouTube descriptions, fan Tumblrs, and hell, the multiple publicly available GoT wikis that quote a variety of passages from the books were used as training data?
I don't think it's necessarily true that model size would need to be larger than dataset size. It's theoretically possible that the model encoding achieves significantly better compression than DEFLATE or GZIP or whatever compression algorithm is used to store the dataset itself.
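For intuition, here's a minimal sketch (standard library only; the sample string is just a stand-in for a real corpus, and the short length makes the ratio worse than you'd see on a large file). DEFLATE is the baseline the comment mentions.

    import zlib

    # Stand-in corpus; any longish English text will do.
    sample = (
        "A good predictor of the next token is, by construction, a good "
        "compressor of the text it predicts, which is why compression is "
        "a useful mental model for what these models learn. A lossy "
        "statistical summary is not a lossless archive, though, so exact "
        "recall of long passages should be the exception, not the rule."
    ).encode("utf-8")

    packed = zlib.compress(sample, level=9)  # DEFLATE, same family as gzip

    print(f"raw: {len(sample)} B, deflated: {len(packed)} B")
    print(f"bits per byte: {8 * len(packed) / len(sample):.2f}")

On large English corpora DEFLATE typically lands around 3 bits per byte, while language-model-driven arithmetic coding has been reported well under 1, which is exactly the "significantly better compression" scenario above.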