This is just trolling and typical of people who want to shoot down what others have done because they can't or haven't created anything themselves. How about showing a positive solution instead of the equivalent of finding reasons why we can't do anything? If everyone had this attitude we'd still all be hiding in trees somewhere.
The positive solution is to scrape Wikimedia Commons for everything in "Category: PD-Art-old-100" and train from scratch on that data. Wikimedia Commons is well-moderated, the image data is public domain[0], and the labels can be filtered down to CC-BY or CC-BY-SA subsets[1]. Your resulting model will be CC-BY-SA licensed and the output completely copyright-free.
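For anyone who wants to try that scraping step, here's a minimal sketch using the public MediaWiki API on commons.wikimedia.org with the `requests` library. The category title is copied from the comment above and should be double-checked against Commons; the rest (user agent, print output) is placeholder, not a tested pipeline.

```python
# Sketch: list file URLs and license metadata for one Wikimedia Commons category.
# Uses the standard MediaWiki API; category name and user agent are placeholders.
import requests

API = "https://commons.wikimedia.org/w/api.php"
HEADERS = {"User-Agent": "pd-art-scraper-sketch/0.1"}

def category_files(category, limit=500):
    """Yield File: page titles in a category, following API continuation tokens."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",
        "cmlimit": limit,
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, headers=HEADERS).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def file_info(title):
    """Return the original-file URL and license/description metadata for one file page."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "format": "json",
    }
    page = next(iter(requests.get(API, params=params, headers=HEADERS)
                     .json()["query"]["pages"].values()))
    info = page["imageinfo"][0]
    return info["url"], info.get("extmetadata", {})

if __name__ == "__main__":
    for title in category_files("Category:PD-Art-old-100"):
        url, meta = file_info(title)
        print(title, url, meta.get("LicenseShortName", {}).get("value"))
```

The extmetadata block is where the per-file license tags live, so filtering the label data down to CC-BY/CC-BY-SA can happen at this stage before anything is downloaded.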
For the record, that's what I've been trying to do[2]; my stumbling blocks have been training time and a bug where my resulting pipeline seems to do the opposite of what I ask[3]. I'm assuming it's because my wikitext parser was broken and CLIP didn't have enough text data to train on; I'll have the answer tomorrow when I have a fully-trained U-Net to play with.
If I can ever get this working, I want to also build a CLIP pipeline that can attribute generated images against the training set. This would make it possible to safely use CC-BY and CC-BY-SA datasets: after generating an image, the pipeline would surface the closest training images so their authors could be credited.
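A rough sketch of what that attribution step could look like, not the author's actual pipeline: embed every training image with CLIP, then embed each generated output and return the nearest training images by cosine similarity. The Hugging Face model name and `transformers` usage below are illustrative assumptions.

```python
# Sketch: CLIP nearest-neighbour attribution of a generated image against a training set.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return L2-normalised CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def attribute(generated_path, train_paths, top_k=5):
    """Return the training images most similar to the generated image, with scores."""
    train_emb = embed(train_paths)               # (N, d) index over the training set
    gen_emb = embed([generated_path])            # (1, d) embedding of the output
    sims = (gen_emb @ train_emb.T).squeeze(0)    # cosine similarity per training image
    scores, idx = sims.topk(min(top_k, len(train_paths)))
    return [(train_paths[i], float(s)) for i, s in zip(idx.tolist(), scores.tolist())]

# Usage (hypothetical paths): print the top matches and their similarity scores.
# for path, score in attribute("generated.png", train_image_paths):
#     print(f"{score:.3f}  {path}")
```

In practice the training-set embeddings would be computed once and stored in an index rather than re-embedded per query, but the idea is the same: a high-similarity match means the named training images, and their CC-BY/CC-BY-SA attribution, get surfaced alongside the generated output.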
You can check my Google Scholar [0]. I have created many high-impact datasets that 1) were formative in their respective areas and 2) have seen downstream use in disasters and wars around the world. Not once in creating those datasets did we take the "easy" route by compromising on the ethics of data collection.
The positive solution here was to not collect data if there was a reasonable ethical concern. This classic mindset of “anything goes as long as we create value” is highly toxic.