They say they scraped the open web - so for example this would include many of our personal sites, many of which have profile pictures.
For myself, I took the picture on my site, and it's under an Attribution-NonCommercial-NoDerivatives CC license. I'd argue that:
1. Using my/anyone's profile picture in an AI system for profit is commercial use.
2. A neural network is a derivative work of all images used to train that network.
So on point 1 I agree with you. I think point 2 is pretty iffy though. Unless there has been some recent legal proceeding that I am unaware of, point 2 isn't true.
Oh yeah, I'm not sure either is true legally, as I'm not a lawyer - just my opinion.
The reasoning I follow for point 2 is that if a neural network is not derivative of its inputs, then given a sufficiently large GAN, you could "launder" inputs into copyright-free outputs. That's also not been done as far as I know, but I know it's starting to be an issue in NLP.
Re: 2 - Legally no. Like a search engine’s index, it is not a derivative work but a “transformative” one and therefore not subject to copyright restrictions.