These models can definitely be used to intentionally store and recall content that is copyrighted in a way that's not subject to fair use. (eg: trivially, I could train a large model that has a small subnetwork encoding a compressed or even lossless copy of a picture, and if I intentionally trained a model in that way, this would be no less a copyright violation than distributing a JPEG of the same image embedded in some large binary.)
But also, an unintentional copy of a copyrighted image is not a violation of copyright. (eg: an executable binary which happens to contain the bits corresponding to a picture of Batman -- but which are actually instruction sequences and were provably not intended to encode the picture -- clearly doesn't infringe.)
LLMs sit somewhere in between those two cases, and the intent can enter both at training time and at prompting time.
Stack on top of this the fact that the models can also definitely generate content that counts as fair use, or which isn't copyrighted.
It's the multitude of possible outputs, across the copyright spectrum, combined with the function of intent in training and/or prompting, which make this such a thorny legal issue for which existing copyright statute and jurisprudence is ill-suited.
Taking your Batman example: DC would come after you for trademark as well as copyright, and the copyright claims would be very carefully evaluated with respect to your very specific work. But here we are talking about a large model that can generate tons of different work which isn't subject to copyright or which is possibly fair use.
I don't think that existing jurisprudence (or even statute?!) can handle this situation well at all without tons of arbitrary interpretive work on the part of juries/judges, because of the multitude-of-outputs and vague-intent issues described above.
(...Also, presumably the merits of the DC case wouldn't matter, because your victory would be Pyrrhic unless you're a mega-corp. Which from a legal-theory perspective is neither here nor there, but from a legal-practicality perspective may inform how companies go about enforcing copyright claims on model weights/outputs.)
Anyways. I think we have a right mess on our hands and the legislature needs to do their damn jobs. Welcome to America, I guess :)
Honestly, your second-to-last sentence is exactly the kind of thing I hate hearing from non-lawyers; the whole "if the legislature were just smarter" thing is a weird pie-in-the-sky concept, more or less like saying "the world would be better if CEOs were less greedy."
Like, yes, but it's not very likely to happen, and it's not a particularly horrible thing if it doesn't; the law is slow and little-c conservative, and you're expecting it to be something it most often just ain't.
Curious to hear your thoughts on these issues.