Why I think it's easy: the goal of current-generation AI image generation projects was just to produce images that look good to humans, not images that are indistinguishable from real ones.
Even for a casual human observer, they are still relatively easy to spot. A trained machine that can pay more attention to details should do even better.
In some sense, this is the same idea as a GAN, except that the generator is fixed and we only train the discriminator.
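To make the "fixed generator, trained discriminator" idea concrete, here is a toy sketch, not a real detector. Everything in it is an assumption for illustration: `real_sample` and `fixed_generator` are hypothetical stand-ins that emit 2-D feature vectors instead of images, and the discriminator is plain logistic regression trained by SGD. The only point is that the generator never gets updated, so the discriminator's job reduces to ordinary binary classification.

```python
import math
import random


def real_sample(rng):
    # Hypothetical stand-in for features extracted from a real image.
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


def fixed_generator(rng):
    # Hypothetical stand-in for a frozen image generator. Its outputs are
    # systematically off from the real distribution, and it is never updated.
    return [rng.gauss(2.0, 1.0), rng.gauss(-2.0, 1.0)]


def predict(weights, bias, x):
    # Estimated probability that x came from the generator.
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(steps=5000, lr=0.05, seed=0):
    # Logistic regression trained by SGD; label 1 = generated, 0 = real.
    rng = random.Random(seed)
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(steps):
        label = rng.randint(0, 1)
        x = fixed_generator(rng) if label == 1 else real_sample(rng)
        error = predict(weights, bias, x) - label  # d(log loss)/dz
        weights = [w - lr * error * xi for w, xi in zip(weights, x)]
        bias -= lr * error
    return weights, bias


def accuracy(weights, bias, n=2000, seed=1):
    # Fraction of fresh samples classified correctly at a 0.5 threshold.
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        label = rng.randint(0, 1)
        x = fixed_generator(rng) if label == 1 else real_sample(rng)
        correct += int((predict(weights, bias, x) > 0.5) == (label == 1))
    return correct / n


if __name__ == "__main__":
    w, b = train_discriminator()
    print("held-out accuracy:", accuracy(w, b))
```

In a real GAN the generator would be updated against the discriminator's feedback and the gap would shrink over time. Here nothing pushes back, which is why the task stays easy for as long as the generator's outputs remain systematically different.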
With future systems, telling generated images apart from real ones might be harder.
Could you explain how this is done and why it's easy?