Your exchange has made me wonder. Yes, whatever AI produces is not the genuine article. But there is something we could call "Shakespeare-ness", and maybe it is quantifiable.
What would a realistic Turing test for "Shakespeare-ness" look like?
Serious Shakespeare experts likely remember (at least vaguely) all of his sonnets, so they cannot take part in a blinded study ("Did Shakespeare write this or not?"): they would realize they had never seen those particular lines before and answer from that knowledge alone.
Asking more general English Lit teachers might work instead.
Extra Terrible Lines are indeed fun. We've had nine months of model development since then, though; it might make sense to repeat those experiments twice a year.
IIRC Scott Alexander is doing something similar with his "AI draws nontrivial prompts" bet, and the difference from last year's results was striking.
Also, this really needs blinding, otherwise the temptation to show off one's sophistication and subtlety is strong. Remember how oenologists consistently fail to distinguish a USD 20 bottle of wine from a USD 2000 one when blinded.