At least it could give a theoretical bound on the expected hallucination rate for the particular model/quant at hand? Although I'm very skeptical that companies would disclose their training corpus, and derivative models trained on top of foundation models add another level of indirection, it would still be interesting to have these numbers, even just as rough estimates. The compression angle in this thread is spot-on, but yeah, operationalizing it is hard.
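
For concreteness, here's a minimal sketch of the cheapest version of the compression angle I can think of: if a model exposes per-token logprobs, its average bits-per-token on a held-out slice is a crude measure of how well it compresses that data, which is roughly the quantity any such bound would be built on. The helper name and the toy numbers below are mine, not from anything upstream; you'd feed in whatever logprob stream you can actually extract.

```python
import math

def bits_per_token(logprobs):
    """Average negative log2-probability the model assigns to held-out
    tokens. Lower = the model compresses the corpus better; a rough
    proxy, not an actual hallucination bound."""
    return -sum(lp / math.log(2) for lp in logprobs) / len(logprobs)

# Hypothetical: natural-log probabilities of each held-out token under
# the model, collected from any API or runtime that exposes logprobs.
held_out_logprobs = [-1.2, -0.4, -2.7, -0.9]  # toy numbers
print(f"{bits_per_token(held_out_logprobs):.2f} bits/token")
```

Even this toy version runs into the problems above: without the training corpus you don't know what counts as "held out", and for a quantized or fine-tuned derivative you're measuring the end artifact, not the foundation model the bound would nominally be about.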