Just so I understand you properly: Original Inputs (A) -> NN (Q) -> Output (X) Y...

daenz · on May 3, 2022

Can you identify GPT text versus authentic text? If so, then there are features in that text that give it away. It stands to reason that there exist other features in the text, based on the training data the model was fed, and other characteristics of the model, that a discriminator model could use to detect, with some confidence, which model produced the text. A discriminator model which can detect a specific generative model essentially captures its "fingerprint".

An example of some of these features might be the use of specific word pairs around other word pairs. Or a peculiar verb conjugation in the presence of a specific preposition.