First of all, I don't think we should aim for 100% precision and 100% recall. Instead, it is more realistic to use the result as a filter for further downstream testing and to treat it as a means of increasing the cost of cheating.

Also, we can use more than just the text output. A human writer doesn't produce a piece of text in one pass; they go through a drafting and editing process. We can design devices to capture keypresses or pen strokes (IIRC there were studies on fraud detection based on keypress patterns/mouse movement). Someone could try to train a new model to mimic their own writing behavior, so we need to somehow make sure that the amount of training data required is too large to be worth the effort.
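To make the keypress idea a bit more concrete, here is a rough sketch of the kind of process features one could pull out of a keystroke log. Everything in it is my own assumption for illustration: the `KeyEvent` record, the `process_features` helper, the 2-second pause threshold, and the choice of features are all hypothetical, not taken from any particular study or tool.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class KeyEvent:
    """Hypothetical log record: one key press during a writing session."""
    t: float   # seconds since the session started
    key: str   # key name, e.g. "a" or "Backspace"


def process_features(events, pause_threshold=2.0):
    """Summarize how a text was produced rather than what it says.

    pause_threshold is an assumed cutoff (in seconds) for a "thinking" pause.
    """
    if len(events) < 2:
        raise ValueError("need at least two key events")
    # Time between consecutive key presses.
    gaps = [b.t - a.t for a, b in zip(events, events[1:])]
    # Count editing keys as a crude proxy for revision activity.
    deletions = sum(1 for e in events if e.key in ("Backspace", "Delete"))
    return {
        "mean_gap": mean(gaps),
        "gap_stdev": pstdev(gaps),
        "long_pauses": sum(1 for g in gaps if g >= pause_threshold),
        "deletion_ratio": deletions / len(events),
    }
```

Text pasted in one go would show almost no events, pauses, or deletions, whereas real drafting tends to leave a messier trace. Features like these would only feed the filter stage, not serve as proof on their own.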

For downstream testing, the goal isn't mainly to verify whether a piece of text is AI-generated but to make sure that a student who can pass the test would essentially have to know the material sufficiently well (which defeats the purpose of cheating).