> Every one I've spot checked right now has been correct, and I might write anot...

kees99 · 2024-06-20T09:42:04 1718876524

Not GP, but I would imagine "another checker to scan the results" would be another NN classifier.

The thinking being that you'd compare the outputs of the two. Assuming their errors are statistically independent and the two checkers are of similar quality, a disagreement rate of, say, 1% in that comparison would suggest a ~0.5% error rate against "ground truth" for each of them.
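To make the arithmetic concrete, here is a rough sketch. It assumes binary labels, independent errors, and an equal error rate e for both checkers, so that the two disagree exactly when one of them is wrong and the other is right: d = 2e(1 - e). The function names are made up for illustration.

```python
import math


def disagreement_rate(e1: float, e2: float) -> float:
    """Expected disagreement between two binary classifiers whose errors
    are independent: exactly one of the two is wrong."""
    return e1 * (1 - e2) + e2 * (1 - e1)


def estimate_error_rate(d: float) -> float:
    """Invert d = 2e(1 - e) for e, assuming both classifiers share the
    same error rate e <= 0.5 (an assumption, not something measured)."""
    if not 0.0 <= d <= 0.5:
        raise ValueError("disagreement rate must be in [0, 0.5]")
    return (1 - math.sqrt(1 - 2 * d)) / 2


if __name__ == "__main__":
    # 1% disagreement gives an estimate of roughly 0.5% per checker.
    print(f"{estimate_error_rate(0.01):.4%}")
```

For small d this is close to d / 2, which is where the ~0.5% comes from. With more than two classes, or if the two checkers tend to fail on the same inputs, the relation no longer holds and the real error rate can be higher than the estimate.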

TeMPOraL · 2024-06-20T09:42:38 1718876558

Maybe their problem is using an LLM to solve f:X→Y, where the validation, V:{X,Y}→{0,1}, is trivial to compute?
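A minimal sketch of what that setup could look like, assuming the solver is unreliable and the validator is cheap and exact. Everything here is hypothetical: `mock_propose` is a stand-in for the LLM call, and factoring is just an example of a problem where checking is much easier than solving.

```python
from typing import Callable, Optional, Tuple

Factors = Tuple[int, int]


def is_factorization(n: int, y: Factors) -> bool:
    """V: cheap, exact check that y is a non-trivial factorization of n."""
    a, b = y
    return a > 1 and b > 1 and a * b == n


def mock_propose(n: int, attempt: int) -> Factors:
    """Stand-in for f (the LLM): wrong on the first try, right on the second.
    Hard-coded for n = 91, purely for illustration."""
    candidates = [(3, 31), (7, 13)]
    return candidates[attempt % len(candidates)]


def solve_with_validation(
    n: int,
    propose: Callable[[int, int], Factors],
    validate: Callable[[int, Factors], bool],
    max_attempts: int = 3,
) -> Optional[Factors]:
    """Ask the solver repeatedly and return the first answer that passes V."""
    for attempt in range(max_attempts):
        y = propose(n, attempt)
        if validate(n, y):
            return y
    return None  # no proposal passed validation


if __name__ == "__main__":
    print(solve_with_validation(91, mock_propose, is_factorization))
```

In this arrangement there is no need for a second classifier: every accepted answer has passed V, so the only failure mode is running out of attempts.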