Probably it's something like "give feedback that's on average slightly more correct than incorrect," though you'd get more signal from perfect feedback.
That said, I suspect the signal is very weak even today and probably not too useful except for learning about human stylistic preferences.
That said, I suspect the signal is very weak even today and probably not too useful except for learning about human stylistic preferences.