It’s going to be very difficult to come up with any rigorous structure for automatically assessing the outputs of these models. They’re built using what is effectively human grading of the answers.
Hmm, if we take the reinforcement learning part of reinforcement learning from human feedback, isn't there a reward model that takes a question/answer pair and rates the quality of the answer? It's sort of grading itself; it's like a training loss, but it still tells us something?
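For concreteness, here's a rough sketch of what that reward-model grading looks like in practice: you feed a question/answer pair into a sequence-classification model and read off a single scalar score, where higher roughly means "a human labeler would likely prefer this answer." The model name below is just an assumption (a publicly released reward model), not whatever the big labs actually use internally.

```python
# Minimal sketch: scoring a question/answer pair with an RLHF-style reward model.
# Model name is an assumption; any reward model exposed as a sequence-classification
# head with a single scalar output would be used the same way.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # assumed example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

question = "How do I reverse a list in Python?"
answer = "Use my_list[::-1], or my_list.reverse() to do it in place."

# The reward model reads the (question, answer) pair and emits one scalar score.
inputs = tokenizer(question, answer, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits[0].item()

print(f"reward score: {score:.3f}")
```

That score is exactly the "grading itself" part: during RLHF the policy is tuned to push it up, so it's closer to a training signal than an independent evaluation, but you can still read it as a noisy proxy for how a human rater might judge the answer.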