
For the red Lambada line in Fig 13, when the model predicts ~0 the ground truth is 0.7. No one can look at that line and say there is a meaningful relationship. The Py Func Synthesis line also doesn't look good above 0.3-0.4.

> The abstract also quite literally states that models struggle with out of distribution tests, so again, what is the contradiction here?

Out of distribution is the only test that matters. If it doesn't work out of distribution it doesn't work. Surely you know that.

> Would it have been hard to simply say you found the results unconvincing?

Anyone can look at the graphs, especially Figure 13, and see this isn't a matter of opinion.

> There is nothing contradictory in the paper.

The results contradict the titular claim that "Language Models (Mostly) Know What They Know".




>For the red Lambada line in Fig 13, when the model predicts ~0 the ground truth is 0.7. No one can look at that line and say there is a meaningful relationship. The Py Func Synthesis line also doesn't look good above 0.3-0.4.

Yeah but Lambada is not the only line there.

>Out of distribution is the only test that matters. If it doesn't work out of distribution it doesn't work. Surely you know that.

Train the classifier on math questions and you get good calibration for math; train it on true/false questions and you get good calibration for true/false; but train it on math and it struggles with true/false (and vice versa). That is what "out-of-distribution" refers to here.
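
To make that setup concrete, here is a rough sketch of the train-on-one-distribution, test-on-another experiment. This is not the paper's code: the features and labels are random placeholders, and the logistic-regression probe just stands in for whatever classifier gets trained on the model's internals.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss

    rng = np.random.default_rng(0)
    # Placeholder hidden-state features and correctness labels for two
    # hypothetical question types; in the real setup these come from the model.
    X_math, y_math = rng.normal(size=(1000, 64)), rng.integers(0, 2, 1000)
    X_tf, y_tf = rng.normal(size=(1000, 64)), rng.integers(0, 2, 1000)

    # Train the probe on one question type...
    probe = LogisticRegression(max_iter=1000).fit(X_math, y_math)

    # ...check calibration in-distribution (math -> math)...
    p_in = probe.predict_proba(X_math)[:, 1]
    print("math -> math Brier:", brier_score_loss(y_math, p_in))

    # ...and out-of-distribution (math -> true/false), which is the
    # regime where the paper reports calibration degrading.
    p_out = probe.predict_proba(X_tf)[:, 1]
    print("math -> t/f  Brier:", brier_score_loss(y_tf, p_out))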

Make no mistake: the fact that both of the first two cases work is evidence that models encode some knowledge about the truthfulness of their responses. If they didn't, it wouldn't work at all. Statistics is not magic, and gradient descent won't bring order where there is none.

What the out-of-distribution "failure" indicates here is that "truth" is multifaceted and situation-dependent, and that interpreting the model's features is very difficult. You can't train a "general LLM lie detector", but that doesn't mean model features are unable to provide insight into whether a response is true or not.


> Yeah but Lambada is not the only line there.

There are three out-of-distribution lines, and all of them are bad. I explicitly described two of them. Moreover, the worst time for your uncertainty indicator to silently fail is precisely when you are out of distribution.

But okay, forget about out-of-distribution and go back to Figure 12, which is in-distribution. What relationship are you supposed to take away from the left panel? From what I understand, they were trying to train a y=x relationship, but as I said previously the plot doesn't show that.

An even bigger problem might be the way the "ground truth" probability is calculated: they sample the model 30 times and take the fraction of correct results as the ground-truth probability. But it's really fishy to say that the "ground truth" is something that is partly an internal property of the model's sampler rather than an objective/external fact. I don't have more time to think about this, but something is off about it.
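
Concretely, my reading of that procedure is roughly the following sketch, where sample_model and is_correct are toy stand-ins for the actual model call and the grader:

    import random

    N_SAMPLES = 30

    def sample_model(question):
        # Toy stand-in for sampling the LLM at nonzero temperature.
        return random.choice(["right answer", "wrong answer"])

    def is_correct(answer, reference_answer):
        # Toy stand-in for the grader that checks a sampled answer.
        return answer == reference_answer

    def ground_truth_prob(question, reference_answer):
        # Sample the model repeatedly and count how often the sampled
        # answer is judged correct; that fraction is what gets treated
        # as the "ground truth" probability.
        hits = sum(
            is_correct(sample_model(question), reference_answer)
            for _ in range(N_SAMPLES)
        )
        return hits / N_SAMPLES

    print(ground_truth_prob("some question", "right answer"))

Calibration is then judged by plotting the model's self-reported probability against this fraction, with y=x as the ideal, which is exactly why the quantity being called "ground truth" depends on the sampler and not only on external fact.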

All this to say: reading long scientific papers is difficult and time-consuming, and let's be honest, you were not posting these links because you've spent hours poring over these papers and understood them; you posted them because the headlines support a world-view you like. As someone else noted, you can find good papers whose headlines conclude the opposite (like the work of rao2z).



