
>I'm not sure if you're aware, but most of those papers are well known. All the arxiv papers are from 2022 or 2023. So I think your 5 minutes is pretty far off. I for one have spent hours, but the majority of that was prior to this comment. You're claiming intellectual dishonesty too soon.

Fair enough. With the "I'm not going to bother with the rest," it seemed like you meant just now.

>focus on the topic and claims (even if not obvious) rather than the time

I should have just done that, yes. Zero correlation is obviously false given how much denser the plot is at the extremes, and depending on how many questions are in the test set, the correlation could even be pretty strong.
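As a rough sketch of that n-dependence (toy numbers I'm assuming, not the paper's): even a weak correlation becomes clearly distinguishable from zero once the test set has a couple thousand questions.

    # Sketch: a true correlation of ~0.1 looks weak, but with n = 2000
    # questions it is unambiguously nonzero. Toy data, assumed sizes.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 2000                            # assumed test-set size
    x = rng.normal(size=n)
    y = 0.1 * x + rng.normal(size=n)    # true correlation ~0.1

    r, p = stats.pearsonr(x, y)
    print(f"r = {r:.3f}, p = {p:.1e}")  # small r, tiny p at this n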

  > Zero correlation is obviously false given how much denser the plot is at the extremes, and depending on how many questions are in the test set, the correlation could even be pretty strong.
I took it as hyperbole. And honestly, I don't find that plot, or much of the paper, convincing. Though I have a general frustration: it seems many researchers (especially in NLP) willfully do not look for data spoilage (i.e., test-set contamination). I know they do deduplication, but I question how many try to vet this by manual inspection. Sure, you can't inspect everything, but we have statistics for that. And any inspection I've done leaves me very unconvinced that there is no spoilage. There's quite a lot in most datasets I've seen, which can substantially change the interpretation of results. After all, we're elephant fitting.
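To make "we have statistics for that" concrete, a minimal sketch (sample size and counts are made up): inspect a uniform random sample by hand and put an exact binomial upper bound on the dataset-wide contamination rate.

    # Sketch: after manually inspecting k randomly sampled examples and
    # finding c contaminated, a Clopper-Pearson bound limits the overall
    # rate. Numbers are hypothetical.
    from scipy.stats import beta

    k, c = 300, 0          # hypothetical: inspected 300 examples, found 0
    conf = 0.95
    upper = beta.ppf(conf, c + 1, k - c)   # one-sided exact upper bound
    print(f"contamination rate <= {upper:.2%} at {conf:.0%} confidence")
    # ~1% here; matches the classic "rule of three" approximation 3/k

Of course that only bounds the rate of whatever your inspection can actually catch, but it's far more informative than deduplication alone.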


I explicitly wrote "~0", and anyone who looks at that graph can see that there is no relationship at all in the data, except possibly at the extremes, where it doesn't matter much (it "knows" sure things), and I'm not even sure of that. One of the reasons to plot data is so that this kind of thing jumps out at you and you aren't misled by some summary statistic.
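A synthetic sketch of how the summary number can mislead (shapes assumed, nothing from the paper): structure confined to the extremes can produce a respectable overall r even when the bulk of the data is pure noise, which a plot shows at a glance and the single statistic hides.

    # Sketch: no relationship in the bulk, tight agreement only at the
    # extremes. Overall r looks meaningful; the middle alone is ~0.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000
    x = rng.uniform(0, 1, n)              # e.g., model confidence
    y = rng.uniform(0, 1, n)              # pure noise in the bulk
    ext = (x < 0.05) | (x > 0.95)         # the "sure things"
    y[ext] = x[ext] + rng.normal(scale=0.05, size=ext.sum())

    r_all = np.corrcoef(x, y)[0, 1]
    r_mid = np.corrcoef(x[~ext], y[~ext])[0, 1]
    print(f"overall r = {r_all:.2f}, middle-only r = {r_mid:.2f}")
    # roughly 0.25 vs 0.00: the extremes carry the whole statistic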
