That suggests that no statistical method could ever recover hidden representations, though. And that’s patently untrue. Taken to its greatest extreme, you shouldn’t even be able to tell two mixed distributions apart even when they have wildly non-overlapping ranges. Or, put another way, all of statistical testing in science is flawed.
I’m not saying you believe that, but I fail to see how that situation is structurally different from what you claim. If it’s a matter of degree, how do you feel things change as the situation becomes more complex?
Yes, I think most statistical testing in science is flawed.
But, to be clear, the reason it could ever work at all has nothing to do with the methods or the data itself; it has to do with the properties of the data-generating process (i.e., reality, i.e., what's being measured).
You can never build representations from measurement data; this is called inductivism, and it's pretty clearly false: no representation is obtained just by characterising measurement data. There are no cases I can think of where this would work -- temperature isn't patterns in thermometers; gravity isn't patterns in the positions of stars; and so on.
Rather, you can decide between competing representations using stats in a few special cases. Stats never uncovers hidden representations; it can only decide between different formal models which include such representations.
E.g., if you characterise some system as having a power-law data-generating process (e.g., social network friendships), then you can estimate some parameters of that process.
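To make that concrete, here's a toy sketch (Python, purely illustrative, using the standard MLE for a continuous power law): the "power law" representation is assumed up front, and the statistics only pins down its exponent.

    import numpy as np

    def fit_power_law_exponent(samples, x_min):
        # MLE for the exponent of a continuous power law p(x) ~ x^(-alpha), x >= x_min.
        # Note: this only estimates a parameter of a process we've *already decided*
        # is a power law; it does not discover that representation from the data.
        x = np.asarray(samples, dtype=float)
        x = x[x >= x_min]
        return 1.0 + len(x) / np.sum(np.log(x / x_min))

    # Draw from a known power law (alpha = 2.5) and recover its exponent.
    rng = np.random.default_rng(0)
    u = rng.uniform(size=10_000)
    x_min, true_alpha = 1.0, 2.5
    samples = x_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))  # inverse-CDF sampling
    print(fit_power_law_exponent(samples, x_min))  # ~2.5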
Or, e.g., if you arrange all the data to already follow a law you know (e.g., F = G*m1*m2/r^2), then you can find G, 'statistically'.
This has caused a lot of confusion historically: it seems G is 'induced over cases', but all the representation work has already been done. Stats/induction just plays the role of fine-tuning known representations; it never builds any.
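Same point as a sketch, with purely synthetic 'measurements': once the data are arranged as F = G*m1*m2/r^2 dictates, 'finding G' is just a one-parameter least-squares fit -- the representational work is all in constructing the feature m1*m2/r^2.

    import numpy as np

    # Synthetic 'measurements' generated from the known law plus noise
    # (all numbers purely illustrative).
    rng = np.random.default_rng(1)
    G_true = 6.674e-11
    m1 = rng.uniform(1e3, 1e5, size=200)
    m2 = rng.uniform(1e3, 1e5, size=200)
    r = rng.uniform(0.1, 2.0, size=200)
    F = G_true * m1 * m2 / r**2 * (1 + 0.01 * rng.standard_normal(200))

    # The law tells us which feature to regress on; stats only tunes the constant.
    feature = m1 * m2 / r**2
    G_hat = np.sum(feature * F) / np.sum(feature * feature)  # least-squares slope through the origin
    print(G_hat)  # ~6.67e-11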
Okay, I think I follow and legalistically agree with your argument. But I also think it basically only exists philosophically. In practice, we make these determinations all the time. I don't see any reason why a sufficiently sophisticated representation, learned through statistical optimization, differs in practice from a semantic model.
If there were such a difference, it'd be interesting to propose how our own minds, at least to the degree that they can be seen as statistical learners in their own right, achieve semantics, and how that mechanism, whatever it might be, is not itself a learned representation driven by statistical impressions.
We aren't statistical learners. We're abductive learners.
We move, and in moving, grow representations in our bodies. These representations are abstracted in cognition, and form the basis for abductive explanations of reality.
We leave Plato's cave by building vases of our own, inside the cave, and comparing them to shadows. We do not draw outlines around the shadows.
This is all non-experimental 'empirical' statistics is: pencil marks on the cave wall.
If someone else crafted an experiment, and you were informed of it and then shown the results -- and if this were done repeatedly enough -- would you be incapable of forming any sort of semantic meaning?
The meaning of the measures is determined by the experiment, not by the data. "Data" is itself meaningless, and statistics on data is only informative of reality because of how the experimenter creates the measurement-target relationship.
Okay I think I buy that. I don’t know if I agree, but trying to argue for a position against it has been sufficiently illuminating that I just need to chew on it more.
There’s no doubt in my mind that experimental learning is more efficient. Especially if you can design the experiments against your personal models at the time.
At the same time, it’s not clear to me that one could not gain similar value purely by, say, reading scientific journals. Or observing videos of the experiments.
At some point the prevalence of “natural experiments” becomes too low for new discovery through them. We weren’t going to accidentally discover an LHC hanging around. We needed giant telescopes to find examples of natural cosmological experiments. Without a doubt, thoughtful investment in experimentation becomes necessary as you push your knowledge frontier forward.
But within a realm where tons of experimental data is just available? It seems very likely that a learner asked to predict new experimental results -- outside of things they’ve directly observed, but well within the space of models they’ve seen lots of experimentation around -- would find that, purely as an act of compression, their statistical knowledge predicts something equivalent to the semantic theory underlying it.
We even seemed to observe just this in multimodal GPT-4 where it can theorize about the immediate consequences of novel physical situations depicted in images. I find it to be weak but surprising evidence of this behavior.
I'd be interested in the GPT-4 case, if you have a paper (etc.)?
You are correct to observe that science, as we know it, is ending. We're way along the sigmoid of what can be known, and soon enough, will be drifting back into medieval heuristics ("this weed seems to treat this disease").
This isn't a matter of efficiency; it's a necessity. Reality is under-determined by measurement; to find out what it is like, we have to have many independent measures whose causal relationship to reality is one we can control (through direct, embodied action).
If we only have observational measures, we're trapped in a madhouse.
Let's not mistake science for pseudoscience, even if the future is now, largely, pseudoscientific trash.
I thought the examples I was thinking of were in the original GPT-4 Technical Report, but all I found on re-reading were examples of it explaining "what's funny about" a given image. Which is still a decent example of this, I think. GPT-4 demonstrates a semantic model about what entails humor.
It entails only that the associative model is coincidentally indistinguishable from a semantic one in the cases where it's used.
It is always trivial to take one of these models and expose its failure to operate semantically, but these cases are never in the marketing material.
Consider an associative model of addition: all numbers from -1bn to 1bn, broken down into their digits, so that 1bn = <1, 0, 0, 0, 0, 0, 0, 0, 0, 0>.
Using such a model, you can get the right answers for more additions than just -1bn to 1bn, but you can also easily find cases where the addition would fail.
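A toy version of what I mean (Python; signs ignored and a fixed 10-digit width, just to keep the sketch small):

    # A purely associative 'adder': memorise single-digit sums, then apply them
    # digit-by-digit with a carry over a fixed 10-digit width.
    DIGIT_SUMS = {(a, b): a + b for a in range(10) for b in range(10)}
    WIDTH = 10

    def to_digits(n):
        return [int(c) for c in str(n).zfill(WIDTH)]

    def associative_add(x, y):
        out, carry = [], 0
        for a, b in zip(reversed(to_digits(x)), reversed(to_digits(y))):
            s = DIGIT_SUMS[(a, b)] + carry
            out.append(s % 10)
            carry = s // 10
        # There is no slot for an 11th digit: any final carry is silently dropped.
        return int("".join(str(d) for d in reversed(out)))

    print(associative_add(1_500_000_000, 2_000_000_000))  # 3500000000: right, despite being outside +/-1bn
    print(associative_add(7_000_000_000, 8_000_000_000))  # 5000000000: wrong, the carry had nowhere to go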
I think part of what's going on here is also a matter of computation and finiteness. It seems correct that LLM architectures cannot perform arbitrarily much computation (unless you unroll it in the context).
On the other hand, you can look at statistical model identification in, say, nonlinear control. This can absolutely lead to unboundedly long predictions.
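E.g., a toy identification sketch (Python, synthetic numbers) where the model structure x[t+1] = a*x[t] + b*x[t]^3 is assumed up front and only a, b are fit statistically, yet the identified model can then be rolled out for as many steps as you like:

    import numpy as np

    # Simulate a trajectory from a 'true' nonlinear system (toy parameters).
    rng = np.random.default_rng(2)
    a_true, b_true = 0.9, -0.1
    x = np.empty(200)
    x[0] = 1.0
    for t in range(199):
        x[t + 1] = a_true * x[t] + b_true * x[t]**3 + 0.001 * rng.standard_normal()

    # Identify a, b by least squares on the assumed regressors [x, x^3].
    Phi = np.column_stack([x[:-1], x[:-1]**3])
    a_hat, b_hat = np.linalg.lstsq(Phi, x[1:], rcond=None)[0]

    def rollout(x0, steps):
        # Iterate the identified model for arbitrarily many steps.
        traj = [x0]
        for _ in range(steps):
            traj.append(a_hat * traj[-1] + b_hat * traj[-1]**3)
        return traj

    print(a_hat, b_hat)              # ~0.9, ~-0.1
    print(rollout(1.0, 10_000)[-1])  # a prediction arbitrarily far ahead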