Okay, I think I follow and agree legalistically with your argument. But I also t...

mjburgess · on March 30, 2024

We arent statistical learners. We're abductive learners.

We move, and in moving, grow representations in our bodies. These representations are abstracted in cognition, and form the basis for abductive explanations of reality.

We leave plato's cave by building vases of our own, inside the cave, and comparing them to shadows. We do not draw outlines around the shadows.

This is all non-experimental 'empirical' statistics is: pencil marks on the cave wall.

tel · on March 30, 2024

So we craft experiments.

If someone else crafted an experiment, and you were informed of it and then shown the results, if this was done repeatedly enough, would you be incapable of forming any sort of semantic meaning?

mjburgess · on March 31, 2024

If they only showed the measures, yes.

The meaning of the measures is determined by the experiment, not by the data. "Data" is itself meaningless, and statistics on data is only informative of reality because of how the experimenter creates the measurement-target relationship.

tel · on March 31, 2024

Okay I think I buy that. I don’t know if I agree, but trying to argue for a position against it has been sufficiently illuminating that I just need to chew on it more.

There’s no doubt in my mind that experimental learning is more efficient. Especially if you can design the experiments against your personal models at the time.

At the same time, it’s not clear to me that one could not gain similar value purely by, say, reading scientific journals. Or observing videos of the experiments.

At some point the prevalence of “natural experiments” becomes too low for new discover through. We weren’t going to accidentally discover an LHC hanging around. We needed giant telescopes to find examples of natural cosmological experiments. Without a doubt, thoughtful investment in experimentation becomes necessary as you push your knowledge frontier forward.

But within a realm where tons of experimental data is just available? Seems very likely that a learner asked to predict new experimental results outside of things they’ve directly observed but well within the space of models they’ve observed lots of experimentation around should still find that purely as an act of compression, their statistical knowledge would predict something equivalent to the semantic theory underlying it.

We even seemed to observe just this in multimodal GPT-4 where it can theorize about the immediate consequences of novel physical situations depicted in images. I find it to be weak but surprising evidence of this behavior.

mjburgess · on April 1, 2024

I'd be interested in the GPT-4 case, if you have a paper (etc.) ?

You are correct to observe that science, as we know it, is ending. We're way along the sigmoid of what can be known, and soon enough, will be drifting back into medieval heuristics ("this weed seems to treat this disease").

This isnt a matter of efficiency, it's a necessity. Reality is under-determined by measurement; to find out what it is like, we have to have many independent measures whose causal relationship to reality is one we can control (through direct, embodied, action).

If we only have observational measures, we're trapped in a madhouse.

Let's not mistake science for pseudoscience, even if the future is largely now, pseudoscientific trash.

tel · on April 1, 2024

I thought the examples I was thinking of were in the original GPT-4 Technical Report, but all I found on re-reading were examples of it explaining "what's funny about" a given image. Which is still a decent example of this, I think. GPT-4 demonstrates a semantic model about what entails humor.

mjburgess · on April 1, 2024

it entails only that the associative model is coincidentally indinstiguishable from a semantic one in the cases where it's used

it is always trivial to take one of these models and expose it's failure to operate semantically, but these cases are never in the marketing material.

Consider an associative model of addition, all numbers from -1bn to 1bn, broken down into their digits, so that 1bn = <1, 0, 0, 0, 0, 0, 0, 0, 0>

Using such a model you can get the right answers for more additions than just -1bn to 1bn, but you can also easily find cases where the addition would fail.

It's never adding.

tel · on April 2, 2024

I think part of what I suspect is going on here too is more computation and finiteness. It seems correct that LLM architectures cannot perform too much computation (unless you unroll it in the context).

On the other hand you can look at statistical model identification in, say, nonlinear control. This can absolutely lead to unboundedly long predictions.