This does not mean that language in humans isn't probabilistic in nature. You seem to think that because there is structure, it must be rule-based, but that doesn't follow at all.
When a group of birds flies, each bird discovers/knows that flying just a little behind another reduces the number of flaps it needs. When nearly every bird does this, the flock forms an interesting shape.
'Birds fly in a V shape' is essentially what grammar is here - a useful fiction of the underlying reality. There is structure. There is meaning but there is no rule the birds are following to get there. No invisible V shape in the sky constraining bird flight.
First, there is no evidence of any probabilistic processing at the level of syntax in humans (it's irrelevant what computers can do).
Second, I didn't say that, in language, structure implies deterministic rules; I said that there is a deterministic rule that involves the structure of a sentence. Specifically, sentences are interpreted according to their parse tree, not the linear order of words.
As for the birds analogy, the "rules" the birds follow actually do explain the V-shape that the flock forms. You make an observation ("V-shaped flock"), ask the question "why a V-shape and not some other shape?", and try to find an explanation (the relative bird positions make it easier to fly [because of XYZ]). In the case of language you observe that there is structure dependence, you ask why it's that way and not another (like linear order) and try to come up with an explanation. You are trying to suggest that the observation that language has structure dependence is like seeing an image of an object in a cloud formation: an imagined mental projection that doesn't have any meaningful underlying explanation. You could make the same argument for pretty much anything (e.g. the double-slit experiment is just projecting some mental patterns onto random behavior) and I don't think it's a serious argument in this case either.
And research on syntactic surprisal—where more predictable syntactic structures are processed faster—shows a strong correlation between the probability of a syntactic continuation and reading times.
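For anyone unfamiliar with the term, surprisal is just the negative log probability of a continuation given the preceding context. A rough sketch, with made-up probabilities purely for illustration:

```python
import math

# Toy illustration of surprisal: surprisal(w | context) = -log2 P(w | context).
# The probabilities below are invented for the example, not taken from any corpus.
continuation_probs = {
    "went": 0.60,                    # highly expected continuation of "The cat ..."
    "that caught the mouse": 0.05,   # less expected relative-clause continuation
}

for continuation, p in continuation_probs.items():
    surprisal = -math.log2(p)
    print(f"{continuation!r}: surprisal = {surprisal:.2f} bits")
```

The reported finding is that reading times track this quantity: the higher the surprisal of the continuation, the longer the reading time.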
>In the case of language you observe that there is structure dependence, you ask why it's that way and not another (like linear order) and try to come up with an explanation. You are trying to suggest that the observation that language has structure dependence is like seeing an image of an object in a cloud formation: an imagined mental projection that doesn't have any meaningful underlying explanation.
No, I'm suggesting that all you're doing here is cooking up some very nice fiction, like Newton did when he proposed his model of gravity. Grammar does not even fit into rule-based hierarchies all that well. That's why there are a million strange exceptions to almost every 'rule', exceptions that have no sensible explanation beyond 'well, this is just how it's used', because of course that's what happens when you try to break down an inherently probabilistic process into rigid rules.
> And research on syntactic surprisal—where more predictable syntactic structures are processed faster—shows a strong correlation between the probability of a syntactic continuation and reading times.
I'm not sure what this is supposed to show. If I can predict what you are going to say, so what? I can also predict you are going to pick something up if you are looking at it and start moving your arm. So what?
The third paper looks like a similar argument. As far as I can tell, neither paper 1 nor paper 2 proposes a probabilistic model for language. Paper 1 talks about how certain language features are acquired faster with more exposure (that isn't inconsistent with a deterministic grammar). I believe paper 2 is the same.
> No I'm suggesting that all you're doing here is cooking up some very nice fiction like Newton did when he proposed his model of gravity.
Absolutely bonkers to describe Newton's model of gravity as "fiction". In that sense every scientific breakthrough is fiction: Bohr's model of the atom is fiction (because it isn't fully quantum mechanical), Einstein's gravity will be fiction too when physics is unified with quantum gravity. No sane person uses the word "fiction" to describe any of this; it's just scientific refinement: we go from good models to better ones, patching up holes in our understanding, which is an unceasing process. It would be great if we could have a Newton-level "fictitious" breakthrough in language.
> Grammar does not even fit into rule based hierarchies all that well. That's why there are a million strange exceptions to almost every 'rule'. Exceptions that have no sensible explanations beyond, 'well this is just how it's used' because of course that's what happens when you try to break down an inherently probabilistic process into rigid rules.
No one is saying grammar has been solved, people are trying to figure out all the things that we don't understand.
>I'm not sure what this is supposed to show? If I can predict what you are going to say so what.
If the speed of your understanding varies with how frequent and predictable syntactic structures are then your understanding of syntax is a probabilistic process. A strictly non-probabilistic process would have a fixed, deterministic way of processing syntax, independent of how often a structure appears or how predictable it is.
>I can predict you are going to pick something up too if you are looking at it and start moving your arm. So what?
Ok? This is very interesting. Do you seriously think this prediction right now isn't probabilistic? You estimate, not from rigid rules but from past experience, that it's likely I will pick it up. What if I push it off the table? You think that isn't possible? What if I grab the knife in my bag while you're distracted and stab you instead? Probability is the reason you picked that option instead of the myriad of other options.
>Absolutely bonkers to describe Newton's model of gravity as "fiction". In that sense every scientific breakthrough is fiction: Bohr's model of the atom is fiction (because it isn't fully quantum mechanical), Einstein's gravity will be fiction too when physics is unified with quantum gravity. No sane person uses the word "fiction" to describe any of this; it's just scientific refinement: we go from good models to better ones, patching up holes in our understanding, which is an unceasing process. It would be great if we could have a Newton-level "fictitious" breakthrough in language.
"All models are wrong. Some are useful" - George Box.
There's nothing insane with calling a spade a spade. It is fiction and many academics do view it in such a light. It's useful fiction, but fiction none the less. And yes, Einstein's theory is more useful fiction.
Grammar is a model of language. It is not language.
> If the speed of your understanding varies with how frequent and predictable syntactic structures are then your understanding of syntax is a probabilistic process.
In what sense? I don't see how it tells you anything if you have the sentence "The cat ___ " and then you expect a verb like "went" but you could get a relative clause like "that caught the mouse". The sentence is interpreted deterministically, not by what a fragment's continuation might contain but by what it actually does contain. If you are more "surprised" by the latter it doesn't tell you that the process is not deterministic.
> Ok? This is very interesting. Do you seriously think this prediction right now isn't probabilistic? You estimate, not from rigid rules but from past experience, that it's likely I will pick it up. What if I push it off the table? You think that isn't possible? What if I grab the knife in my bag while you're distracted and stab you instead?
I think you are confusing multiple things. I can predict actions and words; that doesn't mean sentence parsing/production is probabilistic (I'm not even sure exactly what a person might mean by that, especially with respect to production), nor does it mean arm movement is.
> "All models are wrong. Some are useful" - George Box. There's nothing insane with calling a spade a spade. It is fiction and many academics do view it in such a light. It's useful fiction, but fiction none the less. And yes, Einstein's theory is more useful fiction. Grammar is a model of language. It is not language.
I have no idea what you are saying: calling grammar a "fiction" was supposed to be a way to undermine it but now you are saying that it was some completely trivial statement that applies to the best science?
>In what sense? I don't see how it tells you anything if you have the sentence "The cat ___ " and then you expect a verb like "went" but you could get a relative clause like "that caught the mouse". The sentence is interpreted deterministically, not by what a fragment's continuation might contain but by what it actually does contain. If you are more "surprised" by the latter it doesn't tell you that the process is not deterministic.
The claim isn't about whether the ultimate interpretation is deterministic; it's about the process of parsing and expectation-building as the sentence unfolds.
The idea is that language processing (at least in humans and many computational models) involves predictions about what structures are likely to come next. If the brain (or a model) processes common structures more quickly and experiences more difficulty and higher processing times with less frequent ones, then the process of parsing sentences is very clearly probabilistic.
Being "surprised" isn't just a subjective experience here - it manifests as measurable processing costs that scale with the degree of unexpectedness. This graded response to probability is not explainable with purely deterministic models that would parse every sentence with the same algorithm and fixed steps.
>I have no idea what you are saying: calling grammar a "fiction" was supposed to be a way to undermine it but now you are saying that it was some completely trivial statement that applies to the best science?
None of my comments undermine grammar beyond saying it is not how language works. I preface 'fiction' with the word useful multiple times and make comparisons to Newton.
> If the brain (or a model) processes common structures more quickly ... then the process of parsing sentences is very clearly probabilistic.
This isn't true. For one, more common sentences are probably structurally simpler, and structurally simpler sentences are faster to process. You also get into bizarre territory when you can predict what someone is going to say before they say it: obviously no "parsing" has occurred there, so the fact that you predicted it cannot be evidence that parsing is probabilistic. If that is the case, then a similar argument applies if you have only a sentence fragment. The probabilistic prediction is some ancillary process, just as being able to predict that a cup is going to fall doesn't make my vision a probabilistic process in any meaningful sense. If for some reason I couldn't predict, I could still see, and I could still parse sentences.
Furthermore, you can obviously parse sentences and word sequences you have never seen before (and sentences can be arbitrarily complex/nested, at least up to your limits on memory). You can also parse sentences with invented terms.
Most importantly it's not clear how sentences are produced in the mind in this model. Is the claim that you somehow start with a word and produce some random most-likely next word? Do you not believe in syntax parse trees?
Finally (as Chomsky points out in the video I linked), this model doesn't account for structure dependence. For example, why is the question form of the sentence "The man who is tall is happy" "Is the man who is tall happy?" and not "is the man who tall is happy?". Why not move the first "is" that you come across?
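To spell the example out, here is a rough sketch contrasting a linear-order rule with a structure-dependent one; the word list and the hand-supplied index of the main-clause auxiliary are stand-ins for what a real parse tree would provide:

```python
# A minimal sketch of the structure-dependence point. The sentence representation
# is a hand-built word list, invented purely for illustration.
sentence = ["the", "man", "who", "is", "tall", "is", "happy"]

def linear_rule(words):
    """Wrong hypothesis: front the first 'is' you come across."""
    i = words.index("is")
    return [words[i]] + words[:i] + words[i + 1:]

def structural_rule(words, main_aux_index):
    """Structure-dependent rule: front the main-clause auxiliary.
    The index of the main-clause 'is' (here 5) is supplied by hand, standing in
    for what a parse tree would identify."""
    return [words[main_aux_index]] + words[:main_aux_index] + words[main_aux_index + 1:]

print(" ".join(linear_rule(sentence)) + "?")         # "is the man who tall is happy?" (ungrammatical)
print(" ".join(structural_rule(sentence, 5)) + "?")  # "is the man who is tall happy?" (correct)
```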
> In a strictly deterministic model, both continuations ("went" or "that caught the mouse") would be processed through the same fixed algorithm with the same computational steps, regardless of frequency. The parsing mechanism wouldn't be influenced by prior expectations
Correct. You seem to imply that is somehow unreasonable. Computer parsers work this way.
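For concreteness, here is a minimal recursive-descent sketch over a toy grammar (the grammar and lexicon are invented for illustration); it runs the same fixed steps whether the input sentence is common or rare:

```python
# Toy grammar: S -> NP VP, NP -> Det N, VP -> V NP. A deterministic parser like
# this executes the same sequence of steps regardless of how frequent the input is.
LEXICON = {"the": "Det", "a": "Det", "cat": "N", "mouse": "N", "caught": "V", "saw": "V"}

def parse_np(tokens, i):
    if LEXICON.get(tokens[i]) == "Det" and LEXICON.get(tokens[i + 1]) == "N":
        return ("NP", tokens[i], tokens[i + 1]), i + 2
    raise SyntaxError("expected Det N at position %d" % i)

def parse_vp(tokens, i):
    if LEXICON.get(tokens[i]) == "V":
        np, j = parse_np(tokens, i + 1)
        return ("VP", tokens[i], np), j
    raise SyntaxError("expected V at position %d" % i)

def parse_sentence(tokens):
    np, i = parse_np(tokens, 0)
    vp, j = parse_vp(tokens, i)
    if j != len(tokens):
        raise SyntaxError("trailing tokens")
    return ("S", np, vp)

print(parse_sentence("the cat caught a mouse".split()))
```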
> Being "surprised" isn't just a subjective experience here - it manifests as measurable processing costs that scale with the degree of unexpectedness. This graded response to probability is not explainable with purely deterministic models.
Again, there are two orthogonal concepts: Do I know what you are going to say next or how you are going to finish your sentence (and possibly something like strain or slowed processing when faced with an unusual concept) and what process do I use to interpret the thing you actually said.
> None of my comments undermine grammar beyond saying it is not how language works. I preface 'fiction' with the word useful multiple times and make comparisons to Newton.
Again, I have no idea what the point of describing universal grammar as fiction is if you say the term applies to all other great scientific theories.
>This isn't true. For one, more common sentences are probably structurally simpler, and structurally simpler sentences are faster to process.
Common sentences are not necessarily structurally simpler, and those still get processed faster, so yes, it's pretty true.
>You also get into bizarre territory when you can predict what someone is going to say before they say it: obviously no "parsing" has occurred there, so the fact that you predicted it cannot be evidence that parsing is probabilistic.
Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic.
>Furthermore, you can obviously parse sentences and word sequences you have never seen before (and sentences can be arbitrarily complex/nested, at least up to your limits on memory). You can also parse sentences with invented terms.
So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.
>Most importantly it's not clear how sentences are produced in the mind in this model. Is the claim that you somehow start with a word and produce some random most-likely next word? Do you not believe in syntax parse trees?
That's one way to do it, yeah. Why would I 'believe in it'? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there that it's anything more than a nice simplification?
>Finally (as Chomsky points out in the video I linked), this model doesn't account for structure dependence. For example, why is the question form of the sentence "The man who is tall is happy" "Is the man who is tall happy?" and not "is the man who tall is happy?". Why not move the first "is" that you come across?
Why does an LLM that encounters a novel form of that sentence generate the question form correctly?
You are giving examples that probabilistic approaches are clearly handling as if they are examples that probabilistic approaches cannot explain. It's bizarre.
>Correct. You seem to imply that is somehow unreasonable. Computer parsers work this way.
I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.
>Again, I have no idea what the point of describing universal grammar as fiction is if you say the term applies to all other great scientific theories
What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.
>Again, there are two orthogonal concepts: Do I know what you are going to say next or how you are going to finish your sentence (and possibly something like strain or slowed processing when faced with an unusual concept) and what process do I use to interpret the thing you actually said.
The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked
> "Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic."
This is just redefining terms to be so vague as to make rational inquiry or discussion impossible. I don't know what redefinition of parsing you could be using that would still be in any way useful, or what "probabilistic" in that case is supposed to apply to.
If you are saying that the brain is constantly predicting various things, so that any process that doesn't itself involve prediction automatically counts as probabilistic, then that is just useless.
> Common sentences are not necessarily structurally simpler, and those still get processed faster, so yes, it's pretty true.
Well, I'll have to take your word for it as you haven't cited the paper, but I would point to the reasonable explanation of different processing times, which has nothing to do with parsing, that I gave further below. But I will repeat the vision analogy: if an experiment showed that I took longer to react to an unusual visual sequence, we would not immediately conclude that the visual system was probabilistic. The more parsimonious explanation is that the visual system is deterministic and some other part of cognition takes longer (or recomputes) because of the "surprise".
> So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.
It's not about capturing it in statistics or having an LLM produce it; it's about explaining why that rule occurs and not some other. That's the difference between explanation and description.
> That's one way to do it, yeah. Why would I 'believe in it'? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there that it's anything more than a nice simplification?
Because producing one token at a time cannot produce arbitrarily recursive structures the way sentences can? Because no language uses linear order? Because when we express a thought it usually can't be reduced to a single start word and statistically most-likely next-word continuations? It's also irrelevant what computers do; we are talking about what humans do.
> Why does an LLM that encounters a novel form of that sentence generate the question form correctly?
That isn't the question. The question is why it's that way and not another. It's as if I asked why the planets move in a certain pattern and you responded with "well, why does my deep neural net predict it so well?". It's just nonsense.
> You are giving examples that probabilistic approaches are clearly handling as if they are examples that probabilistic approaches cannot explain. It's bizarre.
No probabilistic model has explained anything. You are confusing predicting with explaining.
> I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.
I explained why you would expect that to be the case even with deterministic processing.
> What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.
Well, as I said, this is also true of Einstein's theory of gravity, and you presumably brought up the point to contrast universal grammar with that theory rather than to point out the similarities.
> The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked
The brain is doing lots of things; we are talking about the language system. Again, if instead we were talking about the visual system, no one would dispute that the visual system is doing the "seeing" and other parts of the brain are doing the predicting.
In fact they must be orthogonal because once you get to the end of the sentence, where there are no next words to predict, you can still parse it even if all your predictions were wrong. So the main deterministic processing bit (universal grammar) still needs to be explained and the ancillary next-word-prediction "probabilistic" part is not relevant to its explanation.