Sure, let me try to explain. My model is built as follows:
1. People speak and write not by following rules but by following their ears.
2. Since the ear hears words in sequence, the interpretation of those words also occurs in sequence.
3. Thus the phrase “great green dragon” is interpreted first as “great _”, then as “great green _”, and then finally as “great green dragon”, with the hole “_” representing the expectation of something concrete to come, something that the mind’s eye can eventually see.
4. This expectation is satisfied only when each new word reduces the possibilities for what will fill the hole. With each new word, the hole must shrink toward a single point, and rapidly.
5. Therefore, the most-pleasing ordering of N modifiers will tend to be the one that most rapidly and evenly converges to a point.
6. Therefore, we can find the most-pleasing ordering by maximizing the geometric mean of the shrinkages for all steps in the sequence.
7. For fixed N, the geometric mean and the product are interchangeable for optimization purposes, so we can just maximize the product of the shrinkages.
And that’s pretty much it.
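The steps above can be sketched in code. This is only an illustration of the mechanics, not the author's actual implementation: the restriction factors below are invented, context-dependent numbers standing in for how sharply each word narrows the hole given what came before.

```python
from itertools import permutations
from math import prod

# Hypothetical restriction factors: restriction[(context, word)] is the
# factor by which the set of possible referents shrinks when `word` is
# heard after `context`. All numbers are invented for illustration.
restriction = {
    ((), "great"): 4.0,
    ((), "green"): 12.0,
    (("great",), "green"): 10.0,
    (("green",), "great"): 2.5,
}

def score(order):
    # Steps 6-7: rank orderings by the product of the per-step
    # shrinkages; for a fixed number of modifiers this ranks them
    # identically to the geometric mean.
    return prod(restriction[(tuple(order[:i]), w)]
                for i, w in enumerate(order))

best = max(permutations(["great", "green"]), key=score)
print(best, score(best))  # the ordering with the largest product wins
```

Note that the factors must be context-dependent (conditioned on the words already heard); if each word shrank the hole by a fixed amount regardless of position, every ordering would have the same product.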
Now to your questions:
By “impotent” I mean without effect. Any word that you put after a word that has already reduced the hole to a near-point is going to seem worthless and spoil the whole sequence.
Re. begging the question, a sequence isn’t effective because it’s effective; it’s effective because, given our mental probability models for word sequences, it rapidly and evenly converges upon a concrete interpretation.
To take your “darkest/greatest” as an example, here are the rapid-and-even-convergence scores for the two possible orderings:
In information theory terms, if you ignore the current expectations of ordering and try to construct an ordering rule based on the information content of each ordering, all orderings have equal information content. Spreading out the information content so that it arrives a little bit with each word instead of in bursts and trickles doesn't get you anything, except that maybe the human brain can process it more easily in smaller chunks. Is this a theory?
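The equal-information-content point can be checked with the chain rule: however the joint event is factored into per-word increments, the surprisals sum to the same total; only their spread across the words changes. A minimal sketch with invented probabilities:

```python
from math import log2

# Toy probabilities for a two-modifier phrase (invented numbers).
p_joint = 0.02                   # P(great, green) occurring together
p_great, p_green = 0.20, 0.05    # hypothetical marginals

# Order 1: hear "great" first, then "green" given "great".
bits_1 = [-log2(p_great), -log2(p_joint / p_great)]
# Order 2: hear "green" first, then "great" given "green".
bits_2 = [-log2(p_green), -log2(p_joint / p_green)]

print(sum(bits_1), sum(bits_2))  # identical totals: -log2(p_joint)
print(bits_1, bits_2)            # different per-word spreads
```

Either way the listener ends up with the same number of bits; the orderings differ only in whether the information arrives evenly or in bursts and trickles.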
This actually jibes with the fact that languages with less information per phoneme (Japanese, Spanish) are spoken at a faster rate in phonemes per second than languages with more information per phoneme (Mandarin), so that most languages converge on a similar information rate.
I don't know how to come up with a reasonable model for how people are likely to interpret words without accounting for the fact that people interpret words in large part by how they have been interpreted in the past. But I don't see how accounting for prior knowledge of word frequencies makes the model tautological.
In points 2-3. Obviously we're used to hearing things in a certain order, and tend to copy that, but the question is why we put things in a certain order to start with. How is a 'great X Y' any more concrete than a 'green X Y'? Obviously concrete things have size, but then abstractions aren't exactly known for their coloring either :-)
But "the question", as you offer it, is based on the fallacy that there is some underlying natural law to the way we use words. There isn't. There is no blank slate "to start with," no axioms from which everything else follows. Words have always been used in the context of how they have been used.
Therefore, any reasonable model of how people use words will have some representation of how people have used words. My model uses simple word frequencies, yes, but it doesn't predict that one modifier sequence will be preferred to another because "that's the way it is." Rather, it predicts a sequence will be more preferred because it more rapidly and more evenly converges on a meaning.
The original question was “what is the rule for ordering adjectives [in English]?” The first thing you said was: “How about this rule? Order the modifiers to maximize the product of their successive restrictive effects.”
I'm saying that your observation that people copy what they know is tautological, and tells us nothing whatever about the restrictive effects of a given adjective.
How is “Order the modifiers to maximize the product of their successive restrictive effects” equivalent to “people copy what they know”? The restrictive effects of a given adjective are an input to the model because that issue was never the question. The ordering was the question, and that's what my proposed model attempts to predict, given prior knowledge of restrictive effects. The interesting prediction it makes is that people prefer orderings that converge rapidly and evenly.
So, without any knowledge that “horse” will follow, we can predict that “greatest darkest” will be the best ordering of the modifiers. (This is from a crude approximation of the model that I just whipped up, using a small database of 2-gram and 3-gram frequencies.)
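One way a crude approximation like this might work can be sketched as follows. The bigram counts below are invented placeholders (real ones would come from a corpus), and scoring the per-step surprisals by their geometric mean is one possible reading of “rapid and even convergence,” not necessarily the author's exact scoring:

```python
from itertools import permutations
from math import log2, prod

# Invented bigram and unigram counts standing in for the "small database
# of 2-gram frequencies" mentioned above.
bigram = {("the", "greatest"): 400, ("the", "darkest"): 400,
          ("greatest", "darkest"): 40, ("darkest", "greatest"): 2}
unigram = {"the": 5000, "greatest": 1000, "darkest": 250}

def surprisal(prev, word):
    # Bits of information a word adds given the previous word, i.e. how
    # sharply it narrows the hole, estimated from conditional frequency.
    # Unseen pairs get a count of 1 as a crude smoothing.
    return -log2(bigram.get((prev, word), 1) / unigram[prev])

def evenness_score(order, start="the"):
    # Geometric mean of the per-step surprisals: rewards orderings that
    # deliver their information in similar-sized steps.
    steps, prev = [], start
    for w in order:
        steps.append(surprisal(prev, w))
        prev = w
    return prod(steps) ** (1 / len(steps))

for order in permutations(["greatest", "darkest"]):
    print(order, round(evenness_score(order), 3))
```

With real counts, the model's prediction would depend entirely on the corpus frequencies fed into it; the numbers here only show the shape of the computation.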