If someone types English for a minute at 120 WPM, then they’ll have produced about 600 bits of information (600 characters at roughly 1 bit each).
Are you saying we should consider the rate in a smaller window of time? Or we should consider the rate when the typist is producing a series of unrelated English words that don’t form a coherent sentence?
Take for example a human typist working from a hand-written manuscript. An advanced typist produces 120 words per minute. If each word is taken as 5 characters, this typing speed corresponds to 10 keystrokes a second. How many bits of information does that represent? One is tempted to count the keys on the keyboard and take the logarithm to get the entropy per character, but that is a huge overestimate. Imagine that after reading the start of this paragraph you are asked what will be the next let…
English contains orderly internal structures that make the character stream highly predictable. In fact, the entropy of English is only ∼ 1 bit per character [1]. Expert typists rely on all this redundancy: if forced to type a random character sequence, their speed drops precipitously.
[1] Shannon CE. Prediction and Entropy of Printed English. Bell System Technical Journal. 1951;30(1):50-64.
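For anyone who wants to check the arithmetic in the quoted passage, here is a rough sketch. The typing speed, word length, and ~1 bit per character come from the passage; the 64-key keyboard is only an illustrative assumption for the naive "log of the number of keys" estimate, not a figure from the paper.

```python
import math

WORDS_PER_MINUTE = 120   # from the quoted passage
CHARS_PER_WORD = 5       # from the quoted passage
BITS_PER_CHAR = 1.0      # Shannon's approximate entropy of English [1]
KEYBOARD_KEYS = 64       # assumption: illustrative key count, not from the source

chars_per_minute = WORDS_PER_MINUTE * CHARS_PER_WORD
keystrokes_per_second = chars_per_minute / 60

# Information rate using the ~1 bit/character estimate.
bits_per_minute = chars_per_minute * BITS_PER_CHAR
bits_per_second = keystrokes_per_second * BITS_PER_CHAR

# Naive estimate: treat every key as equally likely on every keystroke.
naive_bits_per_char = math.log2(KEYBOARD_KEYS)
naive_bits_per_second = keystrokes_per_second * naive_bits_per_char

print(f"{keystrokes_per_second:.0f} keystrokes/s")
print(f"{bits_per_second:.0f} bits/s, {bits_per_minute:.0f} bits/min")
print(f"naive estimate: {naive_bits_per_second:.0f} bits/s")
```

Under that assumed key count, the naive estimate comes out several times higher than the entropy-based one, which is the overestimate the passage warns about.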
You show a bunch of English speakers some text that’s cut off, and ask them to predict the next letter. Their success at prediction tells you the information content of the text. Shannon ran this experiment and got a result of about 1 bit per letter: https://archive.org/details/bstj30-1-50/page/n5/mode/1up
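A rough sketch of how guessing results turn into a bits-per-letter figure, as I understand the bounds in Shannon's paper: record how many guesses each letter took, let q_i be the fraction of letters found on guess i, and the entropy is bounded above by the entropy of the q_i and below by the sum of i * (q_i - q_(i+1)) * log2(i). The function name and the counts below are made up for illustration; they are not Shannon's data, and the lower bound assumes the q_i are non-increasing.

```python
import math

def guess_entropy_bounds(guess_counts):
    """Return (lower, upper) entropy bounds in bits per letter.

    guess_counts[i] is the number of letters identified on guess i + 1.
    Assumes the counts are non-increasing (earlier guesses succeed more often).
    """
    total = sum(guess_counts)
    q = [count / total for count in guess_counts]

    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)

    # Lower bound: sum over i of i * (q_i - q_{i+1}) * log2(i), with q_{n+1} = 0.
    q_ext = q + [0.0]
    lower = sum(
        (i + 1) * (q_ext[i] - q_ext[i + 1]) * math.log2(i + 1)
        for i in range(len(q))
    )
    return lower, upper

# Hypothetical tallies: 80 letters right on the first guess, 10 on the second, and so on.
hypothetical_counts = [80, 10, 5, 5]
lower, upper = guess_entropy_bounds(hypothetical_counts)
print(f"between {lower:.2f} and {upper:.2f} bits per letter")
```

The more often people get the letter on the first guess, the closer both bounds fall toward zero; if every guess number were equally likely, both bounds would equal the log of the number of options.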
OK. When talking about language I find it's always good to be explicit about what level you're talking about, especially when you're using terms as overloaded as "information". I'm not really sure how to connect this finding to semantics.
If the text can be reproduced with one bit per letter, then the semantic information content is necessarily at most equal to N bits where N is the length of the text in letters. Presumably it will normally be much less, since there are things like synonyms and equivalent word ordering which don’t change the meaning, but this gives a solid upper bound.