A German example (from http://de.wikipedia.org/wiki/Hapax_legomenon) is "Knabenmorgenblütenträume" from Goethe's "Prometheus". It is a wonderful example of how you can create new words in German by concatenating existing words:
The literal translation of this word is something like "A boy's morning dreams about blossoms", but the true meaning (according to some internet discussions: http://www.gutefrage.net/frage/was-sind-knabenmorgenbluetent... ) seems to be something unfinished: every part of the word depicts something unfinished or young.
It seems to work much better in German, where the parts are joined and feel much more like a new word.
As for English allowing the same thing though with spaces, isn't concatenating words to join their meaning a feature of pretty much every language?
> isn't concatenating words to join their meaning a feature of pretty much every language?
No, and that's the point.
The fact that the grammar of the language allows strings of nouns to form compound nouns (e.g. a "River Steamboat Captain" feels natural to me) is distinctive and interesting, and English shares it. In Spanish, for example, you'd need prepositions or conjunctions to express that concept.
The fact that you'd leave out the spaces if you were to write this down is just a quirk of the writing system, independent of the language itself.
My favorite example of English's willingness to do noun-noun compounding and Spanish's corresponding unwillingness is on the Boston subway:
Passenger emergency intercom unit at end of car
Sistema de intercomunicación para pasajeros en caso de emergencia situado al extremo del tren
There are several things going on there that make the Spanish longer than the English, but one is the obligatory use of explicit prepositions relating the nouns to one another in Spanish. In English terms, the Spanish says
System of intercommunication for passengers in case of emergency situated at the end of the train
See also this list written up by John Cowan of some languages that do noun-noun compounding and what the implicit meanings of such compounds can be:
As a person who got a degree in ancient languages (Hebrew, Greek and Aramaic), I'm excited to see this article up. Hapaxes are particularly interesting in ancient texts, because they bring up issues of understandability and translatability.
The Bible in particular is an interesting case. There are 1500 hapaxes in the Hebrew Bible, but what's truly amazing is how many dis legomena, tris legomena, tetra, etc. there are. The average English translation of the Hebrew Bible uses a diminished vocabulary compared to the Hebrew, sometimes by a factor of 5. I think religious folk of all creeds would be interested to know how much subjective judgement goes into the translation of their sacred documents.
The reason is that hapax means "one time" in Greek, while legomenon means "that which is said". So the noun is legomenon, and that is the word that takes the plural.
A (not very good) analogy is "visible phenomenon". You would not say "visibles phenomenon", but "visible phenomena".
It's an interesting problem to try to determine what the limits of a language are and what is a word and what is not. Corpus studies are not sufficient for that purpose, as you will always end up with a large number of hapaxes. Because language is based on social consensus, the most commonsense approach to the problem would be to determine the 'wordiness' of a string by checking how many people consider it to be a word.
We are trying to do something like this with large-scale studies for English and Dutch. As it is closely related to the problem, I will allow myself to share the links:
http://vocabulary.ugent.be
http://woordentest.ugent.be
I'm glad to know I'm not the only one whose favoured childhood reading was the "compact" OED in tiny print. That and the '57 and '67 editions of the Encyclopaedia Britannica: I spent a year (in the '80s) charting the progress of mankind's knowledge in that decade, effectively a manual diff.
Thinking about it, I now zyxt that this was probably strange.
I had thought Alice in Wonderland would have quite a few hapax legomena, even if mostly nonsense words (e.g. "'Twas brillig, and the slithy toves..."), but I found from this academic article [0] that Twain's Tom Sawyer actually has 5% more hapax legomena. The article has some pretty surprising findings about ratios of hapaxes to vocabulary: though hapaxes fairly consistently make up around 50% of the words in any text, they steadily increase in corpora over 3,000,000 words.
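For anyone who wants to check ratios like these on their own texts, here is a minimal Python sketch. It is not the method used in the article: the function name `hapax_stats` is made up for illustration, and the tokenizer is deliberately naive (lowercased runs of letters, so "don't" splits in two), which will shift the numbers compared to a proper linguistic tokenizer.

```python
import re
from collections import Counter


def hapax_stats(text):
    """Return (sorted hapaxes, vocabulary size, hapax/vocabulary ratio).

    Hypothetical helper for illustration; tokenization is an assumption.
    """
    # Naive tokenizer: lowercase, then take runs of Unicode letters.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # A hapax legomenon is a word type that occurs exactly once.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    vocab = len(counts)
    ratio = len(hapaxes) / vocab if vocab else 0.0
    return hapaxes, vocab, ratio


sample = "the slithy toves did gyre and gimble in the wabe"
hapaxes, vocab, ratio = hapax_stats(sample)
print(len(hapaxes), vocab, round(ratio, 2))  # prints: 8 9 0.89
```

Here only "the" repeats, so 8 of the 9 distinct words are hapaxes. On a real book you would read the file contents into `text` instead of using the sample string.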
>"The devoted dog of Montargis avenges the death of his
master, foully murdered in the Forest of Bondy; and a humorous
Peasant with a red nose and a very little hat, whom I take from this
hour forth to my bosom as a friend (I think he was a Waiter or an
Hostler at a village Inn, but many years have passed since he and I
have met), remarks that the sassigassity of that dog is indeed
surprising; and evermore this jocular conceit will live in my
remembrance fresh and unfading, overtopping all possible jokes,
unto the end of time."
Surely the joke is the peasant's mispronunciation of "sagacity"? Also an odd Baader-Meinhof: that story about the dog was on the front page of Reddit last week.
But I'm wondering, if a word only appears one time, in a single book, how do we know it is a real word and not a new word invented by its author or a mistake?
Reminds me of an author who was given a brand-new Oxford English Dictionary (a 20-volume English dictionary with etymologies) by one of his fans. His wife was proof-reading a new manuscript when she came across a word that she was sure wasn't real English. He said "Oh, I'm sure it's in the dictionary!" so she went to look. A few minutes later she comes in, throws the volume at him, and storms out. Confused, he flips to the word in the volume and finds the etymology is... his earlier book!
This is a real problem, especially in Homeric Greek and Biblical Hebrew. One approach to figuring out what such words mean is to look at similar words in related languages. Another is to use earlier translations (such as the Septuagint, from Hebrew to Greek, in the third and second centuries BCE), made when the translators likely had a better sense of what the words meant.
If I am reading the Wikipedia page correctly, it's more common to talk about hapax legomena within a single text, or within a single author's work. You could assert that a word is a hapax legomenon in the corpus of a whole language, but then you would have to have read the rest of the language's original recorded works, and to account both for other people reusing the word from the original reference and for people who overheard your own reference (assuming you broke the spell) and reused it without realizing they were breaking the spell.
How could you ever know if you were the one who ruined it, if it was or if it wasn't hapax when you first found it?
So interesting that this comes up today. We were discussing this yesterday in relation to an upcoming project where we want to maintain some amount of data for visitors to a website, but don't necessarily need to retain data for the visitors that we see very few times.
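A rough Python sketch of that idea, assuming the visits are available as a simple list of visitor IDs: count how often each visitor shows up and keep only those at or above a threshold. The function name `prune_rare_visitors` and the `min_visits` parameter are hypothetical, and a real site would likely do this incrementally in a database rather than in memory.

```python
from collections import Counter


def prune_rare_visitors(visit_log, min_visits=2):
    """Keep visit counts only for visitors seen at least min_visits times.

    Hypothetical helper: visit_log is assumed to be an iterable of visitor IDs.
    """
    counts = Counter(visit_log)
    # Visitors below the threshold (the "hapaxes" when min_visits is 2) are dropped.
    return {visitor: n for visitor, n in counts.items() if n >= min_visits}


log = ["alice", "bob", "alice", "carol", "alice", "bob"]
print(prune_rare_visitors(log))  # prints: {'alice': 3, 'bob': 2}
```

With the default threshold, the one-time visitor "carol" is exactly the hapax of the log and is the only entry discarded.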
I think that might be impossible to sustain for any substantial length.
It's true, admittedly, such constraints are often embraced by authors seeking novelty; consider someone's novel written entirely without using the letter 'e'. This rule, though, seems excessive - constructions would grow increasingly baroque, English's famously large vocabulary stretched thin, meaning squirreled into obscure words, awkward transitions.
And yet, perhaps too hastily dismissing an idea is equally foolhardy. Exploring Borges's library one may, indeed, see everything in sufficient time...
Sometimes I really love my mother tongue :)