Hapax legomenon (wikipedia.org)
87 points by mutor on March 6, 2014 | 40 comments


A German example (from http://de.wikipedia.org/wiki/Hapax_legomenon) is "Knabenmorgenblütenträume" from Goethe's "Prometheus". It is a wonderful example of how you can create new words in German by concatenating existing words:

    Knabe = Boy
    Morgen = Morning
    Blüte = Blossom
    Träume = Dreams
The literal translation of this word is something like "A boy's morning dreams about blossoms", but the true meaning (according to some internet discussions: http://www.gutefrage.net/frage/was-sind-knabenmorgenbluetent... ) seems to be something unfinished: every part of the word depicts something unfinished / young.

Sometimes I really love my mother tongue :)


Technically English allows the same thing, except that its compound words are usually separated by spaces rather than concatenated or joined with hyphens.


Word pairings that take off in usage gradually migrate from separated by spaces to separated by hyphens to compounded. E.g.

    data base
    data-base
    database


It seems to work much better in German, where the parts are joined and feel much more like a new word. As for English allowing the same thing though with spaces, isn't concatenating words to join their meaning a feature of pretty much every language?


> isn't concatenating words to join their meaning a feature of pretty much every language?

No, and that's the point.

The fact that the grammar of the language allows strings of nouns to form compound nouns (e.g. a "River Steamboat Captain" feels natural to me) is distinctive and interesting, and it is shared by English. In Spanish, for example, you'd need prepositions/conjunctions to express that concept.

The fact that if you were to write this down, you'd leave out the spaces, is just a weird quirk of the writing system, and independent of the language itself.


My favorite example of English's willingness to do noun-noun compounding and Spanish's corresponding unwillingness is on the Boston subway:

Passenger emergency intercom unit at end of car

Sistema de intercomunicación para pasajeros en caso de emergencia situado al extremo del tren

There are several things going on there that make the Spanish longer than the English, but one is the obligatory use of explicit prepositions relating the nouns to one another in Spanish. In English terms, the Spanish says

System of intercommunication for passengers in case of emergency situated at the end of the train

See also this list written up by John Cowan of some languages that do noun-noun compounding and what the implicit meanings of such compounds can be:

http://recycledknowledge.blogspot.com/2009/12/noun-noun-comp...

Notice that not all types of compounds are understood in every language, even for languages that sometimes allow this!


Thanks, that was a great example.


Hyphen concatenation is also allowed in German, and widely used.


As a person who got a degree in ancient languages (Hebrew, Greek and Aramaic), I'm excited to see this article up. Hapaxes are particularly interesting in ancient texts, because they bring up issues of understandability and translatability.

In particular the Bible is an interesting case. There are 1500 hapaxes in the Hebrew Bible, but what's truly amazing is how many dis legomena, tris legomena, tetrakis legomena, etc. there are. The average English translation of the Hebrew Bible uses a diminished vocabulary compared to the Hebrew, sometimes by a factor of 5. I think religious folk of all creeds would be interested to know how much subjective judgement goes into the translation of their sacred documents.


So Googlewhacks are hapax legomena of the open internet corpus. I cannot wait to utter that phrase at a party.

[Edit: Seems the plural is hapax legomena, not hapaxes legomenon.]


The reason is that hapax means "one time" in Greek, while legomenon means "that which is said". So the noun is legomenon, and that is the word that takes the plural.

A (not very good) analogy is "visible phenomenon". You would not say "visibles phenomenon", but "visible phenomena".


Yes - for some reason I instinctively grouped it with culs-de-sac and mothers-in-law.


It's an interesting problem to try to determine what the limits of a language are, and what is a word and what is not. Corpus studies are not sufficient for that purpose, as you will always end up with a large number of hapaxes. Because language is based on social consensus, the most common-sense approach to the problem would be to determine the 'wordiness' of a string by checking how many people consider it to be a word.

We are trying to do something like this with large-scale studies for English and Dutch. As it is very related to the problem I will allow myself to share the links: http://vocabulary.ugent.be http://woordentest.ugent.be


As a youth I was fascinated to discover a dethroned hapax legomenon in a detective novel:

http://lee-phillips.org/literallyEgregious/


I'm glad to know I'm not the only one whose favoured childhood reading was the "compact" OED in tiny print. That and the '57 and '67 Encyclopaedia Britannicas - I spent a year (in the 80's) charting the progress of mankind's knowledge in that decade - effectively a manual diff.

Thinking about it, I now zyxt that this was probably strange.


That's a very nice article. Really makes you think about the way languages and the meanings of words evolve over time.


I had thought Alice in Wonderland would have quite a few hapax legomena, even if mostly nonsense words I think (e.g. 'twas brillig and the slithy toves...), but I found from this academic article [0] that Twain's Tom Sawyer actually has 5% more hapax legomena. The article has some pretty surprising findings about ratios of hapax/vocabulary - though hapaxes fairly consistently make up around 50% of the distinct words (the vocabulary) in any text, they steadily increase in corpora over 3,000,000 words.

[0] http://aclweb.org/anthology/J/J10/J10-4003.pdf
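For anyone who wants to try the hapax/vocabulary ratio on their own text, here is a minimal sketch. It is not the method from the article: the tokenizer (lowercased runs of letters, with internal apostrophes allowed) and the function name `hapax_stats` are my own assumptions, and a different tokenizer will give different numbers.

```python
import re
from collections import Counter

def hapax_stats(text):
    """Return (sorted hapaxes, vocabulary size, hapax/vocabulary ratio)."""
    # Assumed tokenizer: lowercase letter runs, apostrophes allowed inside a word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)
    # A hapax legomenon is a word form that occurs exactly once in the text.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    vocab = len(counts)
    ratio = len(hapaxes) / vocab if vocab else 0.0
    return hapaxes, vocab, ratio

sample = "the cat saw the dog and the dog saw a bird"
hapaxes, vocab, ratio = hapax_stats(sample)
print(hapaxes)  # ['a', 'and', 'bird', 'cat']
print(vocab)    # 7
```

Here 4 of the 7 distinct words occur once, so the ratio is 4/7. Note that this counts surface forms, so "dream" and "dreams" are treated as different words.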


Love the part on "Sassigassity": a word that appears in Dickens's short story "A Christmas Tree", and it seems that no one knows what it means.


>"The devoted dog of Montargis avenges the death of his master, foully murdered in the Forest of Bondy; and a humorous Peasant with a red nose and a very little hat, whom I take from this hour forth to my bosom as a friend (I think he was a Waiter or an Hostler at a village Inn, but many years have passed since he and I have met), remarks that the sassigassity of that dog is indeed surprising; and evermore this jocular conceit will live in my remembrance fresh and unfading, overtopping all possible jokes, unto the end of time."

Surely the joke is the peasant's mispronunciation of "sagacity"? Also an odd Baader-Meinhof moment: that story about the dog was on the front page of reddit last week.


But I'm wondering, if a word only appears one time, in a single book, how do we know it is a real word and not a new word invented by its author, or a mistake?


What, pray tell, is the difference between a real word and a new word invented?


Reminds me of an author who was given a brand-new Oxford English Dictionary (a 20-volume English dictionary with etymologies) by one of his fans. His wife was proof-reading a new manuscript when she came across a word that she was sure wasn't real English. He said "Oh, I'm sure it's in the dictionary!" so she went to look. A few minutes later she comes in, throws the volume at him, and storms out. Confused, he flips to the word in the volume and finds the etymology is... his earlier book!


This is a real problem, especially in Homeric Greek and Biblical Hebrew. One approach to figuring out what such words mean is to look at similar words in related languages. Another is to use earlier translations (such as the Septuagint from Hebrew to Greek in the first or second centuries BCE), made when the translators likely had a better sense of what the words meant.


I believe most hapax legomena to be cromulent.


Are you just trying to embiggen the cromulence around here?


I love the irony that listing hapaxes for the English language, in effect, nullifies their hapax status.


If I am reading the Wikipedia page correctly, it's more common to talk about hapax legomena within a single text, or within a single author's work. You could assert that a word was a hapax legomenon in the corpus of a whole language, but then you would have to have read the rest of the language's original recorded works, plus other people reusing the word from the original reference, plus people who overheard your own reference (assuming you broke the spell) and repeated it without realizing they were breaking the spell.

How could you ever know if you were the one who ruined it, if it was or if it wasn't hapax when you first found it?


This was the answer to a sub-sub-puzzle of the MIT mystery hunt 2012 http://web.mit.edu/puzzle/www/2012/puzzles/into_the_woodstoc...


So interesting that this comes up today. We were discussing this yesterday in relation to an upcoming project where we want to maintain some amount of data for visitors to a website, but don't necessarily need to retain data for the visitors that we see very few times.
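A minimal sketch of that idea, assuming the visit log is just a list of visitor IDs. The function name `prune_rare_visitors`, the `min_visits` threshold, and the sample data are all hypothetical, not anything from the project mentioned above.

```python
from collections import Counter

def prune_rare_visitors(visits, min_visits=2):
    """Keep visit counts only for visitors seen at least min_visits times."""
    counts = Counter(visits)
    # Visitors seen exactly once are the "hapaxes" of the log and are dropped.
    return {visitor: n for visitor, n in counts.items() if n >= min_visits}

log = ["alice", "bob", "alice", "carol", "alice", "bob"]
kept = prune_rare_visitors(log)
print(kept)  # {'alice': 3, 'bob': 2}
```

Raising `min_visits` drops the dis and tris legomena of the log as well.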


I think this discussion is incomplete without any reference to the word "Gundible."

[]: cloud.github.com/downloads/shoes/shoes/nks.pdf (Read the introduction)


Quite interesting - the concept is commonly used in natural language processing, but I've never seen it called by that Greek term.


Is there a book where each word is used only once?


I think that might be impossible to sustain for any substantial length.

It's true, admittedly, such constraints are often embraced by authors seeking novelty; consider someone's novel written entirely without using the letter 'e'. This rule, though, seems excessive - constructions would grow increasingly baroque, English's famously large vocabulary stretched thin, meaning squirreled into obscure words, awkward transitions.

And yet, perhaps too hastily dismissing an idea is equally foolhardy. Exploring Borges's library one may, indeed, see everything in sufficient time...


I see what you did there.


Bravo.
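The "each word used only once" constraint can be checked mechanically. A rough sketch, with my own assumptions: words are compared case-insensitively, and a possessive or contraction such as "someone's" counts as a different word from "someone". The name `repeated_words` is hypothetical.

```python
import re
from collections import Counter

def repeated_words(text):
    """Return the sorted list of words that occur more than once in text."""
    # Assumed tokenizer: lowercase letter runs, apostrophes allowed inside a word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return sorted(word for word, n in Counter(words).items() if n > 1)

print(repeated_words("I think that might be impossible"))  # []
print(repeated_words("The cat and the hat"))               # ['the']
```

An empty result means the text satisfies the constraint under this tokenizer.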




I was going to say a telephone book... but I think it should be like the BBC's Just A Minute show, but in book form.


Heller: polymesmeric. Does it count if it's used as a byline on the cover, I wonder.


I can just hear Hermione Granger saying "It's 'legoMEnon'"



