Hacker News
What Chinese looks, feels and sounds like when you're from Korea or Japan (2009) (pagef30.com)
262 points by _mt3y on Oct 13, 2020 | 183 comments



Native Chinese speaker here, and this is how I view Japanese before/after learning Japanese:

Before learning: Japanese is some hanzi/kanji characters and words in tons of unknown characters. To try to grasp a meaning from a Japanese sentence, just remove all the hiragana and katakana and try to make sense from the remaining words. I found that it is similar to reading Classical Chinese.

After learning kana: Could understand more things about a sentence from です ない か... Also, can read some katakana words if they originate from English.

Actually learning Japanese and start memorizing Japanese words: Grammar is quite different, but still can find some sentence structures that are similar. Such as …も…ない and …也…沒… It is easy to find that 訓読み words have similar pronunciation with some Chinese variants. It is usually more similar to Minnan(Hokkien), Cantonese... compared to Mandarin.

IMO, knowing Chinese does make it easier for me to learn and read Japanese. I am still trying to get JLPT N3 this year, so other people who know more about Japanese may have a different experience.


音読み is the sound based on the original Chinese pronunciation, and 訓読み is the Japanese pronunciation.


This is interesting; I studied Korean in college / lived there for a bit. Randomly, met a Chinese woman there, who actually lived in the States, and we got married after we both moved back to the States.

If I could do it over again, I would learn Chinese before Korean or Japanese. My Chinese friends who learned Korean learned it way faster than any western learner.

Japanese students learned Korean even quicker than the Chinese students - in addition to using Chinese words, Japanese and Korean have the same grammar structure ( Subject Object Verb ) whereas Chinese and English are different (Subject Verb Object)

Learning Chinese is like learning Latin before learning a Romance language.

The added benefit is that Chinese isn’t a dead language.

Both Japanese and Korean are distinct from Chinese and beautiful languages (I really do love Korean, such a pretty, logical language), but China has been an influence in the region for so long they can’t help absorbing and using pieces of the Chinese languages.

Fun random fact: a lot of Korean words are built on Chinese words; however, some of those words were taken from Chinese thousands of years ago.

The Chinese pronunciations have changed. Some researchers have been known to use Korean pronunciations of words to help triangulate to what ancient Chinese characters would sound like (in combination with poems where you know what rhymes to expect etc)!


I totally agree with this analogy, though maybe to add some nuance (or pedantry), I'd claim that to start learning Chinese, you need to pick a modern dialect (probably Mandarin), which is like choosing Spanish over Portuguese or Catalan. Then, learning Japanese or Korean (or Vietnamese) is like learning a nearby but non-Romance language that was heavily influenced by Latin or Romance languages, like English. Knowing Spanish will let you appreciate the similarities with English, but you'd still need to delve into historical Spanish (or French) to really see the connections.

Chinese is a pretty old and continuous language, so at the advanced level, learners could delve into Classical Chinese, which is truly analogous to Latin, and was also a dominant lingua franca in written form across East Asia. Pieces of Classical Chinese are 'alive' in everyday use, kind of like how Latin is 'alive' in the Romance languages through general and obvious similarities, or specific vocabulary, (e.g. "quid pro quo", "ad hominem"). But Classical Chinese is radically different from any dialect of today's vernacular Chinese.

In any case, the similarities and overlap definitely make learning between CJK(V) easier!


100% agree with this post, which I tried to get at by saying that Korean and Japanese are distinct. Apologies for not making that clear. Your post is great to clarify this all more specifically.

The cultural side of things definitely requires a lot of time. And that’s fun.

What also takes time is learning how ideas are shared. Not just grammar, but which of the multiple correct grammar options is more common. How are paragraphs constructed? What is usually shared first in an idea or argument versus the end?

You can have someone with completely correct grammar and pronunciation who sounds odd to a native speaker because the cultural / idea organization is not “native”.


Go back to Middle Chinese, many of whose features are retained by, for example, the Shanghai-based Wu dialect or the Southern Teochew dialect, and you will really feel like you're studying the mother language.


> Some researchers have been known to use Korean pronunciations of words to help triangulate to what ancient Chinese characters would sound like

That is rather an exaggeration. Korean and Japanese can be used as guides for Middle Chinese pronunciation, but they aren’t all that useful compared to using China’s own rhyme tables and running the comparative method on the Chinese dialects. For stages of Chinese before Middle Chinese, neither Korean nor Japanese are useful at all.


>Japanese students learned Korean even quicker than the Chinese students - in addition to using Chinese words, Japanese and Korean have the same grammar structure ( Subject Object Verb ) whereas Chinese and English are different (Subject Verb Object)

I've never found this a convincing reason as to why a language is difficult to learn. Word order is learned once while vocabulary, speaking and listening need to be learned and refined consistently over many years. All of these become harder the further the target language is from the ones you already know. Would French really be significantly harder to learn if they said 'Je le ballon frappe'?

I also think this applies to some other 'hard' qualities of languages, like pronouns and alphabets (i.e. not logograms). Conversely, freedom in how you approach grammar in a sentence makes it harder, not easier, because while you're able to structure a sentence how you like it makes it harder to know what sounds idiomatic and it increases the number of forms you need to listen and read for.

Tangential thought: one of the reasons that vocabulary is annoying can be presented in CS terms. Imagine reading a sentence like 'Do you want to foo the bar?'. From the context you can probably understand what kind of words foo and bar are but you don't know what they mean. Yet they're the most important words in the sentence. This isn't a coincidence: the least common words in a sentence tend to be those with the highest entropy. If there was another word that carried the same meaning but was more common, the author would be more likely to use it. So it's not enough to learn 90% of the words, you need the niche words too.
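To make the 'foo the bar' point concrete, here is a rough Python sketch. It assumes a simple unigram model and a tiny made-up corpus (the sentences and counts are hypothetical, not real frequency data), and it measures per-word surprisal, -log2(p), which is probably the quantity meant by "entropy" here.

```python
import math
from collections import Counter

# Toy corpus, invented purely for illustration (not real frequency data).
corpus = (
    "do you want to eat the apple "
    "do you want to see the movie "
    "do you want to foo the bar"
).split()

counts = Counter(corpus)
total = sum(counts.values())


def surprisal_bits(word: str) -> float:
    """Information content -log2(p) of a word under a unigram model."""
    return -math.log2(counts[word] / total)


# Print the information carried by each word of the example sentence.
for w in "do you want to foo the bar".split():
    print(f"{w:>5}  {surprisal_bits(w):.2f} bits")
```

In this toy setup the frequent function words ('do', 'you', 'to', 'the') each carry fewer bits than the rare 'foo' and 'bar', which is the same pattern described above. A real estimate would need a large corpus and a context-aware model.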


I think you're hinting at the fact that all language acquisition ultimately boils down to vocab acquisition for the long tail to mastery.

As for why people complain about vocab, it's probably because the dropout rate for language learning is huge (look at the completion rates on duolingo or popularity of beginner resources vs intermediate vs advanced). By the numbers I suspect most learners are struggling with grammar, and most will drop out before "finishing" with grammar and being left with the long tail of vocab.


It has been my experience that in the long run everything else is a distant second to vocab acquisition. I think even within vocab the long tail you have to worry about is overwhelmingly nouns. Verbs are also numerous but can often be inferred from context, replaced by simpler verbs etc. 95% of the time when I’m watching tv news and can’t follow, it’s because they use some finance, politics, medical terms etc that cannot be guessed. There are simply far more ideas/things in the world than anything else. Unfortunately the truth that language learning is a long slow grind to learn nouns doesn’t fit well with those who want to make it a more intellectual exercise or other pedagogical goals of academic settings.


A different word order, any different word order, changes language learning from "learn a bunch of substitutions for my native language" to "learn a completely new system". There's a huge mental overhead for the latter.


The effort of learning a different grammar is dwarfed by the effort of memorizing a sufficiently large dictionary to understand everyday speech. To read at an elementary school level you need 5-20k words.


I thought that too, but as a bilingual person who studied Latin for several years... either I'm linguistically handicapped, or the grammar was legitimately tough to wrap your head around. Or both. Some of it was that I really did struggle with the grammar. Some of it was that sometimes we really were reading syntactically complex writing.


That's not true. People usually learn around a thousand words per year. Someone in their 20s is expected to have a 20k vocabulary but much of that is academic or professional vocabulary. i.e. the overlap between a mechanical and a software engineer might only be 15k words.


Is that number for active or passive vocabulary? For active, it seems reasonable, but GP is talking about reading comprehension, i.e. passive vocabulary.


If you're a master of the grammar, I think it's possible to start figuring things out from context once you've learned a critical mass of words


Yes, but "critical mass" is much larger than you'd commonly think. It's the low frequency words that contain most information in a sentence.


> It's the low frequency words that contain most information in a sentence.

This is an incredibly insightful point.


Thank you. That's a very succinct way of putting it. I wonder if there's a term or concept for this in linguistics.


There's a lot more to learning a different language than learning substitutions.

For starters, there is hardly any bijective mapping between words. Translating 'run' to German could mean any of 'rennen, renne, rennst, rennt, renn, laufen, laeuft' and I am sure I am missing quite a bit.

And that only scratches the surface, let's not get started about Grammar or idioms, etc.

The word order really only adds negligible overhead.


You get closer to something bijective when you only consider lemmas, but then you'll probably have around 1/3 that are bijective, 1/3 that 1->2 or 3 and 1/3 that are 2 or 3->1.


More like building a completely different persona for each family of languages. “Learn a bunch of substitutions” approach only goes so far, your language specific part of identity must be separated at where thought processes for languages diverge and necessary new parts must be built.

Philosophers deny linguistic determinism (a notion that language sets the ceiling for thoughts) but that’s just talking Turing Completeness.


So learn it. You'll get to practice it with every sentence you ever use.

Conversely, you'll only use 'ballon' when you talk about un ballon. I hope that's frequently enough for you to remember the word & its gender. At least it looks like the word 'ball', something you don't get with Vietnamese's 'bóng' or Chinese's '球[1]'

And you'll still get funny looks if you say you're getting ready to go to the New Year's ballon, because substitutions fail with homonyms in both directions. And you'll have to learn and relearn about ten thousand of these substitutions.

[1]maybe?


> Tangential thought: one of the reasons that vocabulary is annoying can be presented in CS terms. Imagine reading a sentence like 'Do you want to foo the bar?'. From the context you can probably understand what kind of words foo and bar are but you don't know what they mean. Yet they're the most important words in the sentence. This isn't a coincidence: the least common words in a sentence tend to be those with the highest entropy. If there was another word that carried the same meaning but was more common, the author would be more likely to use it. So it's not enough to learn 90% of the words, you need the niche words too.

If you don't understand what a word means, can't you ask the other person?


Not in every situation & not if you're not good enough to understand the explanation.


That's an impossible bar to clear, considering that it happens to native speakers of any language too.


Just learning the words sounds great until you find out that Korean is agglutinative, and Koreans don't even agree on where the word boundaries are.


Subject object verb is the least of your problems if you're learning Japanese. Moreover, English supports that order: it's just not the canonical one. So it is not that alien. "Yoda speak" appears in sentences like "In god we trust".

Japanese also deviates from its canonical order at times, by the way. For instance, sometimes the topic is extraposed to the end of the sentence, even after a supposedly sentence-ending particle like よ. E.g. "ばかだよ私は". Sometimes that can be because the speaker realizes mid-sentence that it requires a topic to be clear, and you can tell when this is the case when the topic is preceded by a pause. But it has a nuance of its own, so it is used deliberately, even in writing.


I think your example highlights one of the amusing things about Japanese. That is the complete omission of the subject. As you said, the 「私は」 might only get thrown on as an afterthought. You can have entire conversations without ever actually voicing the subject. It can come off like a British comedy sketch for somebody to come into the middle of an exchange and try to figure out the topic from the last half of the conversation.

Kind of like using genderless pronouns in English takes away some hint to help you catch on. Only there's no pronoun at all. For English speakers it can get even worse when all that is spoken is a succession of passive verbs. Hilarious.

Edit: I should say that I never noticed a native Japanese person to be confused about the unspoken topic. They seem to be able to intuit it just fine in any context. I'm mostly just laughing at my own inability there.


> Edit: I should say that I never noticed a native Japanese person to be confused about the unspoken topic. They seem to be able to intuit it just fine in any context. I'm mostly just laughing at my own inability there.

Sometimes they will get confused and clarify with a question: 私って? 誰のこと? 私とは、どちら様のことでしょうか?


Just as a note, your sample Japanese seems odd to me. I'm way out of the loop on anything modern, but it is weird to see written Japanese with non-Japanese punctuation throughout.

Interestingly, it has the effect on me of giving it vocal inflection which works quite well in the context of the example.


This is a great comment, it's something I have never really thought about, but now that you mention it...

、is "correct" Japanese and is used in formal or serious communications.

? isn't technically correct in Japanese but it's become pervasive in informal to semi-serious communications in the last 15-20 years, but particularly the last 10 years.

It's comparable to the "passive aggressive full stop" in English, and the shift has happened over roughly the same time period too. These days, seeing someone write か。 instead of か? makes me think they are either old or upset*

*in anything other than quite formal keigo mails (JMO, I am mid 30s)

> Interestingly, it has the effect on me of giving it vocal inflection which works quite well in the context of the example.

It's interesting that you write that. I would say that is why people write that way:

終わったのかもしれませんよ?

終わったのかもしれませんよ。

In [0] there is a chart that shows that Heisei-born are almost as likely to exclusively use ? to indicate a high pitch at the end of a sentence as 40s+ are likely to exclusively use 。

[0]https://www.nhk.or.jp/bunken/research/kotoba/20170401_4.html


"Yoda speak" is actually object-subject-verb, not subject-object-verb. (And the sentence "In god we trust" is also OSV order). Subject-object-verb is not really present in English even in archaisms or other weirder contexts.


It's not absolutely unknown: "With this ring, I thee wed" from the Book of Common Prayer marriage service seems to me to be SOV (with an adverbial bit in advance), though it's definitely archaic. I imagine it features a bit in poetry too.


The word order in that sentence is less important, because it uses old pronoun "thee" as opposed to "thou" (analogous to German "dich" vs "du"), thus indicating the subject and object.

"With this ring, thee I wed"

"I wed thee, with this ring"

...

"Thee, with this ring, I wed" (???)

When you use "you", suddenly there is only one obvious order.


You, with your attitude, I cannot accept!


>Learning Chinese is like learning Latin before learning a Romance language.

>The added benefit is that Chinese isn’t a dead language.

I wouldn't go that far. It's more like learning Greek and hearing many of the same root words, sounds, etymology, etc. in English.

An Italian speaker can listen to Latin and grok maybe 25%. A Japanese speaker can listen to Chinese and get maybe 2%, max.

Words like "library" use the same Chinese letters, and have very similar pronunciations, but that's about the extent.

I'd say the best way to learn would be Japanese, Korean, then Chinese (what I did).

Source: I speak Japanese, Korean, and Chinese.


You can definitely get a deeper appreciation of a language and get extra decoding tools for learning related languages when you learn about the history and etymology of words.

As a romance language speaker, knowing about greek and latin particles helped me greatly in learning english of all languages, just because of the sheer amount of appropriations english does.

Something that sticks with me from my early japanese learning after all these years is that, for example, "あ" (japanese "a") is a stylization of "安" (pronounced "an"). This helps make an association similar to how some kids books overlay hiragana over pictures of things that start with that letter, e.g. overlaying "い" (i) around a strawberry (ichigo). There are tons of similar examples in both japanese and chinese (pronunciation prefixes being a common tool used by chinese learners)


> I'd say the best way to learn would be Japanese, Korean, then Chinese (what I did).

Why? I speak Chinese and used to be fluent in Japanese. I find my Chinese (and English + katakana) a huge help when visiting Japan in recent years.

What's your rationale for this ordering?


Because when you learn Japanese, you have to learn kanji, the more traditional strokes (non-simplified), both Japanese and Chinese sounds, and the grammar is closer to Korean (and closer pronunciation for a lot of words).

So, it's better to learn Japanese, quickly pick up Korean, then learn Chinese.

Chinese last since you have to learn simplified characters, the grammar is different, tonality is quite different, etc.


Doesn't the order matter only if it's harder to do this the other way around?

Anecdotally for me, learning Chinese first with simplified characters was totally fine, and without formal study, I can read traditional characters fine 90% of the time and guess a remaining ~8% of characters I don't immediately recognize for a total ~98% comprehension, kind of like figuring out how to read medieval gothic font.

Knowing Chinese, learning Japanese and its own simplified characters (shinjitai) was a breeze compared with my classmates who struggled especially with kanji, and also pronunciation. Until I compare hours with a native Japanese speaker learning Chinese, it's hard to tell if learning Japanese, then Chinese would have been easier or not.


> Anecdotally for me, learning Chinese first with simplified characters was totally fine, and without formal study, I can read traditional characters fine 90% of the time and guess a remaining ~8% of characters I don't immediately recognize for a total ~98% comprehension, kind of like figuring out how to read medieval gothic font.

AIUI that's not a recommended approach; my friend is learning Chinese and his teacher was insistent that he should learn the traditional characters first because it's much easier to pick up the simplified characters from the traditional ones than the other way around.


It's definitely true that it's easier for readers of traditional to learn simplified than vice versa. After all, the simplified script was created by people who were already using traditional and were attempting to make logical and consistent simplifications!

On the other hand, your friend might find some very common traditional characters brutal to learn as a beginner. 醫生 (doctor) often comes up within the first few lessons and 鬱悶 (depressed) within the first couple of semesters. In contrast, someone starting with simplified won't be so daunted learning the simplified equivalents 医生 and 郁闷. Later, as an advanced student, the traditional forms aren't so daunting as they would have been to start with.

It's still more work for someone who knows 医 to learn the rest of 醫 than it is for someone who knows 醫 to learn that they just need to write the 7 strokes that make up its upper left corner to make the simplified version, of course.


It's an interesting point--I underestimated the difficulty of learning to handwrite characters. Knowing simplified I can read traditional pretty effortlessly and even communicate with a relative in HK typing in traditional, but cannot write it by hand from memory. I liken it to being able to read but not draw the Gothic "T" in the New York Times.

Still, I'd claim that regardless of the direction, learning the simplification map (and some patternless exceptions) is a small incremental task compared with the thousands of hours learning the broader language. What's a few hundred mapping rules + exceptions, after learning 3000+ characters? If someone asked me which character set to start with, barring real issues with availability of education material, I'd say to learn whatever you want if you're committed to it! Just be ready to spend a few hundred more hours at least due to the increased amount of information to memorize in traditional.


> kanji, the traditional strokes (non-simplified)

Uh, both Japan and China use different simplified glyphs. And they overlap a lot because they are different standardizations of handwritten glyphs in the bigger Sinosphere. I would be surprised if you indeed learned the traditional characters while learning Japanese.


>Uh, both Japan and China use different simplified glyphs.

Wrong. Though there are divergences, Chinese in mainland China use simplified Chinese characters (https://en.m.wikipedia.org/wiki/Simplified_Chinese_character...).

Kanji is more closely related to the traditional Chinese characters (https://en.wikipedia.org/wiki/Kanji#Local_developments_and_d...) than the mainland simplification.

>I would be surprised if you indeed learned the traditional characters while learning Japanese.

Kanji are more closely related to, or exactly the same as, traditional Chinese characters (used in HK, Taiwan, etc.) than to simplified ones.


My understanding of Chinese (the language) is limited, but having learned Hanja (essentially traditional one) first then learned Japanese later the divergence was already significant. Or, more quantitatively, the Joyo Kanji has 364 simplified characters out of 2,136 (~17%) [1] while the mainland China simplified probably about 2,000 out of 7,000 common characters (~28%) [2]. This divergence is not the biggest deal, but still big enough to refute the claim that Japanese is easier to learn because it's "non-simplified" (there may be other valid reasons though).

[1] https://en.wikipedia.org/wiki/Shinjitai#Simplifications_in_J...

[2] https://en.wikipedia.org/wiki/Simplified_Chinese_characters#...
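For anyone who wants to check the two ratios, a quick Python sketch using only the counts quoted in the comment above (the mainland figures are the comment's own "probably about" estimates, so treat the second result as rough):

```python
# Counts as quoted in the comment above; the mainland numbers are approximate.
joyo_simplified, joyo_total = 364, 2136
mainland_simplified, mainland_total = 2000, 7000


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


joyo_pct = share(joyo_simplified, joyo_total)
mainland_pct = share(mainland_simplified, mainland_total)

print(f"Joyo kanji simplified:    {joyo_pct:.1f}%")
print(f"Mainland list simplified: {mainland_pct:.1f}%")
```

This gives roughly 17.0% and 28.6%, in line with the rounded figures quoted above.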


>Or, more quantitatively, the Joyo Kanji has 364 simplified characters out of 2,136 (~17%) [1] while the mainland China simplified probably about 2,000 out of 7,000 common characters (~28%) [2].

It's 17% and 28%, but much, much higher for the most commonly used characters, which is where the divergence between simplified and traditional grows even more.

>This divergence is not the biggest deal, but still big enough to refute the claim that Japanese is easier to learn because it's "non-simplified" (there may be other valid reasons though).

I didn't make that claim. Maybe you're talking about another thread?

It's not that it's easier to learn, it's that the path of going Japanese -> Korean -> Chinese is the easier, most logical path. The diverging simplified Chinese standard characters being a large reason behind that.


Speaking sure, because both Korean and Japanese were fully formed languages that borrowed a writing system. I've come to view Kanji/Hanja as an information layering on top of the language used to increase the information density/clarity of the written languages.

If you look at written Japanese/Korean (with Hanja), you get a much higher transfer, esp. for someone who knows Chinese trying to get by in Japan.


That sounds similar to English and German. There are a few words that are still somewhat connected, like water / wasser, but knowing one gives little insight into the other.


There's also "Gift" (Poison), "Pickel" (Pimple), "Wo" (where), "Wer" (who), "Fahrt" (travel), and so on...


> Learning Chinese is like learning Latin before learning a Romance language. The added benefit is that Chinese isn’t a dead language.

As a native Korean speaker this is an apt analogy, even more so because it won't help much initially but will help a lot once you've got to the later stage of learning. This is of course a valid strategy if you want to eventually learn both Chinese and Korean.


Yes - you should learn the language that has meaning to your life. If you don’t have a reason to use the language you will forget it all.

And agreed - once you get into more “college level or technical” Korean for lack of a better word, knowing the hanja/ Chinese characters makes vocab acquisition easier.


> The Chinese pronunciations have changed. Some researchers have been known to use Korean pronunciations of words to help triangulate to what ancient Chinese characters would sound like (in combination with poems where you know what rhymes to expect etc)!

Modern Korean isn’t a tonal language tho, the length of the vowel changes the meaning. So how does that work?


Chinese itself wasn’t much of a tonal language until a little more than one thousand years ago. Modern Chinese tones emerged in the Middle Chinese era as a way to avoid homophony/preserve distinctions of meaning as a lot of old final consonants were lost.


Modern Korean alone definitely isn't sufficient to reconstruct from, but including it in an analysis can add some information that isn't preserved in other branches.


Korean does not have vowel length change the meaning, however Japanese does.

Korean used to be tonal, and you are right that it no longer is.

Korean is used in combination with other data points to determine pronunciations.

This works for old English plays, for example too. You can use poems that should rhyme, plus commentary from linguists at the time, to guess what certain words would sound like “back then”.

Just saying that Korean is another super rare but fun random fact that I have heard of being used at least once in combination with ancient Chinese poems, to guess at the old Chinese pronunciation.


https://en.m.wikipedia.org/wiki/Korean_phonology

> In 2012, vowel length is reported almost completely neutralized in Korean, except for a very few older speakers of Seoul dialect,[14] for whom the distinctive vowel-length distinction is maintained only in the first syllable of a word.[13]

So it does have vowel length change meaning but it’s almost kinda died out like tones?


Native Korean speaker here, born in the 70s and grew up in Seoul. We all learned a dozen short-long pairs at school (like 눈 nun being either eye or snow), and of course the education got all of us tricked into thinking that we're making these distinctions (if only we try hard enough). But looking at it objectively, I don't think a single person around me used vowel length distinction. If you have to "think" whether a word is long or short, that means the distinction is dead. No English speaker has to "think" whether fan or pan is the correct word. An actual sound distinction comes automatically for native speakers.

Similarly, ㅐ/ㅔ ("ae"/"e") distinction was dead before I was born, but schools insisted that they're well and alive. Not sure what they're teaching these days.


I can't speak Korean, only a little bit of Mandarin :D so I'm really happy to have your reply, thank you! Makes sense!


Vowel lengthening matters in Korean technically but is more often context-sensitive than length sensitive like it regularly is in Japanese for reasons I don’t know. 눈 can mean snow or eye in Korean depending upon how long you extend the vowel as recently as 1970, for example, but many Koreans (especially younger Koreans) have less sensitivity and don’t use it. When I learned Japanese there were more words where it was easier to confuse them and thus require a distinction, probably preserving length.

Overall Korean historically has had a tendency toward reducing tones and sounds (it used to have f and z sounds like in Chinese, for example). It continues to eliminate vowels where a lot of Koreans are not pronouncing the sound 최 fully and eliminating lip rounding to make it simply 체.


> When I learned Japanese there were more words where it was easier to confuse them and thus require a distinction, probably preserving length.

I did this to unintentional comedic effect the other day by referring to 「星の王子さま」as「星のおじいさま」.


My spouse is Chinese and I tried to learn Chinese for obvious reasons. However, they speak multiple dialects in practice such that it's hard to focus, and there are limited training materials for some of the dialects. Generally there is Cantonese, Mandarin, and their local dialect, which is what they use most often.


Your spouse must be fairly old school to be using a dialect that isn't mando/canto...

Most millennials I know from mainland china and even Malaysia/Singapore will speak putonghua with their parents/family now. Ofc people from Sichuan and Shanghai love to preserve their dialect. Plus you got the cantos and hokkien/teochew/hakka and even changsharen.

In any case, my recommendation would be to stay with Mandarin and go deep on it. Learning a dialect afterwards is quite straightforward and you can be 80-90% proficient esp with understanding by just developing a sound mapping of each character from dialect->mandarin.

The common idioms you'll pick up with enough exposure.

You'll never regret learning Mandarin first - you'll be able to read/listen to the news and your relatives will understand you. Whereas if you try to learn dialect first you'll find it harder due to the smaller set of materials AND fewer people to talk to.

A lot of chinese people joke that they learned their 2nd/3rd... dialect by doing KTV and there's definitely some truth in that


I came across this quote from the Mandarin wiki:

> The Chinese have different languages in different provinces, to such an extent that they cannot understand each other.... [They] also have another language which is like a universal and common language; this is the official language of the mandarins and of the court; it is among them like Latin among ourselves.... Two of our fathers [Michele Ruggieri and Matteo Ricci] have been learning this mandarin language... — Alessandro Valignano, Historia del principio y progresso de la Compañía de Jesús en las Indias Orientales, I:28 (1542–1564)


I always find the claim that dialects in China vary so much that people cannot understand each other very western-centric. As a native speaker, there are shockingly few dialects that are completely incomprehensible. Someone from Shanghai will be able to speak to someone in Chengdu with their own dialect, just slower.


Take Cantonese (or heck, Teochew or Hakka). You really think this language is reasonably mutually intelligible with Mandarin? Or Shanghainese?

I'm a (non-native, formerly fluent but now pretty poor) Cantonese speaker and I'd describe the difference between Cantonese and Mandarin as roughly the difference between Portuguese and Romanian. Sure, they're both romance languages, and maybe with a lot of work and hand-waving and drawing characters on your hands with your fingers you can get your point across to someone who speaks the other language, but in no way would I call them mutually intelligible, even by talking "slower".

Cantonese and Mandarin have very different pronunciation, vocabulary, and even grammar for common cases. In fact there are extremely common words in Cantonese for which there is no modern Chinese character: instead Latin-alphabet words or even single letters (like "D") are substituted in comic strips etc. So you can't even write them down in Chinese for a Mandarin speaker to puzzle through with a dictionary.

I'd say that your perception of these things as "western centric" -- they are not at all -- strikes me as a very "northern Chinese centric" view of China. :-)


It's not so much a northern Chinese centric view of China as it is a Mandarin (as in the language family) view of China. The vast majority of Chinese dialects in a large band from Heilongjiang to Sichuan (and a bit out to Qinghai and Xinjiang) are more or less mutually intelligible. That mutual intelligibility drops off very quickly outside of that band. In particular a lot of Southeast dialects (Yue, Wu, Min, etc.) fall outside of that zone. But it covers the majority of Chinese speakers. (However, I disagree with the original claim that a Sichuanese speaker could understand Shanghainese).

See e.g. this video: https://www.youtube.com/watch?v=Ps7_NnkL-oM where for almost the entire video a couple is explaining words in their respective dialects (Sichuanese and Hunanese) to one another in their own dialect (i.e. not using Standard Mandarin at all in their explanations). The first half of the entire exchange is fluent and understandable by both people and also a Standard Mandarin audience. Only in the second half when they intentionally start quizzing each other on words they know will be tough for the other person and without additional context do they run into trouble (and occasionally resort to Mandarin).

Also, comparing it to Romance languages masks some complexities of the relationship. https://news.ycombinator.com/item?id=16844074


Heh, this is the perfect place for me to jump in with some movie trivia.

Have you ever watched either Ip Man or "The Bridge" (Danish/Swedish cop show)?

In Ip Man, there's a gang of Mandarin speakers who show up in Ip Man's Cantonese world. They somehow are able to speak to each other, and I find it unlikely. Perhaps it's because my Cantonese is mediocre. But it just doesn't sound like you could have a casual conversation across the languages.

A similar thing happens in The Bridge. There are two cops solving a crime on the bridge connecting Denmark with Sweden, and they just talk to each other in their own languages. My guess is most people who didn't study the other language would not know all the slang or, just as importantly, the simple expressions. Though I guess it could be learned easily; I can read a Swedish newspaper, just not process it fast enough to have a fast conversation with a colleague.

Similarly with German and Swiss German. Something about the way the sounds are different makes processing the other one a bit slow if you're not used to it.


> In Ip Man, there's a gang of Mandarin speakers who show up in Ip Man's Cantonese world. They somehow are able to speak to each other, and I find it unlikely. Perhaps it's because my Cantonese is mediocre. But it just doesn't sound like you could have a casual conversation across the languages.

So the mixed Mandarin / Cantonese conversations actually do happen. My wife's family are all native Cantonese speakers, but her uncle married into the family from another part of China. He understands Cantonese perfectly but is more comfortable speaking Mandarin; and everyone else understands Mandarin to various degrees but prefers to speak in Cantonese. So he just speaks Mandarin while everyone else sticks to Cantonese; and it's quite a fluid back-and-forth.

That said, for that to happen so naturally the way it does in the Ip Man movies is really unrealistic. First of all, Mandarin wasn't as well-known in Guangdong in the 1930's as it is now; and there's no way someone from another part of China could just show up and magically understand Cantonese: It would take at least a few months of immersion to be able to "listen" fluidly enough to have that sort of a mixed conversation.

Even more unrealistic is Ip Man 4, where he shows up in San Francisco, and starts talking in Cantonese to people who have immigrated from Beijing -- including one character's 12-year-old daughter who grew up in San Francisco. Yeah, that's not going to happen.

[Minor grammar edits.]


That's kinda interesting that he can do that. With a bit of recent Mandarin schooling I can see the languages are quite similar, but as a kid whenever we had a Mandarin speaker visiting I was baffled. Same thing happened when I learned German, as soon as I learned the initial things like the pronouns everything clicked.

It's probably the case that all the often used words that you use in everyday life have changed, but once you get over the barrier there's a pretty smooth learning zone.


Just to be clear, her uncle has now lived in HK for several decades; I'm sure there was a several-month learning curve being able to understand Cantonese. It's not like Hindi and Urdu, where (I'm told) people with no previous exposure to the other language can communicate pretty readily.

But there are a huge number of words which are simply pronounced a bit differently; and the differences in pronunciation have a lot of predictable patterns. So once you're familiar with these patterns, suddenly a Mandarin speaker can understand a huge amount of Cantonese when spoken in context.


> Someone from Shanghai will be able to speak to someone in Chengdu with their own dialect, just slower.

This is definitely not true (if they were to use their own dialect).

I'm from the same region surrounding Shanghai and although there is some intelligibility between dialects, it is hard if not completely impossible to have a conversation even between people from adjacent towns. See https://en.wikipedia.org/wiki/Wu_Chinese

I'm a bit surprised about your claim. Anyone from China with a sense of the dialects of the south would know this.


Mandarin speakers' ability to understand slowed down Shanghainese is likely a recent development. This is because much of traditional Shanghainese diction, grammar, and even pronunciation has been reduced or morphed to better suit the newer generations of speakers whose principal language is Mandarin.


written cantonese is almost the same as written mandarin.

the problem is that spoken cantonese is not much like written cantonese. it's a combination of pronunciation, accent, idioms, and sentence structures. best advice is for speakers of mandarin or cantonese to learn the other language like a new one. (there's also the written form of spoken cantonese, written spoken cantonese, if you will, that is the written representation of spoken cantonese using a set of special characters of sinitic origin).

if you are from guangdong you will learn all systems of language - written mandarin, spoken mandarin, written cantonese, and spoken cantonese. some hongkongers don't ever learn spoken mandarin correctly, getting them to speak mandarin in front of a camera is always a source of good laughs.

those who argue that cantonese is mutually intelligible with mandarin will typically use cantonese newspapers and official documents as evidence. those who argue that it isn't will point to spoken and colloquial cantonese as evidence. the objective point of view is to accept that cantonese as we know it exists as two separate languages used for two different purposes.


The claim may be misguided or wrong perhaps, but how is it Western-centric?


Here are ten Shanghainese sentences, four short ones and six long ones:

1. https://audio.tatoeba.org/sentences/wuu/488406.mp3

2. https://audio.tatoeba.org/sentences/wuu/485590.mp3

3. https://audio.tatoeba.org/sentences/wuu/485698.mp3

4. https://audio.tatoeba.org/sentences/wuu/488621.mp3

5. https://audio.tatoeba.org/sentences/wuu/489046.mp3

6. https://audio.tatoeba.org/sentences/wuu/489736.mp3

7. https://audio.tatoeba.org/sentences/wuu/492871.mp3

8. https://audio.tatoeba.org/sentences/wuu/496402.mp3

9. https://audio.tatoeba.org/sentences/wuu/485676.mp3

10. https://audio.tatoeba.org/sentences/wuu/488252.mp3

They're not especially slow, but you can replay them as often as you want or even slow them down, so I think it should be easier than for someone in Chengdu trying to understand a Shanghainese speaker in person.

For comparison, here are the Mandarin equivalents of those sentences, pronounced by the same speaker:

1. https://audio.tatoeba.org/sentences/cmn/332870.mp3

2. https://audio.tatoeba.org/sentences/cmn/332436.mp3

3. https://audio.tatoeba.org/sentences/cmn/332568.mp3

4. https://audio.tatoeba.org/sentences/cmn/333012.mp3

5. https://audio.tatoeba.org/sentences/cmn/333070.mp3

6. https://audio.tatoeba.org/sentences/cmn/333151.mp3

7. https://audio.tatoeba.org/sentences/cmn/333392.mp3

8. https://audio.tatoeba.org/sentences/cmn/333584.mp3

9. https://audio.tatoeba.org/sentences/cmn/332549.mp3

10. https://audio.tatoeba.org/sentences/cmn/332788.mp3

And here they're written down, together with translations into various other languages:

1. https://tatoeba.org/cmn/sentences/show/488406

2. https://tatoeba.org/cmn/sentences/show/485590

3. https://tatoeba.org/cmn/sentences/show/485698

4. https://tatoeba.org/cmn/sentences/show/488621

5. https://tatoeba.org/cmn/sentences/show/489046

6. https://tatoeba.org/cmn/sentences/show/489736

7. https://tatoeba.org/cmn/sentences/show/492871

8. https://tatoeba.org/cmn/sentences/show/496402

9. https://tatoeba.org/cmn/sentences/show/485676

10. https://tatoeba.org/cmn/sentences/show/488252

If you want some more examples, this search returns a few hundred: https://tatoeba.org/cmn/sentences/search?query=&from=wuu&to=...


To be fair these translations slightly exaggerate the difference between Shanghainese and Mandarin. The speaker is speaking in a more formal register in her Mandarin samples than in her Shanghainese samples which causes greater vocabulary and structural differences than if they were at the same register. In fact I'm a little surprised she doesn't use exact analogs when translating between the two, e.g. 搿就是讲 -> 这就是说 instead of her choice of 这意味着, 越来越结棍了 -> 越来越厉害了 instead of her choice of 愈演愈烈, and 开始寻被偷脱个物事了 -> 开始找被偷掉的东西了 instead of her choice of 开始找被偷物品了, since it is both a more accurate translation and reflects the structural similarity of the two sentences much better (there is a one-to-one correspondence between each word of both sentences).

Regardless I certainly would not view Shanghainese as mutually intelligible with Mandarin.


> In fact I'm a little surprised she doesn't use exact analogs when translating between the two

If you check the logs on the sentence page, you'll see that the Shanghainese sentences are actually translated from the Mandarin. So she was going from relatively formal Mandarin to more informal Shanghainese, probably because that's what she's accustomed to speaking.

I didn't really intend these as a demonstration of how different Shanghainese and Mandarin are even when you translate as literally as possible, but more as a way for people on HN who know Mandarin but haven't had much exposure to Shanghainese to try and see how much they understand. I like to think I understood a few words like 自己 and 警察, but maybe that's because I looked at the text first.


This is a super high quality post - thank you!

I think you've also answered my questions in this space before, what kind of work do you do? You seem very familiar with asian languages deeper than just a native speaker/hobbyist might be.

I thought recording 6 was vaguely mandarin sounding, if you squinted you could more or less guess what was being said. Agreed that the rest are hard to understand but I don't think it takes native mandarin speakers much effort to get to a point where they can get proficient in another dialect. IMO learning Shanghainese from Mandarin will be easier than learning Portuguese if you speak Spanish


it's true but also largely moot. mandarin is enough, nearly everyone speaks it.


Same problem here (different dialect though). Let me know if you found a solution.


The English and Chinese Wikipedia articles about the dialect will usually have at least some useful information e.g. about the phonology and maybe links to other resources.

You can also try Wiktionary: https://en.wiktionary.org/wiki/Wiktionary:About_Chinese#Abou...

https://xiaoxue.iis.sinica.edu.tw/ccr/ has IPA pronunciations for single characters across many different dialects.

https://forvo.com/languages-codes/ may have some recorded vocabulary.

https://github.com/laubonghaudoi/Chinese_Rime has input methods for many dialects that can also be abused as a dictionary (associating romanization with Chinese characters you can look up).

https://xefjord.wixsite.com/xefscompletelangs/courses#comp-k... may have flashcards if you're lucky.

You'll probably also want to make your own flashcards. https://github.com/ppwwyyxx/wechat-dump is helpful for getting voice messages out of WeChat.

If you want to practice by passive listening, you can try http://phonemica.net/ , searching on https://youku.com/ or check whether there's a local TV station that has programs in dialect.

Reading material will be almost impossible to find, but maybe there's a bible translation from the 19th century or something.

Also check whether there's a local language preservation group.

Finally, https://zhongguoyuyan.cn/ is supposed to make a lot of material available to the public soonish (right now I get a certificate error in Firefox, but you can see the landing page in a less strict browser).


If you're learning Chinese from her, you might be interested in my project for couples learning each other's languages! https://learncoupling.com


My university Japanese class was full of Korean and Chinese international students. It was hard to keep up.


If you know Turkish (or any Central Asian language barring Tajik[not familiar with Persian adjacent languages]), learning Japanese is a breeze because you can translate a Turkish text into Japanese word for word and out comes a perfectly constructed Japanese text. Obviously, it works in the other direction as well.

I remember trying to learn Japanese through Russian and having a hell of a time until I came across a Japanese textbook written in Turkish.

If you are from Central Asia, you can leverage both of your languages (native Central Asian & Russian) for learning English. It's better to learn English through Russian until it's time to learn the English tense system, at which point you can switch back to your native Central Asian language, as Russian is not very amenable to learning the English tense system whereas the tenses (rather aspects?) of C.A. languages line up with those of English in an almost one-to-one manner. Thinking about the English tenses through the "Russian mind" was a nightmare until I realized I already knew the tenses through my native Kazakh.

Mathematicians never shy away from throwing everything at their disposal at a problem. Acting like them speeds up the language acquisition process considerably.


How close are Turkish and Japanese? I know there is a proposed "Altaic language family"[1], but I'd love to hear the perspective of someone who actually knows two such languages.

[1] https://en.wikipedia.org/wiki/Altaic_languages


Vowel harmony, no genders, no plurals, agglutinative etc are similar.

The only thing you need to learn is new words. The grammatical structure is the same. Translation between Indo-European and Altaic is very hard. Translation between Altaic languages is very easy.


Not mentioned in the article, Hangul (the Korean alphabet) was created in 1443 by the King. It was meant to aid literacy because of the preexisting incompatibility of Chinese characters and the Korean spoken language. It's actually surprisingly easy to learn - you could do it in a really short time. But, because the symbols (24 of them) represent sound only, you won't usually know what you're saying.

Also, it was designed with a certain concept in mind: "The letters for the five basic consonants reflect the shape of the speech organs used to pronounce them, and they are systematically modified to indicate phonetic features"

https://en.m.wikipedia.org/wiki/Hangul
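
For the curious, the "letters stand for sounds" design is easy to see in Unicode, where every precomposed Hangul syllable block is just an index built from its letters. A minimal Python sketch, assuming the standard Unicode layout (blocks start at U+AC00, ordered by 19 initial consonants, 21 vowels and 28 finals including "no final"); `decompose` is a name I made up for illustration, and the tables include doubled and compound letters, so they are longer than the 24 basic ones:

```python
# Split a precomposed Hangul syllable into its letters using Unicode arithmetic.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]        # 28 finals, "" = none

SYLLABLE_BASE = 0xAC00   # code point of the first syllable block, 가
SYLLABLE_COUNT = 19 * 21 * 28


def decompose(syllable: str) -> tuple[str, str, str]:
    """Return (initial, vowel, final) for one precomposed Hangul syllable."""
    index = ord(syllable) - SYLLABLE_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


if __name__ == "__main__":
    for ch in "부동산":
        print(ch, decompose(ch))
```

This only tells you how a block sounds, of course, not what the word means.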


To be exact, Hunminjeongeum (훈민정음) is what King Sejong actually created. It is, in addition to the stated goal, thought to be the ultimate linguistic geekery by the King because one of the first books printed in Hunminjeongeum was Dongguk Jeongun [1], which set out to standardize the "orthodox" Chinese pronunciation in Korea. As always that didn't go well; the book does tell a lot about Middle Korean despite its failure.

[1] https://en.wikipedia.org/wiki/Dongguk_Jeongun


Thank you for this! The wiki page I linked to definitely doesn't cover this aspect well (specifically, Hunminjeongeum) and I learned something new.

Of relevance:

https://en.wikipedia.org/wiki/Hunminjeongeum https://en.wikipedia.org/wiki/Hunminjeongeum_Haerye


i learned hangul characters by reading street signs while travelling through south korea. most signs were both in hangul and latin characters.

then i found a keyboard typing practice program where i patched the letter images and replaced them with the appropriate hangul character parts, to practice typing.

this skill came in handy when i needed to write down some korean words when hearing them.


Despite the title, there's not much commentary on how Chinese sounds to Korea and Japan. I was hoping for something like Prisencolinensinainciusol's supposed American-English-to-Italians:

https://www.atlasobscura.com/articles/deep-roots-italian-son...

https://www.youtube.com/watch?v=-VsmF9m_Nt8


That’s amazing, I’m not a native English speaker and that sounds exactly like when you are listening to a song in English but you are not understanding the lyrics.


Another one that's quite good

https://youtu.be/Vt4Dfa4fOEY


Interesting observations.

As native Chinese speaker:

The Look:

1. Japanese: I like the kanji, it is very recognizable, even though the character set is different. Most Chinese will have no problem reading traditional Chinese characters, though somewhat slower. The rest...not so much, kanas are my biggest headache.

2. Korean: First thing I noticed is the presence of circle, which is not part of Chinese radicals until very recently. Because it looks like bubbles, it looks somewhat ... cute? No meaning can be inferred beyond that. Also the use of spaces is noticeable.

The Sound:

1. Japanese: Fast. Less variation in the speech itself. Notice the presence of pitch. The kanji based words sound very different from what it would sound like in Chinese

2. Korean: Not as fast as Japanese, but still faster than Chinese. A lot of unfamiliar sounds that are absent in Chinese pronunciation. Sometimes I would be able to find one word or two that sounds like Chinese and makes sense in the context, but the rest is just foreign.

Question to Korean speakers:

Do you guys recognize each individual Hangul character's meaning (under the context of the word of course) if that word has a Chinese origin? Or is the word recognized as a whole? For example 부동산(real estate), comes from the Japanese kanji word, 不動産, which in Chinese means 不(not)動(moving)産(assets). Does this inferential aspect of Chinese still apply in certain cases once it is written in Hangul?


> 1. The kanji based words sound very different from what it would sound like in Chinese

This is because you are probably speaking in mandarin which has deviated a great deal from Middle Chinese ever since the Jurchen Jin conquered northern China.

But if you were to compare it to a more conservative Chinese language like min-nan or some other southern chinese language, the similarities are unmistakable.

examples of pronunciation:

忍者: Ninja (JP), Nin-jia (Minnan), Renzhe(mandarin)

美人: Bi Jin (JP) , Bi Jin (minnan), meiren(mandarin)

簡単: Kantan(JP), Kan Tan (minnan), jiandan (mandarin)

時間: JiKan (JP), Si Kan (minnan), shijian (mandarin)

世界: sekai (JP), Sei Kai (minnan), shijie (mandarin)

速度: sokudo (JP), Sok Do (minnan), sudu (mandarin)

確認: kakunin (JP), Kak Nin or Kak Lin (minnan), queren (mandarin)

区别: ku betsu (JP), ku piat (minnan) , qu bie (mandarin)

人類: jin rui (JP), Jin Lui (minnan), ren lei (mandarin)

and korean: 金 : Kim (kr), Kim (minnan), Jin (mandarin)

新婦: Sin Bu (kr) , Sim Pu (minnan), Xin fu (mandarin)

學生: hag saeng (kr) , hak seng (minnan), xue sheng (mandarin)

參加:Cham Ga (kr) , Tsham Ka (minnan), Can Jia (mandarin)

Notice how Minnan and Korean preserve the ending consonants like "t" and "g" sounds, while Japanese simulates the ending consonant with an extra kana. So the character 速 is pronounced Sok, but in Japanese it is split into So & Ku, where Ku simulates the ending consonant.

Mandarin does away with the final stop consonants (p, t, k) completely. There are many other changes, such as the "f" and "v" sounds: early Middle Chinese lacked them, and Korean and Minnan preserve that, whereas in other Chinese languages many "b" consonants turned into "f" consonants. Likewise the ending "m" consonant is gone from Mandarin but still present in Minnan and Korean.
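
A quick Python sketch of the final-consonant point, for anyone who wants to poke at it. The romanizations are the informal ones from the list above, respaced so each syllable is its own token; `syllables_with_final_stop` is a made-up helper, and the check is a naive suffix test that only makes sense for spellings like these, not a real phonological analysis:

```python
# Naive check: which syllables in a reading end in a stop consonant (p, t, k)?
# Readings are informal romanizations, one syllable per space-separated token.
STOPS = ("p", "t", "k")


def syllables_with_final_stop(reading: str) -> list[str]:
    """Return the syllables of a space-separated reading that end in p, t or k."""
    return [s for s in reading.lower().split() if s.endswith(STOPS)]


READINGS = {
    "確認": {"minnan": "Kak Nin", "mandarin": "que ren"},
    "区别": {"minnan": "ku piat", "mandarin": "qu bie"},
    "學生": {"minnan": "hak seng", "mandarin": "xue sheng"},
}

if __name__ == "__main__":
    for word, by_language in READINGS.items():
        for language, reading in by_language.items():
            print(word, language, syllables_with_final_stop(reading))
```

With these spellings each Minnan reading has one stop-final syllable and the Mandarin readings have none.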


any source on the mapping for the sound changes? That's very interesting and I would like to learn more ...



> For example 부동산(real estate), comes from the Japanese kanji word, 不動産, which in Chinese means 不(not)動(moving)産(assets). Does this inferential aspect of Chinese still apply in certain cases once it is written in Hangul?

It depends. For this particular example, I think most Koreans will treat 부동산 as a single word meaning "real estate", because it's not a very productive combination: the alternative 동산 (moveable properties?) is a legal term which is much less common, and 산(産) as "property" isn't common either.

On the other hand, a word like 고밀도화(高密度化 - densification) is transparently decomposable to 고(高 high) + 밀도(密度 density) + 화(化 -ify). Few people can write it down in hanja (I just copy-pasted from dictionary), but most people will immediately recognize its meaning, even if they've never seen the word before.
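
That kind of decomposition can be sketched as a longest-match lookup against a small lexicon of Sino-Korean pieces. This is only a toy in Python: the three entries are the ones from this comment, `LEXICON` and `segment` are hypothetical names, and a real morphological analyzer is far more involved:

```python
# Toy longest-match segmentation of a Sino-Korean compound.
# The lexicon holds only the three pieces mentioned in the comment.
LEXICON = {
    "고": "高 high",
    "밀도": "密度 density",
    "화": "化 -ify",
}


def segment(word: str, lexicon: dict[str, str] = LEXICON) -> list[str]:
    """Split word into known pieces, always taking the longest match first."""
    parts = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no known piece at position {i} in {word!r}")
    return parts


if __name__ == "__main__":
    for piece in segment("고밀도화"):
        print(piece, LEXICON[piece])
```

A word like 부동산 would need its own entries, which fits the point that it is mostly read as one unit.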


That example is interesting. The Japanese word for real estate, 不動産, actually came from French [1], 'immobilier' (real estate, by opposition to what is movable or 'mobile' such as 'meubles' or 'mobilier', i.e. furniture). And of course the French word comes from Latin...

[1] https://ja.wiktionary.org/wiki/%E4%B8%8D%E5%8B%95%E7%94%A3

The Japanese would treat the whole combination as a single word (in my experience) and wouldn't analyse the components either (although 'they make sense'). The etymology is often more apparent in Japanese and Chinese thanks to the characters, whereas it's not always easy to see it in other languages (unless you speak Latin or Greek..). But this is a borrowed word and in this particular case it's very intuitive to French speakers as immobilier / mobilier / mobile are all common words.


Thanks for the explanation. Appreciated.

I think this answers part of my question about how modern Korean creates new words/concepts, if not via English loanwords. So those word roots could still be applied in certain cases.


One addendum to this: some of the simplified printed characters actually date back centuries in China (e.g. the 14th and 13th centuries CE), and IIRC were used for easier carving of the woodblocks used for printing. And Japan has been introducing simplified printed characters for many centuries too. And that's before you even get into variant scripts used in calligraphy, shorthand, and personal seals. Point being, it's even more complicated than this, historically speaking.


Japanese also has informal simplified characters used in handwriting, some of which aren't in Unicode despite being commonly used:

https://en.wikipedia.org/wiki/Ryakuji


Korean has a somewhat similar problem where the unicode displays ㅈ/ㅅ, but they're not written like that. It's a common beginner mistake. After all, who would expect characters to be written like they're displayed on a computer? :-) For the curious, you can see how they're written here. [0]

[0]: https://blogs.transparent.com/korean/files/2017/08/Stroke-Or...


This is a stylistic/typeface difference between sans-serif and serif text. Some people do write like that, most people don't.


I have never seen a native korean write ㅈ/ㅅ as displayed by most fonts, at least in non-formal settings. It's always been the handwritten version as depicted.

ㅅ almost looks like 人 [0] (인) on computers (looks like 1 stroke to beginners), but when handwriting it is 2 strokes. ㅈ is also 2 strokes (personal handwriting aside) and looks rather different.

[0]: https://en.wikipedia.org/wiki/Radical_9


ㅈ is three strokes; not sure what you mean by personal handwriting aside, but the 'stroke' is a designation of hand movement, not a count of curves that makes up the character.

> I have never seen a native korean write ㅈ/ㅅ as displayed by most fonts, at least in non-formal settings.

Here's[0] someone's history of Korean typography which points out that the ㅅ may come from 훈민정음 Hunminjeongeum [1], which indeed has ㅅ .

Also perhaps you haven't seen middle school kids obsess over handwriting with multicolored pens ;)

[0] https://m.blog.naver.com/PostView.nhn?blogId=designmage&logN...

[1]https://en.wikipedia.org/wiki/Hunminjeongeum_Haerye#/media/F...


Hmm? I don't think anyone considers ㅈ as three strokes, unless they're writing ㅈ like printed font. When handwritten, the whole フ-like shape is one stroke, and the small attachment is the second stroke.

Traditionally, one stroke means one movement between the brush touching the paper and lifting off - for example the Chinese character 弓 has three strokes: https://en.wiktionary.org/wiki/%E5%BC%93#/media/File:%E5%BC%...


Looks like it’s both 2 and 3 획 when handwritten. I know people who write with 3 strokes; I do with 2.

https://ko.m.wikipedia.org/wiki/ㅈ https://namu.wiki/w/ㅈ


人 is 2 strokes tho


Sorry, I meant that it looks like 1 stroke to beginners. Visually ㅅ looks similar to 人 on computers but in handwriting it's different.


> but some time after WWII (I don't remember the exact dates) Japan and Mainland China adopted a simplified form.

I lived in Taiwan when I was younger and studied Mandarin there, and can read a limited vocabulary of traditional Chinese characters. I don't know enough about the history of simplified characters in Japan, but it doesn't seem to me that they are as widely used (with a few exceptions like 国) as traditional forms. Maybe someone knows the history?

Also, regarding this statement:

臺 (Korea/Taiwan) 台 (Japan/China)

In Taiwan, the latter form of tai (台) is used 99% of the time in colloquial, news, and signage. The complex form (臺) AFAIK is used almost exclusively in certain government situations (e.g. the name of a central government bureau) or on money. Street signs for famous buildings all use 台, i.e. 台北101, 台北松山機場, etc.


Simplified forms have been common in handwriting for centuries. Korean hanja was also simplified in handwriting when it was in use, even though it never got standardized. The "adoption" thus mostly means the standardization of such established glyphs even in print. (This also partly explains why the second reform in mainland China [1] wasn't received well and was retracted in the end.)

[1] https://en.wikipedia.org/wiki/Second_round_of_simplified_Chi...


That's a pretty extreme exaggeration. 臺 is on many, many famous buildings: https://tnimage.s3.hicloud.net.tw/photos/2020/CNA/20200229/2...


The example you cited is operated by a central government entity, the Taiwan Railways Administration, and is used in a few places inside and outside the station, just as every other central government building uses the traditional form. On every other map and sign it's 台北車站. The tickets printed by the TRA even use 台北 or 台北車站.


>Maybe someone knows the history?

You'll need to look up shinjitai, which consists of the simplified characters adopted in Japan, which are different to the ones used in China.


That Japanese Man Yuta has a great video demonstrating just how much Chinese script ordinary Japanese folk understand:

https://youtu.be/rzJqXd-1dEU


It's so amazing how those languages are completely different from western languages. You can see the people in this video inferring multiple, sometimes totally unrelated meanings from the same set of symbols. That is not possible at all in western languages. You either understand a sequence of characters or you don't. One can debate the contextual and abstract meaning of a sentence, but not its absolute meaning.


While I agree with your admiration of the versatility and power of CJK languages, I don't think the particular aspect you're commenting on is specific to those. You see this in many western languages too. Here's an example in English. Take the word "pot". It can be a container, it can be a plant, it can be a drug, it can be a dish (pot roast), it can be an electronic component (short for potentiometer). Things can "go to pot". Someone might be told to "shit or get off the pot" when they are indecisive. The meaning depends on context. Some of these meanings are related, some are not. Yet it's all the same word. It has some meanings that are shared across different anglophone cultures, and some that are specific to one or several. There are many other words like that, and you find things like this in almost every language family.


For example, “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo” is a valid English sentence.


Syntactically defensible, sure. But what anglophone not seeking to deliberately confuse an interlocutor would call bison from upstate New York "Buffalo buffalo"? What action could such bison perform on other bison that can reasonably be construed as "buffaloing"?

Semantics and pragmatics are part of the language, too.


Can you explain better?

There are a lot of false pairs in Latin based/Romance languages.

As an example if you can read the sequence b-u-r-r-o you get "burro", that means butter in Italian but ass/donkey in Spanish.


In the first segment of the video, the Chinese for "Ramen" is two characters pronounced "La" "Mian". They're being used for sound, rather than meaning (although mian is often used as shorthand for "miantiao" = noodle, so it's a nice double entendre).

Some of the Japanese speakers wound up interpreting the characters by meaning rather than sound, leading to hilarious translations.


拉麺 is "pulled noodles", which is meaning, not sound. 麺 on its own already means noodles, but in China it has become ambiguous because it was simplified down to 面, which also means "face". (And that's the sole meaning in Japanese for the character, hence the confusion in the video.)


That's the key difference between ideograms (based on pictures or combinations of pictograms) and sound-based writings (alphabet and the like).

Ideograms have a major drawback, however. To represent most of the things you need in daily life, you need thousands of them.


That one's more fun because it's a little more challenging for the Japanese folk, but the original is a bit more fair since it's vs traditional Chinese characters (and spoken Cantonese is a bit closer to the version that was imported by Japan): https://www.youtube.com/watch?v=-E6vHCT0wpw


The first question is funny because the pronunciation of the last 2 characters (拉面 - lamian) is why Japanese call Chinese noodles 'ramen'.


I found their bewilderment a little puzzling as well, but I believe it's due to the simplification of the last character. If you watch the older Canto/Traditional version of the video, the Japanese folk get the ramen part immediately.


Japanese uses an alternate simplification (麺) for the sense of noodles, and doesn't merge it with 面. Generally you wouldn't expect a Japanese person to have exceptional difficulty recognizing 拉麺, though.


I live in Japan and work at a Japanese company. I can't understand Chinese at all. Even though the Japanese version of the Chinese pronunciation is somehow vaguely familiar at times, there is no making any sense of it.

I suspect Japanese folks feel the same way so I turned to my right and asked a coworker what Chinese sounds like to him. His answer was ちんぷんかんぷん (gibberish).


Just to clarify, he did not say this in any sort of derogatory sense, nor do I wish it to be interpreted as such. Just that he can't understand! I think the difference is greater than the author states (except written, which is a little easier to understand because of the borrowed characters).


> (except written, which is is a little easier to understand because of the borrowed characters)

I understood the article to be talking entirely about the written language. Despite including "and sounds like" in the title, there is barely any discussion of the spoken language.


Fun fact: on most proposed trees of human languages, Korean and English are closer to each other than Korean and Chinese.


Can you give examples of such trees? I'd be interested to look at some of these :)


They are undoubtedly referring to the highly controversial Nostratic language family hypothesis, which postulates a language superfamily that includes most of the large language families of Eurasia (i.e. Indo-European, Semitic, Dravidian, Uralic) and also the proposed "Altaic" family, which (also controversially) comprises Korean, Japanese, and Turkic.

Nostratic excludes the Sino-Tibetan family (i.e. Chinese languages), most languages of Southeast Asia, and all the African languages.

https://en.wikipedia.org/wiki/Nostratic_languages

Again, it's so broad a grouping, projected so far back in time and based on such weakly detected feature similarities, that it's by no means broadly accepted as a language family in the way that Indo-European, Semitic, Uralic, etc are.


Thanks for pointing out the controversy about this.


FWIW, I think it's a super interesting hypothesis, but it just doesn't have the same level of data to back it as the accepted language families do.


this would be a somewhat representative example of such a tree:

https://www.angmohdan.com/wp-content/uploads/2014/10/languag...

You can find more very easily with a google image search on "human language tree"


I always consider that the CJKV (Vietnam) or 漢字/Kanji/Hanja (Kanji below) writing system is like CISC, and other alphabet systems are like RISC.


Slightly different perspective, but while I lived in Japan I did the JLPT as a native English speaker.

Among my friends: Koreans had no problem with Japanese grammar but struggled with writing/reading. Chinese had no problem with reading/writing but struggled with grammar. English speakers.. struggled with everything but pronunciation hah.

As for Chinese characters, there are a lot of subtle differences: 'Traditional Chinese' is the original. 'Simplified Chinese' was introduced by the Communist party and is the standard in Mainland China, with notable exceptions in Hong Kong and Taiwan.

For Japanese, there were simplifications in the 50's:

https://en.wikipedia.org/wiki/Shinjitai

But for the most part they're pretty similar to the Traditional (closer to traditional than simplified)

I'm not super familiar with Hanja (Korean), but I assume they are the equivalent of 'Traditional' Chinese.

Then you also get into subtle drifts in meaning. My favourite was:

手紙 The characters mean 'hand' and 'paper' respectively.

In Japanese, it means 'Letter'

In Chinese it means 'Toilet paper'


> English speakers.. struggled with everything but pronunciation

This is an interesting observation, but it doesn't match mine. From my experience, native English speakers have really bad pronunciation, as Japanese is full of "soft" sounds like shi, chi, ji. Those are missing in English. For example, when you want to say Shibuya, the first two letters should not be pronounced the same as in the word "shell." Likewise, the "j" in kanji should not be pronounced as in the word "jam." I find this quite common among native English speakers.

On the other hand, I find people from central and eastern Europe the most gifted, at least WRT pronunciation. They are already used to most of the sounds spoken in Japanese, and some more. Some subtlety aside, of course - it's not 100% the same, but close enough.


>In Japanese, it means 'Letter'

>In Chinese it means 'Toilet paper'

I immediately thought of the toilet hand from Majora's Mask:

https://www.youtube.com/watch?v=kcUXuCU2pxE

[You give him a property deed to wipe himself with]


When using the camera feature of the Google translate app, I got the idea for making something similar, but instead of translating, it would just blur out all the words that a certain person couldn't understand. That way people designing accessible spaces could get an idea of how other people experience the information in them.


Article mentions Altaic theory without mentioning that Altaic theory is utter horseshit. Still an interesting article though.


To be fair, if I recall it's only in the last 15-20 years that the Altaic theory seems to have gone from 'speculative' to 'not at all likely.' Those of us who studied this stuff 20 years ago could easily spout off this 'might-be fact' if we didn't bother to verify first.

I guess at this point similarities between the two languages are assumed to be a result of regional influence?


Many mainstream linguists have been against the hypothesis since the 1950s, but there are still plenty of linguists who contend that the Turkic, Tungusic and Mongolian families form a family; a smaller number have arguments in support of the stronger claim that the Koreanic, Ainu and Japanese–Ryukyuan languages are connected as well. See eg: [0]. Vovin set out to demonstrate that the latter, stronger claim was true and ended up writing this: [1]

Arguably, the biggest blow struck to the hypothesis in the last 15-20 years came when Vovin, previously a supporter of the hypothesis, wrote "The end of the Altaic controversy"[2]. Among the more notable arguments in it is that the non-traditional methods[3] used to argue that Altaic is a family are rejected by traditional comparative linguistics because they overweight lexical comparisons.

This was a rather prescient point, as some of the methods that have developed since then that have been used to argue in favor of the Altaic hypothesis suffer from this problem quite badly; this is perhaps especially true of the Bayesian phylogenetic inference, sometimes called Bayesian phylolinguistics when applied to historical linguistics. With this technique, used in eg [0], it is pretty hard to argue that this method wouldn't overweight cognates and loan words gained through contact unless you specifically control for that, which will cause you to end up erroneously marking sprachbunds as true families.

See [4] for another usage of the technique, this time to support the Dravidian family, which is already very well supported by traditional comparative methods.

All of this ultimately points to what I think is the real answer to the issue; outside of some arguments about archaeology that I don't have enough background to evaluate [5], most of the shared bits (shared pronouns, lexical stuff) between the languages here exist because of contact.

The languages here, if they belong grouped at all, should be grouped as a sprachbund, not as a family. The possibility that there is a true family with "micro-Altaic" is small, but I think it is still a much greater possibility than the "strong" hypothesis (w/ Japanese, Korean, etc). This paper [6] has a very cool approach to evaluating the weaker form of the hypothesis that I think most HN readers would find interesting.

So to answer your question, yes, I believe your assessment is correct, and mainstream historical linguistics does as well. But I wouldn't go so far as to say there are any nails in the Altaic hypothesis' coffin just yet.

[0] https://academic.oup.com/jole/article/3/2/145/5067185

[1] https://www.academia.edu/4208284/WHY_JAPONIC_IS_NOT_DEMONSTR...

[2] https://www.academia.edu/6345901/The_end_of_the_Altaic_contr...

[3] "non-traditional" = anything outside the comparative method (https://en.wikipedia.org/wiki/Comparative_linguistics). See eg: https://www.reddit.com/r/linguistics/comments/6tg6cr/why_is_...

[4] https://pure.mpg.de/pubman/item/item_2564924_3/component/fil...

[5] https://www.cambridge.org/core/journals/evolutionary-human-s...

[6] https://www.ling.upenn.edu/~ceolin/Diachronica.pdf


For an amateur's blog post in 2009 to call Altaic theory utter horseshit would have been rather hubristic to begin with, and moreover, not even the point. The Altaic hypotheses were higher-profile proposals that happened to include common origin for Japanese and Korean. Even dismissing the former, the latter is not exactly out of consideration.

And, as the post points out, the whole line of argument is a curiosity that has little concrete bearing on the fact that the two are very similar from a learner's perspective.


The Altaic theory isn't necessary to postulate a connection between Japanese and Korean. It's known that genetically the Japanese population was seeded by Korean settlers which largely displaced the native Ainu populations. Japanese is likely a descendant of the language spoken by these colonists from the Korean peninsula.

The two languages are very divergent, and the usual mechanisms for classifying languages are difficult to apply because the dominant Chinese influence came to each separately after they diverged.


> It's known that genetically the Japanese population was seeded by Korean settlers which largely displaced the native Ainu populations

It's even more complicated than that. In addition to the Ainu there were the Jomon people, a non-agricultural but nonetheless settled society who are believed to be the first people in the world to use pottery - which is surprising for a non-agricultural people.

They might in turn have been an offshoot of the first migration of modern humans out of Africa along the south coast of Asia, whose haplotypes [1] occur today at some of the highest frequencies in Japan and as far north as Mongolia and as far south as Australia. Some sub-branches spread west all the way to the edge of Europe.

1. https://en.m.wikipedia.org/wiki/Haplogroup_M_(mtDNA)


I thought the latest thinking was that the modern day Ainu are descendants of the Jomon population.


I think the latest thinking is that all people of Japan to varying degrees are descended from the Jomon people along with other populations migrating from mainland Asia over millennia.

This pattern - early hunter gatherer populations forming the substratum that later mixed with larger migrations of agricultural populations - is not unique to Japan.

It's quite similar to what you find in the paleogenetics and history of most regions of the world, including Europe and South Asia.


My understanding of more recent European population studies is that the genetic signature of the pre-agrarian hunter-gatherers is actually quite low. In general they were displaced rather than merged. Or their numbers were so low and diffuse that their contribution was little. Which makes sense when you consider the population densities possible/typical in hunter-gatherer vs agrarian lifestyles.

We're back to population migration / replacement theories being ascendant rather than the situation 30 years ago when cultural diffusion theories were preferred.


10% of the average Briton's ancestry (for those without recently migrated ancestors) is attributable to West European Mesolithic hunter-gatherers. That's a pretty significant chunk, suggesting both migration and mixture happened, and not diffusion or replacement.

From https://www.nhm.ac.uk/our-science/our-work/origins-evolution...

"When we look at genetic variation in modern British people today, we find that – for those who do not have a recent history of migration – around 10% of their ancestry can be attributed to the ancient European population to which Cheddar Man belonged. This group is referred to as the western European Mesolithic hunter-gatherers. However, this ancestry does not relate specifically to Cheddar Man or the Mesolithic population of Britain. Well after Cheddar Man’s death, two large-scale prehistoric migrations into Britain produced significant population turnovers13. Both of these migrations into Britain represented westward extensions of population movements across Europe10-12. In both cases, these migrating populations intermixed with local people who carried western European Mesolithic hunter-gatherer ancestry, as they moved across Europe. When these populations arrived in Britain they already had some hunter-gatherer ancestry derived from this mixing with local populations. Therefore the majority of western European Mesolithic hunter-gatherers ancestry that we see in modern British people probably originates from populations who lived all over Europe during the Mesolithic, which was carried into Britain by these later migrations."


While a relationship between Japanese and certain lost languages of the Korean peninsula is agreed upon by nearly all scholars, that still doesn’t necessarily mean that Japanese and Korean are related. One recent school of thought, for example, is that pre-Proto-Japanese entered the Korean peninsula from mainland China (Shandong province) while Korean came down from the north, so they would not be genetically related in spite of being neighbours.


I was referring to human genetics.


Thank you for this; I wish we could be friends


Ok - this chain is interesting; didn’t have this background; I deserve the downvote. Thanks for context... will go read some more


Just some context on Simplified Chinese: its origins predate the communist years. It was mainly aimed at modernizing written Chinese and making it easier to learn for the general population, since the many strokes needed to write a Traditional Chinese character made it hard to spread knowledge and raise literacy rates among the poorer and rural populations in China.


For other languages, there's a pretty interesting youtube channel called ecolinguist that frequently tries to test mutual intelligibility of language pairs to various extents. It's actually quite fascinating.

https://youtu.be/m9Dagt3SWoo


imo Traditional written Chinese (what the Japanese call Kanji and that's used outside of China) is easier than simplified written Chinese. While hand writing traditional Chinese is harder, more complicated characters are made of other characters with a related meaning. Consequently, you can guess what the word means. Conversely, with simplified there are fewer sub-characters, so a lot of context is lost. Being easier to write by hand is also not much of an advantage anymore, given computers.

If you're going to learn to read or write, learn traditional. It's easy to read simplified, but not so great the other way around.


I'm not sure what you mean: are you claiming that...

> While hand writing traditional Chinese is harder, more complicated characters are made of other characters with a related meaning.

..."traditional" characters (which are not the same as Japanese Shinjitai!) are easier to write but harder to read, or...

> It's easy to read simplified, but not so great the other way around.

...it's easier to read but harder to write?

There is some truth in your claim: simplification does discard some context and can be harder to read. But that's only the case for irregular cases (notable example being 漢 to 汉). In many cases they are highly systematic and phono-semantic roots are preserved (e.g. 訁 to 讠, so no information is lost), or characters are so frequently used that you will have to memorize anyway (e.g. 飛 to 飞). Pronunciation-based simplification (e.g. 後 after, behind to 后 empress) is debatable, but it also tends to occur in frequently used characters.
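To make the "merged characters" point concrete, here is a minimal Python sketch. The table is hand-picked from the examples in this comment and is not a real conversion table (actual converters use much larger, context-aware mappings); the names `T2S`, `to_simplified` and `traditional_candidates` are made up for illustration.

```python
# Toy traditional -> simplified table built only from the examples above.
# This is an illustrative assumption, not a complete or official mapping.
T2S = {
    "漢": "汉",  # the irregular case mentioned above
    "飛": "飞",  # frequent character, heavily reduced
    "後": "后",  # "after, behind" merged into 后 ("empress")
    "后": "后",  # 后 already exists in traditional text as its own character
}

def to_simplified(text: str) -> str:
    """Convert character by character; unknown characters pass through."""
    return "".join(T2S.get(ch, ch) for ch in text)

def traditional_candidates(ch: str) -> list:
    """List every traditional character in the toy table that maps to ch."""
    return sorted(t for t, s in T2S.items() if s == ch)

print(to_simplified("漢後飛"))
print(traditional_candidates("后"))
```

Going from traditional to simplified is a plain lookup, but the reverse lookup for 后 returns two candidates, so a reader or a converter has to pick by context. That is the part of the simplification that is debatable; the one-to-one cases lose nothing.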


Sorry, I didn't realize that the Japanese simplified kanji.

> ...it's easier to read but harder to write?

Harder to write because you have more lines and characters to draw. Harder to read because there are fewer sub-characters due to there being fewer lines. Judging from what you wrote, you probably understand my point; I'm just adding it in case.

Yeah, I've been gone a long time and I've been assimilated i.e. I don't read and write often anymore so you're probably right.

> or characters are so frequently used that you will have to memorize anyway

This is why I feel it makes it harder, especially for less common words. It's just more unnecessary work.


To me, the difference almost feels like Chinese is an acoustic instrument that has more flavor in how it's played and Japanese feels more like a digital instrument in how it's programmed.


What about Vietnamese?


Vietnamese adopted the Latin script in the late 19th century, and Chinese characters are effectively obsolete now.

They do still use a lot of Chinese-derived vocabulary, which you can occasionally spot if you squint hard enough. Xã hội chủ nghĩa = 社会主义 she hui zhu yi (socialism).


Like Dutch to an English speaker?


I agree that 新字体 is preferable to 简体字 for exactly the etymological reason you cite. The thing with 言 only being simplified when it is a radical in 簡體字 really irritates me (especially since it is not a difficult or slow radical to write anyhow), and there are more examples than just that. I practice with 正體字 most often despite being more familiar with Japanese, so my preference goes 新字體 > 正體字 > 簡體字.

P.S. adding lang attributes on those spans where you compare versions of the characters would be nice, though I get that you have to choose just one of several, when a character is nearly identical in two or more countries. In my browser it also fixes an issue where despite serif being in your CSS font stack, it will select a sans-serif/gothic font by default if no lang attribute is set.
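A rough sketch of what I mean, in Python. The helper name `lang_span` is made up; the language tags ("ja", "zh-Hans", "zh-Hant") are standard BCP 47 tags, but which tag belongs on which span of the article is the author's call, not mine.

```python
from html import escape

def lang_span(text: str, lang: str) -> str:
    """Wrap text in a <span> carrying a lang attribute (hypothetical helper)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'

# The same "country" character as written in each variant discussed here.
print(lang_span("国", "ja"))       # Japanese shinjitai
print(lang_span("国", "zh-Hans"))  # Simplified Chinese
print(lang_span("國", "zh-Hant"))  # Traditional Chinese
```

With the attribute set, the browser can pick a font per language instead of falling back to whatever its default CJK font happens to be.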


Saying Simplified Chinese makes less sense in terms of etymology while only providing a handful of examples is cherry-picking, since over two thousand characters are simplified[0].

I can provide a few examples where characters are more "etymological" in Simplified version vs in Traditional: 國 vs 国,黨 vs 党

[0]: https://web.archive.org/web/20131007231820/http://news.xinhu...


I don't see how 玉 has more etymological significance to 國 than 域/或. What analogy are you drawing on where the previous most common character for land/country is a worse etymological root than a character that apparently has meant only jade (or direct analogies to the material preciousness of jade) for longer than 國 has existed?

As for 黨, it is a direct analogy to 當 as in 當天. There is a little bit lost in the conversion of 田 to 黑 but at least they are related.


玉 was a variant of 王 (king)[0] thus king in walls (国), it's definitely NOT "only jade".

[0]: https://zh.m.wiktionary.org/zh/%E7%8E%89


Then why not just put 王 in the box rather than 玉? As far as I can tell the 王 glyph predates the invention of 玉, and the purpose of this new glyph was to distinguish 玉 from 王.

Also isn't a land within borders still a better analogy for a country than a king within walls? Walls bounding a king seems more like a palace.


Unfortunately, vocabulary in languages is defined, not derived.

Why isn't "business" a measure of how busy you are?

Why isn't "waterboarding" analogous to "snowboarding" and "sandboarding"?

(As an engineer I hate these peculiarities and I'm all for fixing them but the majority of the world tends to want to stick to the not-necessarily-logical definitions.)


Sure, I'm not saying everything has to make sense, but buddy here was claiming that 国 had cleaner etymology than 國, and to me it seems like one of these is indirect, and the other is very direct.


>Then why not just put 王 in the box rather than 玉?

I think we have done exactly that[0], it's just not part of the 1986 proposal in PRC.

[0]: https://zh.wiktionary.org/wiki/%E5%9B%AF


Interesting article, though it feels pretty unpolished, like a step above someone's train of thought.

Also odd that Catalan was chosen as the example, rather than French, Latin, or German, which all have much stronger and more direct influence on English than Catalan or Spanish.

Also some odd linguistic claims. The Altaic theory is not really supported by most linguists, and is the author claiming that Japanese and Korean are incredibly easy to learn for English speakers, or that Japanese is incredibly easy to learn if you're a Korean speaker and vice versa?

Regardless, interesting blog. Would be great as a first draft to expand upon with more examples to make you imagine what it would feel like.


Anecdotally, as a native English speaker, I find it much easier to guess the meaning of Spanish than of French or German.

I believe the author means Japanese/Korean are easier to learn if you know the other, versus English which is comparatively difficult for native C/J/K speakers.

In my experience, studying Korean did make studying Japanese a lot easier. There are some shared concepts, but Japanese is still quite difficult and different from Korean. From what I can tell, my Korean friends have an easier time picking up Japanese than English.


>>* From what I can tell, my Korean friends have an easier time picking up Japanese than English.*

I think it also helps that Japanese and Korean share similar sounds, whereas English has sounds that are difficult to pronounce for native Korean/Japanese speakers. My mother still has a hard time pronouncing the English "V" and "Z" sounds.


yeah, that was pretty much my point. The article wasn't trying to show what language is easy to guess for K/J speakers, it was trying to show what language X sounds/reads like to speakers of languages A and B, where X was a major influence on A and B. In the article, A and B are K/J and X is Chinese. In my example, A/B is English, and X is French/German.

I study both Korean and Japanese and I agree, knowing some Korean helps with learning Japanese. My Korean friends who speak both fluently say about as much as this article: it's almost a replace-in-place similarity, so that seems to line up with your account as well.


> Chinese characters (known as hanzi in Chinese, hanja in Korean, kanji in Japanese) have had a huge effect on the two languages similar to the effect of the Greco-Latin vocabulary present in English

This is simply misleading. Japan was the target of massive immigration from China. Modern Japanese people are of Chinese descent ("yayoi"), with aboriginal DNA. So it is not simply a case of a writing system having an influential effect. Foreigners brought their language, mixed it with the local language and evolved a new one out of that.


I’m afraid that you misunderstand the chronology. The development of the writing system in Japan postdates the Yayoi immigration by several centuries at least. The original immigrants from mainland Asia brought no writing system with them. Japanese writing is the result of later adoption of Chinese writing by Japan’s learned elites. Any introduction to Old Japanese orthography will explain how this happened.


I'm afraid you're fishing for assertions in my comment which are not actually there. For one thing, I make no remarks about any writing system at all, let alone any specific timeline regarding the development of literacy.


Even without attributing the Chinese writing system to the immigrants from China, your post above does not contribute to the discussion. Japanese is genetically unrelated to Chinese and typologically different from it. What the Yayoi immigrants from the mainland originally brought with them bore no resemblance to the Chinese language. That the Japanese were able to use Chinese words and Chinese characters for their own purposes, and that some similarities between the two languages arose as the OP mentions, has absolutely nothing to do with pre-Proto-Japonic speakers coming from the mainland. All that meaningful Chinese–Japanese interaction happened many centuries later.



