I'm fluent in Japanese and speak some Mandarin Chinese as well. These 3 characte...

toufka · on March 17, 2015

This seems again to be a perfect place for rendering rather than encoding. The English letter 'a' can be rendered as a ring with a tail (the way I handwrite), or a ring with a cap and a tail (the way the font usually renders). Both are the same letter, just rendered differently based on my (contextually sensitive) font.

ptaipale · on March 17, 2015

I think not. I do want to be able to say both People's Republic of China 中华人民共和国 and Republic of China 中華民國 in the same text, and if I had to choose rendering either 国+华 or 國+華 then it wouldn't work.

toufka · on March 17, 2015

Curious, as I'm not sure when that would actually happen in real life (in Chinese). Generally in mainland China, the ROC would always be rendered with 国, even officially [1]. And in Taiwan the PRC would be rendered with 國 [2].

It gets a bit weirder in Japanese where the word is distinctly not the same - one is a traditional version (proper noun) of the other and you could imagine a text using both (William vs Wilhelm vs Will).

[1] http://baike.baidu.com/view/2200.htm
[2] https://www.google.com.tw/?gws_rd=ssl#q=%E4%B8%AD%E5%8D%8E%E...

ptaipale · on March 18, 2015

That is true, mainland China writes "中华民国".

But I still do want to be able to write texts that are like this discussion: mainly in English, but contain fragments in Chinese, and so that I can use both the traditional and simplified characters.

And it also makes total sense to me that 日本 is Japan, both in Japanese and Chinese, using the exact same Unicode characters.

Navarr · on March 31, 2015

If it's up to font rendering, you would specify the language tag for each of those individually which would render the appropriate font (as you are writing that country's name in its language).

Though that's far more difficult for the layman, and markdown certainly doesn't seem to have any language markers.

zhemao · on March 17, 2015

Except you can still recognize the 'a' as 'a' no matter which way it is rendered.

Not so with Chinese characters. For instance, the character for "fly" in simplified (飞) and traditional (飛) look very different. Someone who only learned simplified may not recognize the traditional character as being the same.

kijin · on March 18, 2015

Which is exactly why 飞 and 飛 are encoded separately. I don't see any problem with that.
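
The separate encoding is easy to check. A minimal Python sketch, standard library only (the variable names are just illustrative):

```python
# Simplified and traditional "fly" are two distinct code points.
simplified = "\u98de"   # 飞
traditional = "\u98db"  # 飛

for label, ch in (("simplified", simplified), ("traditional", traditional)):
    # ord() gives the code point of a single character
    print(f"{label}: {ch} U+{ord(ch):04X}")
```

Since the two strings differ at the code point level, no font or language tag is needed to tell them apart.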

zhemao · on March 18, 2015

Yes, but other characters that also look different are merged. Here's an example: http://www.tofugu.com/2012/04/04/the-sorry-state-of-japanese...

That's the character for "cold". If you showed me (a Chinese speaker) the Japanese or Korean variant, I would have no idea what it meant.

snogglethorpe · on March 18, 2015

As far as I can see, Unicode has mostly settled down into a sort of "good enough" state: characters that have sufficiently different renderings have gotten separate "variant" codepoints for each rendering, while characters that are very similar (even if not completely identical as commonly written) are still only present as unified codepoints.

I've no idea if these variant codepoints are actually supposed to show up in user files, or are intended mainly for the use of font rendering systems, etc... the whole thing seems a bit of a mess, even if the information is technically present.

Judging from unicode.com, "冷" does seem to have separate codepoints: 冷 (chinese/unified), and 冷 (japanese z-variant). However, my browser renders both as similar characters. Similarly, on my phone, the same character gets input whether using a Chinese or a Japanese input method, and both get rendered using the Japanese rendering (it's a Japanese phone), which makes Chinese text look a little funny.

An interesting example is "晩" / "晚", which has one more stroke in the Japanese variant, but it's situated in a location which makes both variants look pretty much identical (and in small bitmapped fonts, they are identical). Nonetheless, Unicode includes codepoints for both...
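
A rough way to confirm that these two really are separate codepoints, and that normalization does not fold one into the other, is a short Python sketch using only the standard `unicodedata` module (the variable names are mine, and I'm assuming U+6669 and U+665A are the two forms in question):

```python
import unicodedata

japanese_form = "\u6669"  # 晩
chinese_form = "\u665a"   # 晚

for ch in (japanese_form, chinese_form):
    # Unified ideographs are named after their code point
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}")

# Apply every standard normalization form to both characters
normalized = {
    form: (unicodedata.normalize(form, japanese_form),
           unicodedata.normalize(form, chinese_form))
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
still_distinct = all(a != b for a, b in normalized.values())
print("distinct after normalization:", still_distinct)
```

If that holds, a program comparing strings will treat the two as different characters even where a small bitmapped font draws them identically.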

frivoal · on March 18, 2015

The fact that some characters are debatable does not change the fact that han unification is a good idea. In a few cases you can disagree, but not unifying at all would be madness.

As for this particular character (cold), both variants are familiar to Japanese readers, with the one described as Japanese in your link being the one you'd typically see in print, while the other one is common in handwriting, and nobody in Japan would treat these two as different. From a Japanese point of view, this is definitely the kind of thing you change by switching fonts.

This pdf is the official list of basic Chinese characters, published by the Japanese government. Look on page 9; it shows both variants of this character in handwriting.

http://www.bunka.go.jp/kokugo_nihongo/pdf/jouyoukanjihyou_h2...

The fact that one of the variants is not familiar to Chinese readers complicates the issue, but there are at least reasonable grounds to argue that this is one character, not two.

I think it is possible to argue that han unification was not done very well and that the UC made too many classification mistakes (although I personally think it is generally not that bad), but I don't think arguing that unification is a bad thing entirely has legs.

anon4 · on March 18, 2015

Encoding those as the same code point makes sense. On the other hand, just the codepoint is obviously not enough on its own for rendering the glyph. The Unicode consortium seem not to care about the actual rendering part of the whole stack and are happy just defining the low-level bits. But then why do we have skin colour coding for emoji and no language coding for CJK glyphs? The entire thing is a mess, but heaping another pile of standards on top of it will make it even more of a mess, I'm afraid.
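
For comparison, here is roughly what the skin colour coding looks like at the codepoint level: a modifier codepoint placed directly after the base emoji. A minimal Python sketch, assuming a Python build whose Unicode database includes the emoji modifiers (3.5 or later should); the variable names are illustrative only:

```python
import unicodedata

thumbs_up = "\U0001F44D"         # base emoji
medium_skin_tone = "\U0001F3FD"  # one of the five skin tone modifiers

# The "coloured" emoji is just the base followed by the modifier
sequence = thumbs_up + medium_skin_tone

for ch in sequence:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print("codepoints in sequence:", len(sequence))
```

So the appearance hint travels in the text itself as a second codepoint, which is the kind of in-band marker that unified CJK characters do not carry by default.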