
I imagine you can always expand your token count, but then is that very different from syllabic alphabets (that encode multiple syllables in one character)?

Granted, that doesn't apply to Chinese, which encodes concepts, so that's interesting to see in LLMs.




I think so, because even if you're encoding multiple syllables, that's just the pronunciation.

I've learned languages with alphabets and glyph scripts like Chinese, and *what I don't like about alphabet languages like English is that the letters themselves provide little context to their meaning, although you could learn the Latin root words and guess from there. Of course you know how to say it, but you don't know what it means. With Chinese, you might guess what it means, but you don't know how to say it.* The characters have more meaning in them, although this isn't the case for every word, as non-speakers often assume.

In short, in my experience learning Chinese was way easier than learning Vietnamese. The two have historical ties: Vietnamese used a fork of Chinese script until the early 1900s, when it transitioned to Latin script in order to improve literacy rates. Sure, that improved literacy rates, because to read you only need to know how to pronounce the words, but it doesn't mean glyph languages make the meaning/semantics harder to learn.



