But there are tons of languages (not just CJK ones) that use compounding, or combinations of root + prefix/suffix/infix, to express what would be multiple words in English, e.g. German 'Schadenfreude'. It's actually far more useful to tokenize this as separate parts, because 'Freude' shows up in a lot of other "words" as well. That token can then be shared across all of them, which keeps the vocab compact.
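To make the sharing point concrete, here is a minimal sketch in Python. The vocab and the greedy longest-match segmenter are hypothetical and picked by hand for illustration; real tokenizers (BPE, unigram, etc.) learn their pieces from data and will split differently.

```python
# Hypothetical hand-picked subword vocab; a real tokenizer would learn this.
VOCAB = {"schaden", "freude", "vor", "lebens", "ersatz"}


def segment(word, vocab):
    """Greedy longest-match segmentation, falling back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocab entry matches here: emit one character and move on.
            pieces.append(word[i])
            i += 1
    return pieces


words = ["schadenfreude", "vorfreude", "lebensfreude", "schadenersatz"]
segmented = {w: segment(w, VOCAB) for w in words}
distinct_pieces = {p for pieces in segmented.values() for p in pieces}

for w, pieces in segmented.items():
    print(w, "->", pieces)
print(len(words), "words covered by", len(distinct_pieces), "pieces")
```

With this toy vocab, 'freude' is reused by three of the four words. The ratio of pieces to words gets better as more compounds are added, since each new compound mostly recombines pieces that already exist.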
Of course, most Indo-European languages have declension, or at least conjugation. That includes English, even if it is heavily simplified there.
Chinese doesn't really have that; it has no conjugation at all, just particles at best to mark things like a question. Japanese and Korean do conjugate verbs, but agglutinatively, by stacking fairly regular suffixes onto a stem.
Words do exist in all CJK languages and are meaningful. Tokenizing by Hangul jamo (the letters that make up each syllable block) doesn't really make sense unless you're also going to tokenize Romance languages by decomposed letter (e.g. hôtel would be h, o, ^, t, e, l).
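For what it's worth, Unicode already treats the two cases the same way, which is roughly the comparison being made here. A small sketch using Python's standard unicodedata module (the decompose helper is just an illustrative name, not part of any tokenizer):

```python
import unicodedata


def decompose(text):
    """Return the code points of text after canonical decomposition (NFD)."""
    return list(unicodedata.normalize("NFD", text))


# "hôtel" with a precomposed ô (U+00F4): NFD splits it into o + combining circumflex.
hotel = decompose("h\u00f4tel")
# The Hangul syllable block U+D55C ("han"): NFD splits it into its three jamo.
han = decompose("\ud55c")

print([f"U+{ord(c):04X}" for c in hotel])
print([f"U+{ord(c):04X}" for c in han])
```

In both cases one visible unit becomes several code points. If splitting a syllable block into jamo counts as a sensible token boundary, then so does splitting ô into o and a circumflex.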