VRAM usage depends mostly on the parameter count, not on the number of input/output tokens, if I remember correctly.
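To make that concrete, here's a back-of-envelope sketch (assumed numbers, not measurements): the weights dominate VRAM, while the KV cache - the part that actually scales with token count - is comparatively small. The model shape below (32 layers, 32 KV heads, head dim 128) is a hypothetical Llama-2-7B-like configuration, just for illustration.

```python
def weight_vram_gb(params: float, bytes_per_param: int = 2) -> float:
    """VRAM for the weights alone (fp16 = 2 bytes per parameter)."""
    return params * bytes_per_param / 1e9

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_val: int = 2) -> float:
    """Rough KV-cache size: the only term that grows with context length."""
    # factor of 2 covers both keys and values
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_val / 1e9

# A hypothetical 7B-parameter model in fp16:
print(round(weight_vram_gb(7e9), 1))             # ~14.0 GB just for weights
# A full 4096-token context on the assumed model shape:
print(round(kv_cache_gb(4096, 32, 32, 128), 1))  # ~2.1 GB for the KV cache
```

So even quadrupling the token count only moves the smaller term; the parameter count sets the baseline.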
Also, in the case of the English language, it's roughly one token per word, so it's the same as in Chinese - assuming both LLM tokenisers were geared towards their native language.
The only issue is when you have a tokeniser geared towards western languages and you try to use it on a different group of languages - then a single word in a language foreign to the tokeniser gets split into multiple tokens.
But that has nothing to do with the underlying structure of the language.
In other words - you wouldn't really see a difference between an input in Chinese compared to English after the text gets tokenised. It's roughly the same amount of tokens, and the underlying parameter count would also be similar.