"""
While current LLMs with BPE vocabularies lack direct access to a token's characters, they perform well on some tasks requiring this information, but perform poorly on others. The models seem to understand the composition of their tokens in direct probing, but mostly fail to understand the concept of orthographic similarity. Their performance on text manipulation tasks at the character level lags far behind their performance at the word level. LLM developers currently apply no methods which specifically address these issues (to our knowledge), and so we recommend more research to better master orthography. Character-level models are a promising direction. With instruction tuning, they might provide a solution to many of the shortcomings exposed by our CUTE benchmark.
"""
What character-level task does it say is no problem for multi-char token models?
What kind of tasks does it say they do poorly at?
Seems they agree with me, not you.
But hey, if you tried spelling vs. counting for yourself, you already know that.
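And if you want to redo that spelling-vs-counting check, here's roughly what I mean; the OpenAI Python SDK and the model name are just my assumptions, you can paste the two prompts into whatever chat model you prefer:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "Spell the word 'strawberry' letter by letter.",               # composition probing
    "How many times does the letter 'r' appear in 'strawberry'?",  # character-level counting
]

for p in prompts:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name, swap in whatever you use
        messages=[{"role": "user", "content": p}],
    )
    print(p)
    print(reply.choices[0].message.content)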
You should swap your brain out for GPT-1. It'd be an upgrade.