Thank you for taking the time to write this response. While I agree that tokenization makes it considerably harder for an LLM to count characters, I'm still not convinced it's a fundamental obstacle. I think the lack of (or limited capacity for) symbolic processing is a more important factor.
> But there is no way it can count characters in tokens, it just doesn't see them.
If that were the case, how could most LLMs (I tested ChatGPT and Llama 3) spell out words correctly?
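To make the tokenization point concrete, here's a toy sketch of greedy longest-match tokenization. The vocabulary is made up for illustration and is not any real model's; real BPE tokenizers learn merges from data, but the effect is the same: the model receives multi-character tokens, not individual letters.

```python
# Made-up vocabulary for illustration only (not a real tokenizer's).
VOCAB = {"straw", "berry", "str", "aw", "ber", "ry",
         "s", "t", "r", "a", "w", "b", "e", "y"}

def tokenize(word):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return tokens

print(tokenize("strawberry"))  # ['straw', 'berry']
```

The model sees two opaque token IDs here, with no explicit character boundaries, yet it can still spell the word out letter by letter, which suggests the character makeup of tokens is learnable from training data. That's why I'd locate the counting failure in symbolic processing rather than in tokenization alone.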