Code points do not map one-to-one to characters. You only need to deal with code points to tokenize, but to report a column you need to count characters (grapheme clusters), which requires a Unicode database. I think this is what the parent was referring to, though I suspect a lot of compilers simply ignore the issue.
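
To make the difference concrete, here is a rough Python sketch. The function names are made up for illustration, and the "character" count is only an approximation: it skips combining marks using the standard `unicodedata` module and does not implement full grapheme cluster segmentation (no handling of emoji ZWJ sequences, Hangul jamo, and so on).

```python
import unicodedata

def codepoint_column(line: str, offset: int) -> int:
    """1-based column if every code point counts as one column."""
    return offset + 1

def approx_char_column(line: str, offset: int) -> int:
    """1-based column that skips combining marks before `offset`.

    Approximation only: treats nonspacing (Mn) and enclosing (Me) marks
    as part of the preceding character. Not full grapheme segmentation.
    """
    count = 0
    for ch in line[:offset]:
        if unicodedata.category(ch) not in ("Mn", "Me"):
            count += 1
    return count + 1

# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
line = "e\u0301x = 1"
offset = line.index("x")  # code point offset of the token "x"

print(codepoint_column(line, offset))    # 3
print(approx_char_column(line, offset))  # 2
```

The tokenizer only ever needs `offset`; the Unicode data lookup is needed solely to turn that offset into a column a human would agree with.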