Code points do not map one-to-one to characters. Tokenizing only requires working with code points, but reporting a column means counting user-perceived characters (grapheme clusters), which requires a Unicode database. I think this is what the parent was referring to, though I suspect a lot of compilers simply ignore this issue.
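
To make the gap concrete, here is a rough Python sketch of a column calculation. The function name `approx_column` is made up for illustration, and the approach is only an approximation: it uses the standard library's `unicodedata` tables to skip combining marks, whereas real grapheme segmentation (UAX #29) also has to handle ZWJ emoji sequences, Hangul jamo, and so on, and needs more data than this.

```python
import unicodedata

def approx_column(line: str, offset: int) -> int:
    """Approximate 1-based column of the code point at index `offset`.

    Hypothetical helper: treats every code point with a nonzero canonical
    combining class as part of the preceding character. This is NOT full
    grapheme cluster segmentation.
    """
    prefix = line[:offset]
    # Count only code points that start a new (approximate) character.
    return 1 + sum(1 for ch in prefix if unicodedata.combining(ch) == 0)

# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é".
line = "e\u0301x = 1"
offset_of_x = line.index("x")        # code point index of "x"
naive_column = offset_of_x + 1       # counts the accent as its own column
better_column = approx_column(line, offset_of_x)
```

A compiler that counts code points would put `x` one column further right than a reader sees it; skipping combining marks fixes this particular case but not the general one.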