From on-screen to in-memory representation, we go from glyphs to grapheme clusters, to Unicode 'characters', to codepoints, to encoded bytes. None of these steps is a bijection (ligatures, multi-character graphemes, invalid characters, encoding errors).
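As a quick illustration of those layers not lining up, here is a sketch using Swift (assuming a recent toolchain), since its String type exposes each level directly:

```swift
let flag = "🇳🇱"                      // one glyph / grapheme cluster on screen
print(flag.count)                     // 1: counted in grapheme clusters
print(flag.unicodeScalars.count)      // 2: two regional-indicator code points
print(flag.utf8.count)                // 8: encoded bytes in UTF-8

let family = "👩‍👩‍👧‍👦"                   // woman + ZWJ + woman + ZWJ + girl + ZWJ + boy
print(family.count)                   // 1
print(family.unicodeScalars.count)    // 7
print(family.utf8.count)              // 25
```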
I'd argue a 'proper' string type should operate at the grapheme cluster and/or character level and take care of things like normalization (e.g. for string comparisons) and validation, as sketched below.
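A minimal sketch of what that buys you, again in Swift, whose string equality is defined over canonical equivalence rather than raw codepoints and whose byte-decoding initializer is failable:

```swift
import Foundation

let precomposed = "café"          // 'é' as U+00E9
let decomposed  = "cafe\u{0301}"  // 'e' + U+0301 combining acute accent

print(precomposed == decomposed)  // true: equality is canonical-equivalence aware
print(precomposed.count, decomposed.count)                                // 4 4 (grapheme clusters)
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)  // 4 5 (codepoints)
print(Array(precomposed.utf8) == Array(decomposed.utf8))                  // false: bytes still differ

// Validation: decoding bytes can fail, so the result is optional
let bytes: [UInt8] = [0x61, 0xFF, 0x62]              // 0xFF is never valid UTF-8
print(String(bytes: bytes, encoding: .utf8) as Any)  // nil: invalid input is rejected
```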