It does not. Your code is using precomposed instead of decomposed chars.
Try:
    let s = String::from("noe\u{0308}l");
    println!("Reversable? {}", s.chars().rev().collect::<String>());
    println!(
        "First three characters? {}",
        s.chars().take(3).collect::<String>()
    );
You get "l̈eon", which (according to the article) is wrong; likewise, the first three characters come out as "noe", dropping the diacritic.
The character ë can be encoded in two ways:
1. One code point: U+00EB. This is the "precomposed" form.
2. Two code points: U+0065 U+0308, aka e followed by ¨. This is the "decomposed" form, also known as a "combining sequence" since the diaeresis combines with the base character.
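To make the difference between the two forms concrete, here is a minimal sketch using only the standard library. The helper name `forms` is made up for illustration; the point is just that the two encodings differ in code point count and byte length, and compare as unequal strings.

```rust
// Returns the same visible character "ë" in its two encodings.
// `forms` is a hypothetical helper name, not something from the repo.
fn forms() -> (&'static str, &'static str) {
    let precomposed = "\u{00EB}"; // one code point
    let decomposed = "e\u{0308}"; // base letter + combining diaeresis
    (precomposed, decomposed)
}

fn main() {
    let (pre, dec) = forms();
    // chars() iterates over code points, not user-perceived characters.
    println!("code points: {} vs {}", pre.chars().count(), dec.chars().count());
    // len() is the UTF-8 byte length.
    println!("UTF-8 bytes: {} vs {}", pre.len(), dec.len());
    // String comparison is byte-wise, so the two forms are not equal.
    println!("equal as strings? {}", pre == dec);
}
```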
If your string type is a sequence of code points, then reversing a decomposed string will tear the combining sequence, and apply the diaeresis to the wrong character. Most string types are affected by this (or worse), Rust included.
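For comparison, here is a rough sketch of a reversal that keeps a combining mark attached to its base character. It assumes, as a loud simplification, that only the Combining Diacritical Marks block (U+0300 to U+036F) counts as combining; real code should use proper grapheme cluster segmentation rather than this. The names `is_combining` and `reverse_clusters` are made up for this example.

```rust
// Simplifying assumption: only U+0300..=U+036F is treated as combining.
// Real grapheme segmentation covers far more cases than this.
fn is_combining(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

// Groups each base character with the combining marks that follow it,
// then reverses the groups instead of the individual code points.
fn reverse_clusters(s: &str) -> String {
    let mut clusters: Vec<String> = Vec::new();
    for c in s.chars() {
        if is_combining(c) {
            if let Some(last) = clusters.last_mut() {
                last.push(c); // attach the mark to the preceding base
                continue;
            }
        }
        clusters.push(c.to_string()); // start a new cluster
    }
    clusters.into_iter().rev().collect()
}

fn main() {
    let s = "noe\u{0308}l";
    // Naive reversal moves the diaeresis onto the "l".
    println!("naive:    {}", s.chars().rev().collect::<String>());
    // Cluster-aware reversal keeps it on the "e".
    println!("clusters: {}", reverse_clusters(s));
}
```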
The two forms get rendered identically, so you more or less need a hex editor to figure out which form you've got.
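If you don't have a hex editor handy, dumping the code points works too. A small sketch (the function name `code_points` is just for illustration):

```rust
// Formats every code point of `s` as U+XXXX, separated by spaces.
fn code_points(s: &str) -> String {
    s.chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Precomposed: prints U+006E U+006F U+00EB U+006C
    println!("{}", code_points("no\u{00EB}l"));
    // Decomposed: prints U+006E U+006F U+0065 U+0308 U+006C
    println!("{}", code_points("noe\u{0308}l"));
}
```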
I forked your repo and switched it to decomposed (the diff looks like a no-op), and now it produces the wrong output:
One could reasonably conclude that precomposed forms are just better and easier. But they're considered legacy: we can't encode every possible combining sequence into a code point, so we might as well go the other way and decompose whenever possible. That's what Normalization Form D is about.
Much obliged, thank you. I'm adding a note about this to the repo, along with an explanation of precomposed versus decomposed characters. Now I do see the problem that you and the article's author are describing.