Surrogates are technically a UTF-16-only thing. Since they nevertheless sometimes escape out into the wild, WTF-8 defines a superset of UTF-8 that encodes them.
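Concretely, WTF-8 encodes a surrogate code point with the same three-byte bit layout UTF-8 uses for other code points in that range, which strict UTF-8 refuses to emit or accept. A minimal sketch in C (the value U+DF7C is just an illustrative example, chosen to match the mangling described below, not anything from the spec):

    #include <stdio.h>

    int main(void)
    {
        /* A lone surrogate, which strict UTF-8 must reject.  WTF-8 just
         * applies the normal three-byte UTF-8 bit layout to it. */
        unsigned cp = 0xDF7C;

        unsigned char wtf8[3] = {
            0xE0 | (cp >> 12),          /* 1110xxxx */
            0x80 | ((cp >> 6) & 0x3F),  /* 10xxxxxx */
            0x80 | (cp & 0x3F),         /* 10xxxxxx */
        };

        printf("U+%04X -> %02X %02X %02X\n", cp, wtf8[0], wtf8[1], wtf8[2]);
        /* prints: U+DF7C -> ED BD BC */
        return 0;
    }

Anything that was already well-formed UTF-8 stays byte-for-byte identical, which is what makes WTF-8 a superset rather than a different encoding.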
To be clear, this is not an official Unicode spec. It's a hack (albeit a pretty natural and obvious one) to deal with systems that don't do Unicode quite right.
I recently came across some old code that narrows wchar_t to UCS-2 by zeroing out the high-order bytes. Even though my test was careful not to generate any surrogates in the input, they showed up in the output when a randomly generated code point like U+1DF7C was mangled into U+DF7C.
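For a concrete picture of that mangling, here's a hypothetical reconstruction (not the code in question), assuming a platform with a 32-bit wchar_t:

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* Keeping only the low 16 bits of each wide character turns the
         * supplementary-plane code point U+1DF7C into the lone low
         * surrogate U+DF7C. */
        wchar_t wide = 0x1DF7C;
        unsigned short narrowed = (unsigned short)wide;  /* high-order bytes zeroed */

        printf("U+%05lX -> U+%04X\n", (unsigned long)wide, (unsigned)narrowed);
        /* prints: U+1DF7C -> U+DF7C, which lands in the surrogate range
         * D800..DFFF even though the input contained no surrogates */
        return 0;
    }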
A corrupted value like that is not necessarily a great example of something you want to preserve, but the truncation that produced it is exactly the sort of assumption a lot of late-90s code made about Unicode: that every code point fits in 16 bits.
Specifically, filenames on Windows are not UTF-16 (or UCS-2) but rather WTF-16: like UTF-16, but with possibly unpaired surrogates. WTF-8 provides an 8-bit encoding for such filenames that matches UTF-8 wherever the original was valid UTF-16 and converts the rest in the most straightforward way possible, meaning you need less code to go from WTF-16 to WTF-8 than to go from UTF-16 to UTF-8 while rejecting invalid input.
https://simonsapin.github.io/wtf-8/
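A minimal sketch of such a conversion (the names are my own invention, not from the spec): a strict UTF-16 to UTF-8 converter needs an extra branch to reject or replace every lone surrogate, while here a lone surrogate simply falls through to the ordinary three-byte encoding.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical WTF-16 -> WTF-8 converter.  Valid surrogate pairs are
     * joined and encoded exactly as UTF-8 would encode them; anything
     * else, including a lone surrogate, is encoded as-is.  Assumes `out`
     * has room for the worst case of 3 bytes per input unit. */
    static size_t wtf16_to_wtf8(const uint16_t *in, size_t n, uint8_t *out)
    {
        size_t o = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t cp = in[i];

            /* High surrogate followed by a low surrogate: a valid pair. */
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
                in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                i++;
            }

            if (cp < 0x80) {
                out[o++] = (uint8_t)cp;
            } else if (cp < 0x800) {
                out[o++] = (uint8_t)(0xC0 | (cp >> 6));
                out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {             /* includes lone surrogates */
                out[o++] = (uint8_t)(0xE0 | (cp >> 12));
                out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
                out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
            } else {
                out[o++] = (uint8_t)(0xF0 | (cp >> 18));
                out[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
                out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
                out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
            }
        }
        return o;  /* number of WTF-8 bytes written */
    }

    int main(void)
    {
        /* "caf\u00e9" plus a lone low surrogate, as a mangled filename might contain. */
        uint16_t name[] = { 'c', 'a', 'f', 0x00E9, 0xDF7C };
        uint8_t buf[3 * 5];
        size_t len = wtf16_to_wtf8(name, 5, buf);
        for (size_t i = 0; i < len; i++)
            printf("%02X ", buf[i]);
        printf("\n");   /* 63 61 66 C3 A9 ED BD BC */
        return 0;
    }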