To me, that's a design flaw. Would we really be any worse off if we simply declared filenames must be UTF-8?
That seems to be the only case where a user-visible and user-editable field is allowed to be an arbitrary byte sequence, and its primary purpose seems to be allowing this argument to pop up on HN every month.
I've never seen a non-malicious use of it. All popular filesystems already disallow specific sets of ASCII characters in names, and any database that needs to save its data in numbered files has no problem using safe hex filenames.
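To illustrate what I mean by safe hex filenames, here's a minimal sketch in Rust (the directory name and record ID are made up for the example): derive the name from the numeric key, so it's always plain ASCII no matter what the data contains.

    // Sketch: store each record under a hex-encoded numeric name,
    // so filenames stay plain ASCII regardless of the record contents.
    use std::fs;
    use std::path::PathBuf;

    fn record_path(dir: &str, id: u64) -> PathBuf {
        // e.g. id 48879 -> "000000000000beef.dat"
        PathBuf::from(dir).join(format!("{id:016x}.dat"))
    }

    fn main() -> std::io::Result<()> {
        fs::create_dir_all("store")?;
        fs::write(record_path("store", 48879), b"payload bytes")?;
        println!("{}", record_path("store", 48879).display());
        Ok(())
    }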
Sure, we could declare that, but then what? Non-unicode filenames won't suddenly disappear. Operating systems won't suddenly enforce unicode. Filesystems will still allow non-unicode names.
Simply declaring it doesn't help anybody. In the meantime your application still needs to handle non-unicode filenames, otherwise the malicious ones are free to be malicious.
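To make "handle non-unicode filenames" concrete, here's a minimal sketch in Rust (just an illustration, not from any real codebase): keep the name as an OsString for actual I/O and only convert it to text with an explicit policy, instead of assuming every name is valid UTF-8.

    // Sketch: list a directory without assuming filenames are valid UTF-8.
    use std::fs;
    use std::io;

    fn main() -> io::Result<()> {
        for entry in fs::read_dir(".")? {
            let entry = entry?;
            let name = entry.file_name(); // OsString: arbitrary bytes on Unix

            match name.to_str() {
                // Valid UTF-8: safe to treat as text.
                Some(utf8) => println!("utf-8:    {utf8}"),
                // Not valid UTF-8: pick a policy instead of crashing, e.g.
                // show a lossy rendering but keep the OsString for real I/O.
                None => println!("non-utf8: {}", name.to_string_lossy()),
            }
        }
        Ok(())
    }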
I'd assume the proper place to define what counts as a valid filename is the filesystem level, so a filesystem conforming to standard ABC v123 would not allow non-unicode names: non-unicode filenames would either get refused or modified when copied or written to that filesystem.
This is not new; it would match the current behavior of the OS/filesystem enforcing other character restrictions, such as when writing a file name containing an asterisk or colon to a FAT32 USB flash drive.