
To me, that's a design flaw. Would we really be any worse off if we simply declared filenames must be UTF-8?

That seems to be the only case where a user-visible and user-editable field is allowed to be an arbitrary byte sequence, and its primary purpose seems to be allowing this argument to pop up on HN every month.

I've never seen any non-malicious use of it. All popular filesystems already disallow specific sets of ASCII characters in names. Any database which needs to save data in files by number has no problem using safe hex filenames.
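A minimal sketch of the hex-filename idea, for illustration only. The function names and the `.dat` suffix are made up here, not taken from any particular database.

```python
SUFFIX = ".dat"  # hypothetical extension, chosen only for this example


def key_to_filename(key: bytes) -> str:
    """Map an arbitrary byte key to a name using only [0-9a-f] plus the suffix."""
    return key.hex() + SUFFIX


def filename_to_key(name: str) -> bytes:
    """Invert key_to_filename; raises ValueError on names it did not produce."""
    if not name.endswith(SUFFIX):
        raise ValueError("unexpected filename: %r" % name)
    return bytes.fromhex(name[: -len(SUFFIX)])
```

The name is pure ASCII no matter what bytes the key contains, so it never depends on the filesystem accepting arbitrary byte sequences.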



Sure, we could declare that, but then what? Non-Unicode filenames won't suddenly disappear. Operating systems won't suddenly enforce Unicode. Filesystems will still allow non-Unicode names.

Simply declaring it doesn't help anybody. In the meantime, your application still needs to handle non-Unicode filenames; otherwise the malicious ones are free to be malicious.


I'd assume the proper place to define what counts as a valid filename is the filesystem level: a filesystem of standard ABC v123 would not allow non-Unicode names, so non-Unicode filenames would be either refused or modified when copied or written to that filesystem.

This is not new. It would match the current behavior of the OS/filesystem enforcing other character restrictions, for example when writing a file name with an asterisk or colon to a FAT32 USB flash drive.
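A rough sketch of what "refused or modified" could look like, written as a userspace function rather than real filesystem code. The function name, the `on_invalid` parameter, and the forbidden-character set are assumptions for illustration; the set is FAT-style but not a complete list of what FAT32 rejects.

```python
# Illustrative FAT-style forbidden characters (an assumption, not a full list).
FORBIDDEN = set('\\/:*?"<>|')


def check_name(raw: bytes, on_invalid: str = "refuse") -> str:
    """Return a name the hypothetical filesystem would accept.

    on_invalid="refuse" raises ValueError for a bad name;
    on_invalid="modify" substitutes the offending parts instead.
    """
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        if on_invalid == "refuse":
            raise ValueError("name is not valid UTF-8: %r" % raw)
        # Each undecodable byte becomes U+FFFD (the replacement character).
        name = raw.decode("utf-8", errors="replace")
    if FORBIDDEN.intersection(name):
        if on_invalid == "refuse":
            raise ValueError("name contains a forbidden character: %r" % name)
        name = "".join("_" if ch in FORBIDDEN else ch for ch in name)
    return name
```

Note that the "modify" path is lossy: two different original names can end up mapped to the same stored name, so a real implementation would also need a collision rule.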


If Unicode had a set of "explicitly this byte" codepoints, it would be simple to deal with: just pass the invalid bytes of the filename through that way.
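For comparison, Python's `surrogateescape` error handler (PEP 383) does something close to this without any new codepoints: each undecodable byte 0x80-0xFF is mapped to a lone surrogate in U+DC80-U+DCFF, and mapped back on encode. A small sketch, with helper names invented for the example:

```python
def bytes_to_text(raw: bytes) -> str:
    """Decode as UTF-8, smuggling each invalid byte through as a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(name: str) -> bytes:
    """Inverse of bytes_to_text: lone surrogates turn back into the original bytes."""
    return name.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"        # Latin-1 encoded name, not valid UTF-8
name = bytes_to_text(raw)   # byte 0xE9 becomes U+DCE9
assert text_to_bytes(name) == raw
```

The catch is that the resulting string contains lone surrogates, so it is not well-formed Unicode, and a strict `name.encode("utf-8")` raises `UnicodeEncodeError`.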


Unicode deals with text, so such a set of codepoints is a non-starter, anyway.


Once you lose the expectation of being able to work with non-Unicode filenames, those files will quickly get renamed and cease to be a problem.


How can you rename them if you can only use Unicode paths?


You would need to use some special utility created just for that purpose.


As long as the tool for renaming files handles non-UTF-8 filenames, you'd be fine.
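A sketch of such a tool, assuming a POSIX system where Python's `os` functions accept and return `bytes` paths. The function names and the percent-escaping scheme are choices made for this example, not an established convention.

```python
import os


def utf8_safe_name(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8, else percent-escape the bad bytes."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    # surrogateescape marks each invalid byte 0xNN as the codepoint U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            parts.append("%%%02X" % (cp - 0xDC00))  # e.g. byte 0xE9 -> "%E9"
        else:
            parts.append(ch)
    return "".join(parts).encode("utf-8")


def rename_non_utf8(directory: bytes) -> list:
    """Rename every non-UTF-8 entry in directory; return (old, new) byte pairs."""
    renamed = []
    for raw in os.listdir(directory):  # bytes argument, so bytes names come back
        new = utf8_safe_name(raw)
        if new == raw:
            continue
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            continue  # skip rather than overwrite an existing file
        os.rename(os.path.join(directory, raw), dst)
        renamed.append((raw, new))
    return renamed
```

Usage would be something like `rename_non_utf8(b"/some/dir")`. The escaping is not collision-free (a file literally named `caf%E9.txt` could already exist), which is why the sketch skips existing targets instead of overwriting them.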



