My comment is about this sentence: >Perhaps my least favorite feature of Python ...

ngoldbaum · on Jan 13, 2020

POSIX APIs take bytes, generally. Python wraps these APIs to take unicode and doesn't allow you to pass bytes, even if you need to. Filenames, for example, are just bytes, and if you force them to always be valid unicode you will make it so that you can't interact with files that have names that aren't valid unicode. That's just one example.

chaosite · on Jan 13, 2020

This is false since Python 3.6.

https://docs.python.org/3/glossary.html#term-path-like-objec...
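For anyone who wants to see what that looks like in practice, here is a minimal sketch of the path-like protocol, assuming Python 3.6 or later. `RawName` is a made-up class for illustration, not something from the stdlib; `os.fspath` and `__fspath__` are the real protocol pieces.

```python
import os
from pathlib import PurePosixPath


class RawName:
    """Hypothetical path-like wrapper around a raw bytes filename."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def __fspath__(self) -> bytes:
        # The protocol allows returning either str or bytes.
        return self._raw


raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

as_str = os.fspath("notes.txt")            # str passes through unchanged
as_bytes = os.fspath(raw)                  # bytes passes through unchanged
as_wrapped = os.fspath(RawName(raw))       # calls RawName.__fspath__()
as_pathlib = os.fspath(PurePosixPath("/tmp/notes.txt"))  # pathlib returns str
```

Functions that accept path-like objects go through the same conversion, so a bytes name can in principle be handed over without first being forced into valid Unicode.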

morelisp · on Jan 13, 2020

An extremely frustrating part of the Python 3 migration is how many times Python module maintainers have had to hear "oh, now it's safe to migrate." This page currently leads off with a comment saying it's been fine any time since 3.4. You say 3.6. When I was maintaining a popular Python module, I heard the same at 3.1, and 3.2. (I didn't maintain it long after that.)

joshuamorton · on Jan 13, 2020

There are very few places where the bytes/string difference matters for posix paths. Python is far from the only popular tool to assume paths must be valid unicode.

morelisp · on Jan 13, 2020

> There are very few places where the bytes/string difference matters for posix paths.

It's nothing to do with "places", points in your program, or entry points into the stdlib. It's entirely about what path names you need to process, and for large classes of software you have zero control over that. If you have a path that doesn't encode properly with your LC_CTYPE, you're in for a bad time with Python 3. (Of course you won't if you control all your own path names, but then you also don't have a problem assuming and enforcing ASCII.)

People were still migrating home systems to Unicode-compatible encodings long after Py3 came out. I still find files in archives with paths in weird (and undeclared/undeclarable) encodings. Lots of people had such files; non-native English speakers were the most likely to have them.
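A rough sketch of the failure mode, with the encoding hard-coded to UTF-8 so it behaves the same everywhere (on a real POSIX system `os.fsdecode` and `os.fsencode` use the filesystem encoding with this same `surrogateescape` handler). The filename is an invented example of a Latin-1 name left over from an old system.

```python
raw = b"caf\xe9.txt"  # "café.txt" in Latin-1; invalid as UTF-8

# Strict decoding rejects the name outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape (PEP 383) maps each undecodable byte to a lone surrogate,
# so the original bytes can be recovered later.
name = raw.decode("utf-8", "surrogateescape")
round_trip = name.encode("utf-8", "surrogateescape")

# The resulting str still cannot be encoded strictly, which is what happens
# when it is printed, logged, or serialized without the special handler.
try:
    name.encode("utf-8")
    printable = True
except UnicodeEncodeError:
    printable = False
```

So the name survives a round trip through the OS APIs, but it is a `str` that blows up as soon as it reaches any code that encodes it without `surrogateescape`.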

> Python is far from the only popular tool to assume paths must be valid unicode.

It and Java are the only ones I use regularly. Java doesn't have a good reputation for playing well with the outside world, vs. Python which had been sold for years as "better shell scripts."

masklinn · on Jan 14, 2020

> There are very few places where the bytes/string difference matters for posix paths.

There’s only every single input from the system at large, no big.

joshuamorton · on Jan 14, 2020

I don't quite agree. There's lots of systems where it's always unicode, and then a lot of systems where it's always ASCII, and then some systems where stuff is weird (and should be unicode :x)

chaosite · on Jan 13, 2020

There was a different API to get this behavior since 3.4: https://www.python.org/dev/peps/pep-0428/#id39
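A small sketch of the pathlib side, on the assumption that the linked section is about pathlib's str/bytes conversions. The path is invented for illustration. Note that pathlib itself only accepts `str` (or other path objects) on the way in, so a bytes name has to be decoded first, e.g. with `os.fsdecode`.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("/srv") / "archive" / "notes.txt"

as_text = str(p)   # text form of the path
as_raw = bytes(p)  # bytes form, encoded via os.fsencode

# The constructor rejects bytes arguments.
try:
    PurePosixPath(b"/srv/archive")
    accepts_bytes = True
except TypeError:
    accepts_bytes = False

# Decoding first works; undecodable bytes become surrogate escapes on POSIX.
from_bytes = PurePosixPath(os.fsdecode(b"/srv/archive"))
```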

masklinn · on Jan 13, 2020

Which means it's been true (and broken) for many many years until maintainers finally succumbed to external pressure and unbroke the API.