As someone who has dealt with the PSF and PyPI at a product level, packaging at the intersection of Python and AI for compute hardware, I consider PyPI an immature ecosystem. It's absolutely great for Python libraries, but that's it. It's not ready for the flood of AI hardware that is coming. Mojo looks promising, but I'm not sure how it's going to interact with PyPI.
Years ago I wrote a few scripts to automate creating packages, naming them something similar to a popular project (typosquatting), and then verifying them by creating a new PyPI user with a new email address. The payload was just an HTTP request to a server I ran.
I submitted this to PyCon, and I let the PSF know. Neither was interested. I haven't trusted a single Python package since.
Better email verification (at the time there wasn't even a captcha to solve), detection of typosquatting (pip install djanho or pip install jango), limiting the number of packages that can be registered in a given time period by IP and/or user, and prompting the user when they're installing a package whose name is within a Levenshtein distance of 1 of a top-100 package. It feels crazy that I can set up a minefield of packages that trick people into thinking they've installed the correct ones. It takes just one typo or mistake to install a package that runs an arbitrary, potentially malicious script without warning.
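A minimal sketch of that last check (the top-packages set here is a hypothetical stand-in for a real top-100-by-downloads list):

    # Hypothetical stand-in for the real top-100-by-downloads list.
    TOP_PACKAGES = {"django", "requests", "numpy", "flask", "pandas"}

    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(
                    prev[j] + 1,               # deletion
                    curr[j - 1] + 1,           # insertion
                    prev[j - 1] + (ca != cb),  # substitution
                ))
            prev = curr
        return prev[-1]

    def warn_if_typosquat(requested: str) -> None:
        """Prompt before installing a name one edit away from a popular package."""
        for popular in TOP_PACKAGES:
            if requested != popular and levenshtein(requested, popular) == 1:
                print(f"warning: {requested!r} is one edit away from "
                      f"{popular!r} -- did you mean that?")

    warn_if_typosquat("djanho")  # one substitution away from 'django'
    warn_if_typosquat("jango")   # one insertion away from 'django'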
There are measures against typosquatting now: I've attempted to publish packages and had the initial upload rejected because the name was too close to that of an existing package.
My wishlist: a mirror metadata service like Yarn's, to force PyPI to innovate more. Pip is slow because it doesn't store the dependency metadata and has to download all upstream dependencies during the resolution phase. They were too busy with minor issues like domain squatting to address core dependency-resolution concerns. I also want to see Python support simultaneously conflicting diamond dependencies, like npm and Cargo do. Python doesn't even support shading. One old dependency and your whole build breaks, with no recourse aside from forking upstream.
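A quick illustration of why conflicting versions can't coexist (nothing PyPI-specific, just the interpreter's module cache):

    import sys
    import json

    # Python's module cache is keyed by bare name, with no version component,
    # so one process can only ever hold a single copy of a package. The first
    # import wins; every later "import json" anywhere in the process gets this
    # exact module object back, regardless of which version a dependency asked for.
    assert sys.modules["json"] is json

    # npm sidesteps this by nesting node_modules per dependent, and Cargo by
    # mangling the crate version into its symbols, so conflicting versions can
    # be loaded side by side.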
> Pip is slow because it doesn't store the dependency metadata and has to download all upstream dependencies during the resolution phase.
This isn't really true, and hasn't been for a long time. It relies on package authors to include the dependency metadata, but PyPI has published it and pip has used it going back to at least 2017, if not earlier. They could do more to enforce it, but there's a lot more to the story of why pip is slow than this.
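For example, a package's declared dependencies are one JSON request away (this is PyPI's real JSON API; the printed output below is illustrative):

    import json
    from urllib.request import urlopen

    def declared_deps(package: str) -> list[str]:
        """Read a package's declared dependencies from PyPI's JSON API,
        without downloading or unpacking any distribution."""
        with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
            info = json.load(resp)["info"]
        # requires_dist is null when the author uploaded no dependency
        # metadata -- the enforcement gap mentioned above.
        return info["requires_dist"] or []

    print(declared_deps("requests"))
    # e.g. ['charset_normalizer<4,>=2', 'idna<4,>=2.5', 'urllib3<3,>=1.21.1', ...]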
Role mentions security. Candidate shows no experience with security.
It would've been weird... but then you remember it's Python. It's by amateurs, for amateurs. Well, godspeed and God bless. Who knows, maybe despite all indications to the contrary something good will come out of it.
You must never have looked at CPython code, nor read the documentation published with Python, nor read any PEPs, nor been on the core-dev mailing list or read its archives.
It would be really funny that so many people are using this bizarre nonsense, if it weren't for the fact that you and I might also have to use it, directly or indirectly.
Erlang and Java both have much better-implemented interpreters. I mean, so much better that by comparison CPython looks like it was written by a student with very little knowledge of C. The code makes no attempt at const-correctness, no attempt at minimizing dynamic memory allocation, no attempt at optimizing structure layout... they aren't professional C programmers. They get by writing crappy code that somehow works. They aren't even competing for the best code.
Library code in Python is ridiculously bad. By now it's practically a tradition that it offers workarounds instead of solutions. There's no commonality in how interfaces are structured, how things are named, how arguments to functions are accepted, etc. It's a zoo of low-quality, randomly assembled code.
The documentation was also written by people who never cared to be consistent. Not only does it contradict itself between subjects; often the same subject contains plenty of self-contradictory statements, simply because the author doesn't know how to think clearly, doesn't know how to express themselves, and doesn't understand what they're doing. Every time you read something in the documentation, you have to do mental tricks: "well, it probably doesn't really work like this" or "well, they don't really mean it this way", and so on.
But, most importantly: there is no plan, no goal. It's all about randomly adding things and permuting existing ones. There's no rationale for why new features are introduced. There's no global picture. When stuff gets added, there's no effort beyond the absolute minimum to make sure it works with other existing stuff. The language has no taste, no flavor. It's not trying to be the best at anything. It just floats around being, at best, the tenth best on every metric.
Having done plenty of work on PyPI, including lots of security work: Mike is an extremely competent engineer, and has been a significant force[1] in PyPI's overall modernization and maintenance efforts. That kind of work is security work, and it's frequently thankless.
In the name of supply-chain security, I just want verified package signatures (cosign, not the extant, unused GPG). The new passwordless publishing is a good step toward that (get humans and static credentials out of pushing assets). Oh, and one more minor thing: support for Poetry in pip-audit: https://github.com/pypa/pip-audit/issues/84
Where is the root of trust for package signatures? Who verifies signatures: the package index or end users? How do you distribute public keys? PGP is maligned mostly because of its support for old cryptography standards, some needless cruft, and especially the poor usability of its de facto standard implementation, GPG, but cosign by itself doesn't make any of those trust questions go away. There are major trade-offs to be made about who trusts whom and what that actually means for security beyond mere theatre. I'm not convinced that there exists a good trust mechanism a package index can enforce that actually moves the needle on supply-chain security.
FWIW: we're unlikely to support Poetry directly in pip-audit (I said as much in that issue, but it's a little buried). Instead, we'll probably break the auditing "core" of pip-audit out into its own library, which the Poetry folks can then use if they'd like.