PSF Hires PyPI Safety and Security Engineer (pyfound.blogspot.com)
65 points by miketheman on Aug 4, 2023 | 34 comments



PyPI is just one gear in the quite broken Python packaging mess. But I hope it’s a good starting point!


Having dealt with the PSF and PyPI at a product level, packaging at the intersection of Python and AI for compute hardware, we consider PyPI an immature ecosystem. It's absolutely great for Python libraries, but that's it. It's not ready for the flood of AI hardware that is coming. Mojo looks promising, but I'm not sure how it's going to interact with PyPI.


I'm waiting for the day when setuptools and pip are a thing of the past.


Military vehicle rolls through the desert, US soldiers in the desert wind, IED hits the vehicle, guitar solo starts playing...

Waiting for the one...

More guitar solo, then drums

day that never comes


What would you expect to replace pip?


Another layer on top of another layer, on top of another layer, that ultimately uses pip. Of course!

People want to replace pip with Poetry, even though Poetry is just an extra layer of complication on top of pip.


Nah, we just want some security.


The guy is a chair at the Women's Flat Track Roller Derby Association and previously worked at a cannabis company. Interesting career change.


Software's software ;)


Please be vendor-neutral: don't launch something that is GitHub-only and centralized, and don't fawn over GitHub's security. https://github.com/python-poetry/poetry/issues/7940#issuecom... https://news.ycombinator.com/item?id=35646436


I wrote a few scripts years ago to automate the creation of packages, name them something similar to a popular project (typosquatting), and then verify them by creating a new PyPI user with a new email address. The payload was just an HTTP request to a server I ran.

I submitted this to PyCon and let the PSF know. Neither was interested. I haven’t trusted a single Python package since.


What measures would you suggest to prevent people from doing that?


Better email verification (at the time there wasn’t even a CAPTCHA to solve); detection of typosquatting (pip install djanho or pip install jango); limiting the number of packages that can be registered in a given time period per IP and/or user; warning the user when they’re installing a package that is one Levenshtein edit away from a top-100 package. It feels crazy that I can set up a minefield of packages that trick people into thinking they’ve installed the correct packages. It takes just one typo or mistake to install a package that runs an arbitrary, potentially malicious script without warning.

https://github.com/alexk307/pypi
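The Levenshtein warning suggested above is simple to sketch. This is a minimal illustration under stated assumptions, not anything PyPI or pip is known to run: it normalizes names the way PEP 503 does (lowercase, runs of `-`, `_` and `.` collapsed to `-`) and flags any name within one edit of a popular package. `TOP_PACKAGES`, `levenshtein` and `near_misses` are hypothetical names, and the four-item list stands in for a real top-100 list.

```python
import re

# Hypothetical stand-in for a real "top 100 packages" list.
TOP_PACKAGES = ["django", "requests", "numpy", "flask"]


def normalize(name: str) -> str:
    """PEP 503 name normalization: lowercase, collapse runs of -, _ and . to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def near_misses(name: str, popular=TOP_PACKAGES, max_distance: int = 1) -> list[str]:
    """Hypothetical check: popular packages `name` is close to, but not equal to."""
    n = normalize(name)
    return [p for p in popular if 0 < levenshtein(n, normalize(p)) <= max_distance]


if __name__ == "__main__":
    for candidate in ["djanho", "jango", "Django", "requests"]:
        print(candidate, near_misses(candidate))
```

With this list, both `djanho` and `jango` are flagged as near-misses of `django`, while `Django` normalizes to an exact match and is not flagged. Plain Levenshtein counts a transposition such as `reqeusts` as two edits; a Damerau-style variant would count it as one.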


There are measures against typosquatting now: I've attempted to publish packages and had the initial upload rejected because their names were too close to the name of an existing package.


My wishlist: a mirror metadata service like Yarn, to push PyPI to innovate more. Pip is slow because it doesn't store the dependency metadata and has to download all upstream dependencies during the resolution phase. They were too busy wasting time on minor issues like domain squatting to deal with core dependency resolution concerns. I also want to see Python support conflicting diamond dependencies side by side, like npm and Cargo do. Python doesn't even support shading. One old dependency and your whole build breaks, with no recourse aside from forking upstream.


> Pip is slow because it doesn't store the dependency metadata and has to download all upstream dependencies during the resolution phase.

This isn’t really true and hasn’t been for a long time. It relies on package authors to include the dependency metadata, but PyPI has published it, and pip has used it, going back to at least 2017, if not earlier. They could do more to enforce it, but there’s a lot more to the story of why pip is slow than this.


Your first request is granted: https://peps.python.org/pep-0658/
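For context on the metadata exchange: PEP 658 lets a simple-API index serve a wheel's core metadata file on its own (at the distribution's URL with `.metadata` appended), so a resolver can read a distribution's dependencies without downloading the whole wheel. That file uses email-header syntax, so the standard library can parse it. A minimal sketch, assuming the metadata text has already been fetched; `SAMPLE_METADATA` is made up, the helper names are hypothetical, and this is not how pip itself is implemented.

```python
import re
from email.parser import HeaderParser

# Made-up core metadata, in the same header format as a wheel's METADATA file.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.0
Requires-Dist: requests>=2.28
Requires-Dist: tomli>=1.1; python_version < "3.11"

Long description goes here and is ignored.
"""


def requires_dist(metadata_text: str) -> list[str]:
    """Return the raw Requires-Dist entries (requirement plus optional marker)."""
    msg = HeaderParser().parsestr(metadata_text)  # parses headers only
    return msg.get_all("Requires-Dist") or []


def dependency_names(metadata_text: str) -> list[str]:
    """Crudely extract just the project name from each Requires-Dist entry."""
    names = []
    for req in requires_dist(metadata_text):
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match:
            names.append(match.group(0))
    return names


if __name__ == "__main__":
    print(requires_dist(SAMPLE_METADATA))
    print(dependency_names(SAMPLE_METADATA))
```

The regex only grabs the leading project name; for real requirement strings (extras, version specifiers, environment markers), the `packaging` library's `Requirement` class is the sturdier choice.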


If your software requires 2 different versions of the same library, there is something deeply wrong with it.


It's fairly common in the compilers/tools world to have automated workflows to test against different tools and libraries.


And they are all needed at the same time?


Yes, you can't keep installing and uninstalling stuff to run tests.


Most of the time diamond dependencies are upstream, not because the software itself requires two versions.
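To make the diamond-dependency point concrete: `app` depends on `liba` and `libb`, which pin different versions of `libc`. A Python environment has one `site-packages` and one `sys.modules` entry per import name, so only one `libc` can be installed and imported at a time, whereas npm can nest separate copies and Cargo can link more than one semver-incompatible version. The toy walker below uses hypothetical package names and exact pins only (no real resolver logic); it just reports which projects the graph needs in more than one version.

```python
from collections import defaultdict

# Hypothetical dependency graph: (name, version) -> {dependency name: exact pinned version}
GRAPH = {
    ("app", "1.0"): {"liba": "1.0", "libb": "1.0"},
    ("liba", "1.0"): {"libc": "1.0"},
    ("libb", "1.0"): {"libc": "2.0"},
    ("libc", "1.0"): {},
    ("libc", "2.0"): {},
}


def version_conflicts(root, graph):
    """Walk everything reachable from `root`; return names required at >1 version."""
    versions = defaultdict(set)
    stack, visited = [root], set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        name, version = node
        versions[name].add(version)
        stack.extend(graph[node].items())  # each item is a (dep name, pinned version) node
    return {name: vs for name, vs in versions.items() if len(vs) > 1}


if __name__ == "__main__":
    print(version_conflicts(("app", "1.0"), GRAPH))
```

Here `libc` comes back with both `1.0` and `2.0`, which a single flat environment cannot satisfy; shading would mean vendoring one copy under a different import name.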


Role mentions security. Candidate shows no experience with security.

It would've been weird... but then you remember it's Python. It's by amateurs for amateurs. Well, godspeed and God bless. Who knows, maybe despite the contraindications something good will come out of it.


"By amateurs" is a really bizarre label to hit Python with.


You must have never looked at CPython code, nor have you ever read the documentation published with Python, nor have you ever read any PEPs, nor have you ever been on the core-dev mailing list / have read its archives.

It would be really funny that so many people are using this bizarre nonsense, if it weren't for the fact that you and I might also have to use it, directly or indirectly.


I've done every one of those things. Were you under the impression that other language development is less messy, as opposed to just less public?


Erlang and Java both have much better-implemented interpreters. I mean, so much better that CPython, in comparison, looks like it was written by a student with very little knowledge of C. The code makes no attempt at const-correctness, no attempt at minimizing dynamic memory allocation, no attempt at optimizing struct layout... they aren't professional C programmers. They get by with crappy code that somehow works. They aren't even in the competition for the best code.

Library code in Python is ridiculously bad. By now, it's probably a tradition that it offers workarounds instead of solutions. There's no consistency in how interfaces are structured, how things are named, how function arguments are accepted, etc. It's a zoo of low-quality, randomly assembled code.

Documentation is also written by people who never cared to be consistent. Not only does it contradict itself between subjects; often the same subject has plenty of self-contradictory statements, simply because the author doesn't know how to think clearly, doesn't know how to express themselves, or simply doesn't understand what they are doing. Every time you read something in the documentation, you have to do mental tricks: "well, it probably doesn't really work like this" or "well, he doesn't really mean it this way", and so on.

But, most importantly: there is no plan, no goal. It's all about randomly adding things and permuting existing ones. There's no reason why new features are introduced. There's no big picture. When stuff gets added, there's no effort to make sure it works with other existing stuff beyond the absolute minimum. The language has no taste, no flavor. It's not trying to be the best at anything. It just floats around being, at best, the tenth best on every metric.


In my experience, core Python development is much more professional and mature than the drama-ridden Node, Rust, or Nim communities.


Having done plenty of work on PyPI, including lots of security work: Mike is an extremely competent engineer, and has been a significant force[1] in PyPI's overall modernization and maintenance efforts. That kind of work is security work, and it's frequently thankless.

[1]: https://github.com/pypi/warehouse/commits?author=miketheman


> has been a significant force[1] in PyPI's overall modernization and maintenance efforts.

PyPI is a pile of hot garbage. Being one of people contributing to it is a negative in my book. That's definitely not anything to brag about.

How is that security work again? What's even the connection?


In the name of supply chain security, I just want verified package signatures (cosign, not the extant, unused GPG ones). The new passwordless publishing is a good step towards that (it gets humans and static credentials out of pushing assets). Actually, one more minor request: support for Poetry in pip-audit. https://github.com/pypa/pip-audit/issues/84


Where is the root of trust for package signatures? Who verifies signatures: the package index or end users? How do you distribute public keys? PGP is mostly maligned because of its support for old cryptography standards, some needless cruft, and especially the poor usability of its de facto standard implementation, GPG, but cosign by itself doesn't make any of the trust questions I mentioned go away. There are major tradeoffs to be made about who trusts whom and what that actually means for security beyond mere theatre. I'm not convinced that there exists a good trust mechanism that a package index can enforce that actually moves the needle on supply chain security.


Closer to a TLS CA setup, with ephemeral certificates and a public log of issuance, than to PGP's individual trust circles and semi-static keys.

https://www.sigstore.dev/how-it-works


FWIW: We're unlikely to support Poetry directly in pip-audit (I said as much in that issue, but it's a little buried). Instead we'll probably devolve the auditing "core" of pip-audit into its own library, which the poetry folks can then use, if they'd like.



