With the move to pyproject.toml in 2022[0], Poetry has become our go-to method.
With the lock file being the default, we don't worry about different installs in different environments.
Having come from the Rails world, where Bundler had this solved for... a decade? I was surprised it was still such a mess in Python until so recently.
At the core, the things that make Poetry and Bundler so predictable are (1) a lock file, and (2) the ability to install different versions in different locations and reference the exact version you need to load. Each alone isn't enough.
npm had the same problem pip suffered from: you may have a version installed that differs from what requirements.txt, package.json, or even the lockfile says, but because it exists in the install location, it gets loaded. It wasn't until Yarn 2 that node_modules finally got moved out of the way, so side-by-side versions were no longer awkward.
[EDIT]
If you're not using Poetry + Docker for deployment yet, I 100% recommend it as the "boring" method.
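# Install Poetry, skip virtualenv creation (the container is the environment), then install
# locked, non-dev dependencies; the gh_priv_key secret is presumably for private Git deps.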
RUN curl -sSL https://install.python-poetry.org | python -
RUN /root/.local/bin/poetry config virtualenvs.create false
COPY poetry.lock pyproject.toml ./
RUN --mount=type=secret,id=gh_priv_key,target=/root/.ssh/id_rsa \
/root/.local/bin/poetry install --without dev --no-root
I'd say this is more that PyTorch is horribly packaged and a nightmare for an engineering user to integrate into codebases. All of their assumptions are that you use conda and are a data scientist doing something one time in an IPython notebook. Very little consideration is given to how you deploy at scale reliably. It's not really on Poetry.
While this is on the one hand at least partially true, it is also the case that as long as Poetry can't deal with these (and similar) cases, it cannot really be said to be a candidate for the 'default' dependency management tool. Having people say 'just use Poetry' as the go-to answer is very bad default advice as long as Poetry will fail in a number of important cases.
Based on this comment from 5 days ago[0], it's working? I'm not sure, I didn't dig in too far, but based on that comment it seems fair to say it's not fully Poetry's fault, because torch removed hashes (which Poetry needs to be effective) for a while and only recently added them back.
Not sure where I would stand if I fully investigated it tho.
The fact that it only kind of sort of incidentally works sometimes shows clearly that people who consider PyTorch a vital library for their work aren't the target audience for Poetry. Another use case they don't really care about is people for whom compiling C++ etc. is an important part of their build process.
Poetry is great if you're developing the sort of software the Poetry devs prioritise, but since Poetry only targets a subset of the Python community it will never be a good default tool for all Python development, and thus the fragmentation continues.
Every comment in that thread, including the comment you referenced, is wrong.
Due to Poetry's architecture, it can't satisfy all three at the same time:
- the platonic ideal of build isolation and lockfiles
- installing the appropriate accelerator-specific version of PyTorch for the current platform non-interactively, or the one the user selects
- dependencies on PyTorch that are transitively compatible with the way dependencies on PyTorch are expressed elsewhere
This is something you can achieve with setuptools and setup.py by forfeiting the platonic ideals of build isolation and lockfiles.
Poetry, on the other hand, does not let you choose which lamb to sacrifice. Everyone in the thread, for the last two years, who has reported some success is misunderstanding the state of their install and has run into flaws in all three of the situations I'm describing. They have something that will not correctly install anything that itself depends on PyTorch, which is useless, since everything in the PyTorch ecosystem depends on it. And the main workaround the community uses - installing torch first, followed by installing dependencies from a requirements.txt, followed by copying a dump of scripts - is not compatible with Poetry.
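That workaround, roughly (the cu121 CUDA tag is just an example; pick the index matching your platform):

# 1. install torch from the accelerator-specific index first
pip install torch --index-url https://download.pytorch.org/whl/cu121
# 2. then install everything else against that already-present torch
pip install -r requirements.txt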
Two-thirds of Python end users do not engage with packaging at all. pyproject.toml with dependencies is about 1.5% of the ecosystem. It provides only downsides compared to setup.py plus pinning your package versions by commit in your setup.py dependencies - i.e. doing what golang does - and this does not require any external tools in Python. In my opinion, the Poetry developers need to fix PyTorch or they will not get adoption during Peak Python.
Sorry if my questions are not well informed (maybe I should dig more into Poetry, I don't really know it). I don't really understand those points, but I would like to understand them.
What does it mean, build isolation? And how does PyTorch violate that? Or why can't we install PyTorch with build isolation?
What is the issue with lockfiles?
I can understand there might be issues if you need to install a custom accelerator-specific version of PyTorch (although also I don't fully understand what the issues are about this). However, what's with just the standard version, which is suitable for most people, as it comes with CUDA support?
What do you mean by non-interactively? pip install is non-interactive, or not?
What issues are there with the dependencies for PyTorch? Why are they not compatible with the dependencies expressed elsewhere?
How are dependencies defined by requirements.txt different than dependencies defined by pyproject.toml or by setup.py?
If your answer is RTFM, maybe you can point me to some resources. I read a bit through Poetry documentation, but I still don't really understand those issues.
The issue isn't Poetry per se, but the simplifying assumptions it makes to solve the problem of what to install: that there is a single solution and you can compute it on any system, and that "system effects" can be ignored/handled by wheels (see https://pypackaging-native.github.io/ which covers some of this).
"Build isolation" is all about limiting what is available at build time (this generally conflicts with inspecting the environment/system), and is vaguely (but no where near sufficient) for reproducible builds (see https://reproducible-builds.org/, which came from linux distros). Lockfiles are similarly an issue because they try to assume they can cover all cases (which they cannot, see e.g. various discussions about sdists for the lockfile PEP). Both these conflict with trying to work out what works for the current system (which is what pytorch tries to do because there are so many different options), or with trying to control what is discovered when trying to do the build on one system but run on another.
But it does seem to go off and download many GBs of PyTorch packages for all possible Python, architecture, and OS versions, which takes ages. Feels very broken, at least.
I agree with your post(s), but conda is different from pyenv, Poetry, and similar tools (rye, hatch, etc.) in that, unlike the others, which only manage Python, it (tries to) manage your whole environment. Can the others manage R packages (because you're using rpy2 or reticulate)? What about installing pandoc? Or LaTeX?
I suspect we're going to continue in having churn in what people seem to be calling project/environment managers simply because there's a bunch of possible workflows (given the various possible deployment scenarios), and there's enough library support that starting your own is a weekend project (and then all the corner cases are discovered and you make a choice that works for some but not others and the cycle begins anew). conda seems to have dominated its niche (I haven't seen activepython nor enthought in years), but I suspect that part of the ecosystem is more naturally stable.
We use Poetry inside Docker in a very similar way, but use a multi-stage Dockerfile - the "builder" image has Poetry installed and installs the application into a venv; we then copy that venv to the "prod" image and run the application there (so we do not have Poetry or any other packages that are only required at build time in the final image).
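A rough sketch of that shape (the base image, paths, and the "myapp" entrypoint are placeholders; assumes a Poetry dev group):

FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install poetry
COPY pyproject.toml poetry.lock ./
# keep the venv inside the project so it's easy to copy out of this stage
RUN poetry config virtualenvs.in-project true \
 && poetry install --without dev --no-root

FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app/.venv ./.venv
COPY . .
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "myapp"]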
It's definitely fine with no problems. We used to do this too.
But it optimizes for something that only matters if you're in a space-constrained setup. It turns out that for "almost all" web setups, a few extra hundred MBs don't matter - possibly even a few GB.
Even in AWS Lambda, where cold start is king, size doesn't matter[0] (I am the top answer) as long as you have fast starts, and that's determined by code paths, libraries, etc.
So as long as your program readies itself quickly, having gcc and a few other standard libs around doesn't impact any key metric.
In there, you'll find the settings.py split by environment (like you would have in the Elixir world), and a multistage Docker image designed to be cache-friendly (for faster Docker builds). Some stages even run the test suite, so it could be compared to a CI pipeline.
Not having build dependencies in the production image is definitely a good approach.
Poetry solves a very real problem, but I kinda hate it. It's not a pleasant tool to use.
Allow me to out-bore you, because there's way too much excitement in your solution. We just build Debian packages and deploy our applications that way. The Debian packaging infrastructure could be better, but mostly it just wraps around setuptools/pyproject. You need to rebuild your containers every time there's a security update to a package; we just set apt to auto-update.
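For scale: the Debian side can be tiny - a debian/rules that is just a thin wrapper around pybuild (a sketch; assumes dh-python is installed and the usual debian/control and debian/changelog exist):

#!/usr/bin/make -f
%:
	dh $@ --with python3 --buildsystem=pybuild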
Notably, Poetry lockfiles are farrrrrr better cross-env than Pipenv's, which, if you lock on an ARM Mac, will not install on an x64 Linux box. Poetry just works.
Not a huge fan of Poetry since it does not comply with standards (PEP 621, PEP 518, ...). But it has a great UX compared to alternatives. My go-to has been PDM, which is basically PEP-compliant Poetry. But it is maintained by only one dev, and support for containers is not great...
Very tangential, but wanted to share a small QoL trick that makes using poetry (and venv in general) much nicer...
Instead of having to type "poetry shell" to activate the virtualenv, I use a tool called direnv that automatically modifies envars when you enter a directory - undoing the changes when you leave.
In my direnv config (.envrc) for the directory I write:
PATH_add .venv/bin
To add all the virtualenv executables to PATH. Now I can just call "python/pip" directly and never have to worry if I'm inside my virtualenv or not.
N.B. for this you need to update poetry config to ensure virtualenvs are placed in the project directory vs. the default centralised location
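Concretely, that's:

poetry config virtualenvs.in-project true   # put the venv at ./.venv
direnv allow                                # approve the .envrc once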
I personally intensely dislike tools that change the environment based on the directory contents. I use an `exec` script in the root (or parent, sometimes) that just does this:
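Roughly this shape (a sketch; assumes the venv lives at ./.venv and that your shell rc doesn't clobber PS1):

#!/usr/bin/env bash
# start a subshell with the project's virtualenv first on PATH
here="$(cd "$(dirname "$0")" && pwd)"
export VIRTUAL_ENV="$here/.venv"
export PATH="$VIRTUAL_ENV/bin:$PATH"
export PS1="($(basename "$here")) $PS1"   # visible project marker in the prompt
exec "${SHELL:-bash}"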
That gives me a subshell with the virtual env active, and the path to the virtual env goes into $PS1 so I've got a very visible "you are working in this project" signifier.
I don't know why subshells aren't more popular for this sort of thing, they remove a lot of ambiguity (and subsequent opportunities to screw things up) for me.
Also coming from Rails, besides the way pip, etc. works, I was also surprised at just how fragmented the landscape is. And with a lot of strong opinions on how that person’s viewpoint is correct. When I was figuring this stuff out it was very confusing because of course these strongly held opinions neglect to list the huge downsides. I think it’s an interesting note about the culture of Python.
YES!! 100%, I emphatically CANNOT stand how terrible model validations are in Django (obvious vs Rails) and they are a direct result of "we do it better".
full_clean vs Form.full_clean vs DRF. It's just an absolute shit show.
Even if it is better, it's just annoying and not worth the benefits.
For us the overall benefits outweigh the drawbacks but I get angry every time when I need to do a model validation.
Hard agree. This article makes good points but I feel like Poetry just simplifies so much of it and makes it so much less manual. We’ve used it at $DAYJOB for months and it’s been awesome. In the same sense as the article — it’s been sufficiently boring to rarely be the source of problems, and when it is, it’s almost always our fault rather than a design flaw in Poetry.
What does Poetry provide that the 'official' PyPA tooling doesn't? I'm referring to the somewhat unfortunately named 'wheel' and 'build'? Also, `pyproject.toml` itself is an innovation from outside of Poetry afaik.
And for what use-cases does Poetry not work, if any?
The python core team endorses pip, but pip solves something like 80% of the problem - we need them to endorse the other 20% of the solution.
I've been using pip-tools for some time, I think it solves the other 20% of the problem for me in a simple way that I like. Poetry et al seem to be trying to do too much - ymmv.
The iterations on packaging that never really seem to get it right are, I think, frustrating to a community whose core likes to advertise a "Zen of Python" "one way to do things" mantra, but can never really get 100% of the packaging problem sorted out in a clean way, in spite of several other communities seeming to have figured it out.
The core team endorses everything, not only pip, but good luck finding an example setup.py with example repositories from any of the documentation pages you find on Google. It’s results 1-3 and is always marked Obsolete.
The communication on what to do is a disaster. Look at the non-obsolete https://packaging.python.org/en/latest/tutorials/packaging-p... - my dude, their pyproject.toml doesn't even specify a dependencies section, which is 99% of the value proposition of packaging.
I have been impressed that they're making progress though - wheels solve some problems conda used to solve, so for ML stuff that used to need conda I can usually just use regular pip packages now, which is very nice.
I like that pip-tools is a separate thing. Most Python packages don't need and shouldn't use a requirements.txt file. It makes more sense for npm to have it by default, but not Python.
I kinda see myself wanting the pip-tools flow - the requirements.in -> pip-compile -> requirements.txt thing - perhaps integrated into pip itself. Seems like the simplest of the solutions out there for good dependency declarations and pinning? I may well be wrong.
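The whole flow is pretty small (assumes pip-tools is installed; "requests" is just a placeholder dependency):

# requirements.in - only your direct deps, loosely constrained
requests

# compile exact pins into requirements.txt
pip-compile requirements.in
# make the current environment match the pinned file exactly
pip-sync requirements.txt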
It is, but it doesn't really bother me that it's not part of pip itself. On the other hand, I wouldn't be opposed to it being part of pip and "blessed" in that way.
For me it feels like people have always been over-engineering dependency management where the practical issues from not doing it are pretty much non-existent.
My approach is to just use Docker, no virtualenvs. I get that you might run into the multiple-interpreters issue in theory, but across multiple projects in the past 5 years I haven't seen it. Also, this might no longer be true, but avoid using Alpine. If you're deploying Django there is no reason to optimize image size, and Alpine has a lot of things missing (e.g. at least a couple of years ago, wheels were not supported, leading to very slow build times).
I only do a single requirements.txt. Anything which makes your prod and local environment differ is a bad thing in my opinion. Fine, black might make my image a couple of MBs larger, but why would it matter? On the other hand, attempting to figure out why something which works on my machine does not work on prod is always a nightmare.
Setting requirements as a range in requirements.txt allows me to automatically get bugfixes without spending time on it (e.g. django>=4.2.3,<4.2.99 and django-ninja>=1.0.1,<1.0.99). Again, I might have run into 1-2 issues over the past couple of years from this, and I've saved a lot of time.
Getting a project running locally should not take more than 1 minute (a couple of .env vars + docker-compose up -d should be enough).
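Something like this is usually enough (service names, ports, and images are placeholders):

# docker-compose.yml
services:
  web:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    depends_on:
      - db
  db:
    image: postgres:16
    env_file: .env
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata: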
The biggest practical issue in dependency management in python is dependencies not pinning their dependencies correctly.
> Anything which makes your prod and local environment differ is a bad thing in my opinion
Unless you're writing code that only you will deploy to machines that you control, "prod" and "local" will always be different. If you're only targeting a fixed version of a fixed OS on a fixed architecture, then most things are easy.
For me "local" is a Mac running ARM, for the person pip installing my tool "prod" might be Linux or Windows. I cannot punt (or I can, but it would greatly diminish the usefulness of the stuff I develop) and say "your prod must equal my local or it won't work", I have to deal with it and I want tools that make this hard problem as easy as possible.
5. To add deps, add them to pyproject.toml and repeat step 4. Do not pip install deps directly. Do not pin deps to any particular version, but if you have to you can add constraints like >=5 (I need a feature introduced in v5).
6. If you are writing a package to be pip installed by others then you're done. Read setuptools docs for how to build etc.
7. If you also want to build an environment to run your code (e.g. docker image for deployment or serverless deployment etc) use pip-tools to pin your dependencies. (This is the only reason you need requirements.txt).
8. For test dependencies (e.g. pytest) or dev dependencies (e.g. test server) leverage optional dependencies in the pyproject.toml file. This plays very nicely with tools like tox, which you should use. Use pre-commit for linting etc.
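A rough pyproject.toml shape for steps 5-8 (package name, dependencies, and versions are placeholders):

[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "myapp"
version = "0.1.0"
dependencies = [
    "requests",
    "click>=8",   # constraint only because a v8 feature is needed
]

[project.optional-dependencies]
test = ["pytest"]
dev = ["tox", "pre-commit"]

Step 7 then becomes, e.g., pip-compile --extra test -o requirements.txt pyproject.toml.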
I've landed on Poetry after having tried many different options over the years. Being able to specify my relatively open dependencies (e.g. "django==4.0.*") while having the exact version of every subdependency locked has proved to be very reliable and reproducible. Docker multi-stage builds allow me to ship a production container without Poetry installed.
Note that you can also accomplish the same with pip using the tilde operator in your requirements.txt (e.g. Django~=4.0), and Constraints Files [1] for the subdependencies.
A multistage build is still recommended as building your dependencies might need gcc or other tools.
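For reference, that pattern looks like this (the pinned versions below are only illustrative):

# requirements.txt
Django~=4.0

# constraints.txt - exact pins for the transitive deps
asgiref==3.7.2
sqlparse==0.4.4

# install using both files
pip install -r requirements.txt -c constraints.txt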
I’m happy others are writing on this subject! I appreciate your enthusiasm for trying to do the most “basic” things in Python. While I personally enjoy a bit more management with tools like Poetry, I believe all Python programmers should know how pip and setuptools work before trying their supersets.
Thanks for this. It's exactly the format and depth I wanted.
I haven't been able to muster the time or energy to start digging into the quagmire that is the Python ecosystem but this seems like the perfect place to start (and hopefully stay for a while.)
> This has been solved by many other solutions (pipenv, poetry, pdm, etc...).
It's been repeatedly not solved by the new tool the Python ecosystem comes up with every few years, IME. (It reminds me of an old quote I can't find about how every new version of C++ contains new features to fix the problems with the new features in the previous version of C++)
I haven’t tested this yet but what’s better about pipfile.lock over whatever pip-compile spits out? It sounds like both are exact versions of packages, no?
Pipfile.lock is broken if you have wheels wrapping compiled code as it captures the arch in the lock file! Poetry doesn't do this, so you can lock on your M2 Mac and install on x86 fine.
Pip-compile in the most common use just creates a requirements.txt with everything pinned to a given version.
I think you can do hash stuff with it, but haven't used that part.
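For reference, it's one flag in pip-tools:

pip-compile --generate-hashes requirements.in
# emits pinned versions plus --hash=sha256:... entries that pip verifies at install time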
Eh? We're locking the version of the dependency; we don't need to lock the particular compiled build of it, because builds only differ in which architecture they were compiled for. We want 3.2.0 of dep_x on ARM and on x86_64 alike - the last thing we want is running different versions of a dependency in different environments. That way lies madness.
I don't know about pipfile.lock nor pip-compile, once I tested poetry I stuck to it without rethinking it (because I have better things to do than test all the package managers out there).
I'm personally afraid that the python community has bifurcated so far that a single 'correct' solution is doomed to fail. The needs of people writing and deploying HTTP/REST servers is just so different from the people writing PyTorch models and numerical simulations that no tool will ever satisfy the very different needs of both camps. The worst part is that many people developing these tools don't seem to realise this and blindly claim that their tool is The Tool! without having any deep insights into the needs of the other camps.
> The needs of people writing and deploying HTTP/REST servers is just so different from the people writing PyTorch models and numerical simulations that no tool will ever satisfy the very different needs of both camps.
Is this true though? At the end of it they both need python packages and some system dependencies installed (let's ignore models and data for now).
Why are there umpteen tools for doing this in Python when there aren't in other languages? I have to deploy both web apps and ML models, and the first thing I do is convert any project to use pyproject and Poetry.
Whilst Poetry comes with its own issues, I've not had a project yet that this doesn't work for, and I wish the wider community would just settle on one method. Instead we have stuff coming in with various conda incantations, pip, pipx, poetry, setuptools, setup.py!
Deployment of ML models needs a decent solution too, half the ML code I get goes and fetches stuff from NLTK or Huggingface at runtime. Some of it (like the LLAMA models) needs various API keys set and EULAs agreed to before it'll run, then pings back to Huggingface each time you run it to check the EULA again! This makes life difficult when trying to deploy and adds this massive dependency on 3rd party services.
But how are you handling those system dependencies (is R and its various packages one?), and how do you know that the packages you install/manage via the python ecosystem work with them?
I think the reason Python has so many things is simply because most other languages throw their arms up and say "system stuff, not my problem" (Rust is an excellent example of this: build.rs is basically the same thing as a setup.py, except less standard, and from what I've seen it currently lacks any kind of systematic solution like cibuildwheel), whereas Python has always been trying to do something to address it.
The biggest problem with Python's dependency management in 2024 is that it still feels like an afterthought. It is just not straightforward and it is not Pythonic. I would go as far as saying that dependency management in Python is likely more complicated than anything you would normally encounter within the language itself.
As of 2024, Poetry is the best solution we have, but even it can hit its limits at times. I work in a position where I develop with Poetry and have to deploy without it (using venv), and I do not wish the journey of learning how to do that on anybody.
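The least painful route I've found (assumes the export command, which ships as a plugin with recent Poetry versions):

# dev side: turn poetry.lock into a plain requirements file
poetry export -f requirements.txt -o requirements.txt --without-hashes
# deploy side: plain venv + pip, no Poetry needed
python -m venv .venv
.venv/bin/pip install -r requirements.txt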
But if you just mean that you want to gather the dependencies for a platform other than your build host: this should be possible with the help of Poetry and PDM since they both perform cross-platform resolution.
Coming from Java/Maven I was amazed to see the obscure mess in Python world regarding dependency management. After trying all available tools I finally settled on pdm. I also found it to be more intuitive than poetry.
The author mentions using pip freeze or pip-tools pip-compile as a solution to the indirect dependencies which are reliant on the Python environment, i.e. the platform and Python version.
But from what I understand, pip cross-environment usage needs the requirements.txt file to be generated on the environment it is going to run on. Copying in the same requirements file you use for installing the packages locally might not work in the container.
You would have to maintain two requirements files and automate their concatenation. It's messy. And with a versionless requirements.txt file, there is no guarantee that your project will work on pip install next year.
You use Bundler to manage which dependencies are installed and use that to get a lock file. You also create a .ruby-version file that specifies the language version. There are a whole bunch of different tools to select which Ruby version to use, but all of them work with that file, and which one you choose really doesn't matter.
I like pixi, but I am not likely to make the switch. They don't support pyproject.toml and other standards. This disqualifies it from being a potential "recommended tool" by PyPA or whatever.
Have you compared it with poetry or pip-tools? I'm thinking of trying pixi but still can't muster up energy to do it. Especially since for my use case poetry and pip-tools cover most of it.
[0] https://packaging.python.org/en/latest/tutorials/packaging-p...