+++
title = "Improving Python Dependency Management With pipx and Poetry"
date = "2021-09-19"
author = "Ceda EI"
tags = ["python", "development"]
keywords = ["python", "development"]
description = "My current dev setup with python, poetry and pipx"
showFullContent = false
+++

Over time, how I develop applications in Python has changed noticeably. I will divide the topic into three sections and see how they tie into each other at the end.

- Development
- Packaging
- Usage

## Development

Under development, the issues I will focus on are the following:

- Dependency Management
- Virtualenvs and managing them

Historically, the way to do dependency management was through `requirements.txt`. I found `requirements.txt` hard to manage. In that setup, adding a dependency and installing it took two steps:

- Add the package `bar` to `requirements.txt`
- Either do `pip install bar` or `pip install -r requirements.txt`

While focused on development, I would often forget one or both of these steps. Also, the lack of a lock file was a small downside for me (it could be a much larger downside for others). The separation between `pip` and `requirements.txt` can also easily lead you to accidentally depend on packages that are installed on your system or in your virtualenv but not specified in your `requirements.txt`.

Managing virtualenvs was also difficult. Since nothing ties a virtualenv to a project, you need a directory convention to tell which virtualenv is being used for which project. You can use the same virtualenv for multiple projects, but that partially defeats the point of virtualenvs and makes `requirements.txt` more error-prone (higher chances of forgetting to add packages to it). The approach generally used is one of the following two:

```
foo/
├── foo_src/
└── foo_venv/
```

or

```
foo_src/
└── venv/
```

I preferred the second one, as the first one nests the source code one directory deeper.

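To make the drift between `pip` and `requirements.txt` described above concrete, here is a minimal sketch of a check for it. The helper names (`declared_names`, `undeclared`, `installed_names`) are hypothetical and made up for this illustration. The parser only understands simple lines such as `name==1.0`, so treat it as a rough aid rather than a real requirements parser. Note also that dependencies of your dependencies will be reported unless `requirements.txt` lists them too.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declared_names(requirements_text):
    """Extract project names from the text of a requirements.txt file."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def undeclared(requirements_text, installed):
    """Return installed project names that requirements.txt never mentions."""
    return {normalize(name) for name in installed} - declared_names(requirements_text)


def installed_names():
    """Names of every distribution visible to the running interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.add(name)
    return names


# Example with made-up data: numpy is installed but never declared.
sample = "click==8.0.1\n# audio\nsounddevice>=0.4.2\n"
print(undeclared(sample, {"click", "sounddevice", "numpy"}))  # {'numpy'}
```

Inside a real virtualenv, `undeclared(open("requirements.txt").read(), installed_names())` would run the same comparison against what is actually installed.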
### A new standard - `pyproject.toml`

In [PEP 518](https://www.python.org/dev/peps/pep-0518/), Python standardized the `pyproject.toml` file, which lets projects choose alternative build systems for package generation. One such project that provides an alternative build system is [Poetry](https://python-poetry.org/). Poetry hits the nail on the head and solves my major gripes with traditional tooling.

### Poetry and virtualenvs

Poetry manages the virtualenvs automatically and keeps track of which project uses which virtualenv. Working on an existing project which uses poetry is as simple as this:

```bash
$ git clone https://gitlab.com/ceda_ei/verlauf
$ cd verlauf
$ poetry install
```

The `poetry install` command sets up the virtualenv, installs all the required dependencies inside it, and sets up any commands accordingly (I will get to this soon). To activate the virtualenv, simply run:

```bash
. "$(poetry env info --path)/bin/activate"
```

I wrap this in a small function which lets me toggle it quickly:

```bash
function poet() {
    POET_MANUAL=1
    if [[ -v VIRTUAL_ENV ]]; then
        deactivate
    else
        . "$(poetry env info --path)/bin/activate"
    fi
}
```

Running `poet` activates the virtualenv if it is not active and deactivates it if it is active. To make things even easier, I automatically activate and deactivate the virtualenv as I enter and leave the project directory. To do so, simply drop this in your `.bashrc`.

```bash
# Print the closest directory, starting at $PWD and walking up, that contains
# the file or directory named by $1. Returns 1 if no parent contains it.
function find_in_parent() {
    local path i j current_path
    IFS="/" read -ra path <<<"$PWD"
    for ((i=${#path[@]}; i > 0; i--)); do
        current_path=""
        for ((j=1; j<i; j++)); do
            current_path="$current_path/${path[j]}"
        done
        if [[ -e "$current_path/$1" ]]; then
            echo "$current_path/"
            return 0
        fi
    done
    return 1
}

# Runs before every prompt: activates the virtualenv inside a poetry project
# and deactivates it after leaving, unless POET_MANUAL is set.
function auto_poet() {
    local ret=$?    # keep the last command's exit status intact
    if [[ -v POET_MANUAL ]]; then
        return $ret
    fi
    # pyproject.toml is assumed to be the marker of a poetry project
    if find_in_parent pyproject.toml &> /dev/null; then
        if [[ ! -v VIRTUAL_ENV ]]; then
            if BASE="$(poetry env info --path)"; then
                . "$BASE/bin/activate"
                PS1=""
            else
                POET_MANUAL=1
            fi
        fi
    elif [[ -v VIRTUAL_ENV ]]; then
        deactivate
    fi
    return $ret
}

PROMPT_COMMAND="auto_poet;$PROMPT_COMMAND"
```

This ties in well with the `poet` function; if you use `poet` at any point in a bash session, activation switches from automatic to manual and changing directories no longer auto-toggles the virtualenv.

![auto_poet and poet in action](/images/auto_poet.webp)

### Poetry and dependency management

Instead of using `requirements.txt`, poetry stores the dependencies inside `pyproject.toml`. Poetry is stricter than `pip` in resolving versioning issues. Dependencies and dev-dependencies are stored inside `tool.poetry.dependencies` and `tool.poetry.dev-dependencies` respectively. Here is an example of a `pyproject.toml` for a project I am working on.

```toml
[tool.poetry]
name = "bells"
version = "0.3.0"
description = "Bells is a program for keeping track of sound recordings."
authors = ["Ceda EI "]
license = "GPL-3.0"
readme = "README.md"
homepage = "https://gitlab.com/ceda_ei/bells.git"
repository = "https://gitlab.com/ceda_ei/bells.git"

[tool.poetry.dependencies]
python = ">=3.7,<3.11"
click = "^8.0.1"
questionary = "^1.10.0"
sounddevice = "^0.4.2"
SoundFile = "^0.10.3"
numpy = "^1.21.2"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

# I will talk about this section soon
[tool.poetry.scripts]
bells = "bells.__main__:main"
```

One of the upsides of poetry is that you don't have to manage the dependencies in the `pyproject.toml` file yourself. Poetry adds an `npm`-like interface for adding and removing dependencies. To add a dependency to your project, simply run `poetry add bar`; it will add it to your `pyproject.toml` file and install it in the virtualenv as well. To remove a dependency, just run `poetry remove bar`. For development dependencies, just add the `--dev` flag to the commands.

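The version strings in the `pyproject.toml` above, such as `^8.0.1`, are caret constraints: they allow any update that leaves the left-most non-zero version component unchanged. The sketch below expands such a constraint into an explicit range. `caret_range` is a hypothetical helper written for this illustration, not part of Poetry's API, and it only handles plain numeric versions with one to three components.

```python
def caret_range(constraint):
    """Expand a caret constraint such as "^8.0.1" into an explicit range."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(part) for part in constraint[1:].split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected one to three components: {constraint!r}")

    # Missing components count as zero for the lower bound.
    lower = parts + [0] * (3 - len(parts))

    # The upper bound bumps the left-most non-zero component. If every
    # component that was given is zero, the last given one is bumped.
    for index, value in enumerate(parts):
        if value != 0:
            break
    else:
        index = len(parts) - 1
    upper = lower[:index] + [lower[index] + 1] + [0] * (2 - index)

    def dotted(version):
        return ".".join(str(number) for number in version)

    return f">={dotted(lower)},<{dotted(upper)}"


print(caret_range("^8.0.1"))  # >=8.0.1,<9.0.0
print(caret_range("^0.4.2"))  # >=0.4.2,<0.5.0
```

This is why `sounddevice = "^0.4.2"` is tighter than it may look: for a `0.x` package, the minor version is treated as the breaking one.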
## Packaging

Since poetry replaces the build system, we can now configure the build using poetry via `pyproject.toml`. Inside `pyproject.toml`, the `tool.poetry` section stores all the build info needed: `tool.poetry` contains the metadata, `tool.poetry.dependencies` contains the dependencies, and `tool.poetry.source` contains private repository details (in case you don't want to use PyPI).

One of the options is `tool.poetry.scripts`. It contains scripts that the project exposes. This replaces `console_scripts` in `entry_points` of `setuptools`. For example,

```toml
[tool.poetry.scripts]
foobar = "foo.bar:main"
```

This will add a script named `foobar` to your `PATH`. Running that is equivalent to running the following script:

```python
from foo.bar import main

if __name__ == "__main__":
    main()
```

For further details, check the [reference](https://python-poetry.org/docs/pyproject/).

Poetry also removes the need for manually doing editable installs (`pip install -e .`). The package is automatically installed as editable when you run `poetry install`. Any scripts specified in `tool.poetry.scripts` are automatically available in your `PATH` when you activate the `venv`.[^1]

To build the package, simply run `poetry build`. This will generate a wheel and a tarball in the `dist` folder. To publish the package to PyPI (or another repo), simply run `poetry publish`. You can combine the build and publish steps into one command with `poetry publish --build`.

![example of poetry build](/images/poetry_build.webp)

## Usage

This part is more user-facing than dev-facing. If you want to use two packages globally that expose some scripts to the user (e.g. `awscli`, `youtube-dl`, etc.), the general approach is to run something like `pip install --user youtube-dl`. This installs the package at the user level and exposes the script through `~/.local/bin/youtube-dl`. However, this installs all the packages at the same user level.
Hypothetically, if you have two packages `foo` and `bar` which have conflicting dependencies, this causes an issue. Suppose you run:

```bash
$ pip install foo
$ pip install bar
$ bar # works
$ foo # breaks because of dependency mismatch
```

While installing `bar`, `pip` will install the dependencies for `bar`, which will break `foo` after warning you[^2].

To solve this, there is [`pipx`](https://github.com/pypa/pipx). Pipx installs each package in a separate virtualenv without requiring the user to activate said virtualenv before using the package.[^3]

In the same scenario as before, doing the following works just fine.

```bash
$ pipx install foo
$ pipx install bar
$ bar # works
$ foo # also works
```

In this scenario, both `bar` and `foo` are installed in separate virtualenvs, so the dependency conflict doesn't matter.

## Some more things from my bashrc

```bash
# Run a command outside the active virtualenv, then restore the virtualenv.
function wrapper_no_poet() {
    local last_env ret
    if [[ -v VIRTUAL_ENV ]]; then
        last_env="$VIRTUAL_ENV"
        deactivate
    fi
    "$@"
    ret=$?
    if [[ -v last_env ]]; then
        . "$last_env/bin/activate"
    fi
    return $ret
}

alias wnp='wrapper_no_poet'
alias pm='POET_MANUAL=1'
```

Prefixing any command with `wnp` runs it outside the virtualenv if a virtualenv is active. Running `pm` turns off automatic virtualenv activation.

[^1]: This also allows for a nice switch between the development and production versions of the app. Essentially, when the virtualenv is active, you are using the development script, while when it is deactivated, you are using the global (likely production) version.

[^2]: To be precise, it will warn you that it broke `foo` but will still continue with the installation.

[^3]: For development, poetry also provides `poetry run`, which runs a command inside the virtualenv without having to activate it.