+++
title = "Improving Python Dependency Management With pipx and Poetry"
date = "2021-09-19"
author = "Ceda EI"
tags = ["python", "development"]
keywords = ["python", "development"]
description = "My current dev setup with python, poetry and pipx"
showFullContent = false
+++
Over time, how I develop applications in Python has changed noticeably. I will
divide the topic into three sections and show how they tie into each other at
the end.

- Development
- Packaging
- Usage
## Development
Under development, the issues I will focus on are the following:

- Dependency Management
- Virtualenvs and managing them

Historically, the way to do dependency management was through
`requirements.txt`. I found `requirements.txt` hard to manage. In that setup,
adding a dependency and installing it took two steps:

- Add the package `bar` to `requirements.txt`
- Run either `pip install bar` or `pip install -r requirements.txt`

While focused on development, I would often forget one or both of these steps.
The lack of a lock file was also a small downside for me (it could be a much
larger one for others). The separation between `pip` and `requirements.txt`
also makes it easy to accidentally depend on packages that are installed on
your system or in your virtualenv but not listed in your `requirements.txt`.
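To make the last point concrete, here is a minimal sketch of a check for imports that are missing from `requirements.txt`. The function names are made up for this illustration, and it assumes that a package's import name matches its name on PyPI (often true, but not always: `PyYAML` is imported as `yaml`). Standard-library modules are only filtered out on Python 3.10 or newer, where `sys.stdlib_module_names` exists.

```python
import ast
import re
import sys


def declared_names(requirements_text):
    """Return normalized package names listed in a requirements.txt body."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments and options such as "-r other.txt"
        # The name is everything before the first version, extras or marker character.
        name = re.split(r"[\s<>=!~;\[@]", line, maxsplit=1)[0]
        names.add(name.lower().replace("_", "-"))
    return names


def imported_top_level(source):
    """Return the top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    return found


def undeclared_imports(source, requirements_text):
    """Imports that are neither standard library nor listed in the requirements."""
    # Assumption: import name == distribution name (see the note above).
    stdlib = getattr(sys, "stdlib_module_names", frozenset())  # Python 3.10+
    declared = declared_names(requirements_text)
    return sorted(
        name
        for name in imported_top_level(source)
        if name not in stdlib and name.lower().replace("_", "-") not in declared
    )


if __name__ == "__main__":
    source = "import click\nfrom questionary import select\n"
    print(undeclared_imports(source, "click>=8.0.1\n"))  # ['questionary']
```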
Managing virtualenvs was also difficult. Nothing ties a virtualenv to a
project, so you need a directory convention; otherwise, you can't tell which
virtualenv is used for which project. You can share one virtualenv between
multiple projects, but that partially defeats the point of virtualenvs and
makes `requirements.txt` more error-prone (higher chances of forgetting to add
packages to it). The approach generally used is one of the following two:
```
foo/
├── foo_src/
└── foo_venv/
```
or
```
foo_src/
└── venv/
```
I preferred the second one as the first one nests the source code one
directory deeper.
### A new standard - `pyproject.toml`
In [PEP 518](https://www.python.org/dev/peps/pep-0518/), Python standardized
the `pyproject.toml` file, which lets a project declare its build requirements
and so choose an alternate build system for package generation.

One project that provides an alternate build system is
[Poetry](https://python-poetry.org/). Poetry hits the nail on the head and
solves my major gripes with traditional tooling.
### Poetry and virtualenvs
Poetry creates the virtualenvs for you and keeps track of which project uses
which virtualenv. Working on an existing project that uses Poetry is as simple
as this:
```bash
$ git clone https://gitlab.com/ceda_ei/verlauf
$ cd verlauf
$ poetry install
```
The `poetry install` command sets up the virtualenv, installs all the required
dependencies inside it, and sets up any commands accordingly (I will get to
this soon). To activate the virtualenv, simply run:
```bash
. "$(poetry env info --path)/bin/activate"
```
I wrap this in a small function which lets me toggle it quickly:
```bash
function poet() {
    POET_MANUAL=1
    if [[ -v VIRTUAL_ENV ]]; then
        deactivate
    else
        . "$(poetry env info --path)/bin/activate"
    fi
}
```
Running `poet` activates the virtualenv if it is not active and deactivates it if
it is active. To make things even easier, I automatically activate and
deactivate the virtualenv as I enter and leave the project directory. To do
so, simply drop this in your `.bashrc`.
```bash
# Print the nearest directory, starting at $PWD and walking up, that contains $1.
function find_in_parent() {
    local path
    IFS="/" read -ra path <<<"$PWD"
    # path[0] is empty because $PWD starts with "/"; try the longest prefix first.
    for ((i=${#path[@]}; i > 0; i--)); do
        local current_path=""
        for ((j=1; j<i; j++)); do
            current_path="$current_path/${path[j]}"
        done
        if [[ -e "${current_path}/$1" ]]; then
            echo "${current_path}/"
            return
        fi
    done
    return 1
}

# Runs before every prompt and hands the last command's exit status back.
function auto_poet() {
    ret="$?"
    if [[ -v POET_MANUAL ]]; then
        return $ret
    fi
    if find_in_parent pyproject.toml &> /dev/null; then
        if [[ ! -v VIRTUAL_ENV ]]; then
            if BASE="$(poetry env info --path)"; then
                . "$BASE/bin/activate"
                PS1=""
            else
                # No usable virtualenv here: stop trying automatically.
                POET_MANUAL=1
            fi
        fi
    elif [[ -v VIRTUAL_ENV ]]; then
        deactivate
    fi
    return $ret
}
PROMPT_COMMAND="auto_poet;$PROMPT_COMMAND"
```
This ties in well with the `poet` function; if you use `poet` anytime in a bash
session, activation switches from automatic to manual and changing directories
no longer auto-toggles the virtualenv.
![auto_poet and poet in action](/images/auto_poet.webp)
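The upward search done by `find_in_parent` may be easier to follow in Python. The following is only an illustrative equivalent using `pathlib`, not something the shell setup needs. Unlike the bash version, it returns a `Path` (or `None`) instead of printing a path with a trailing slash, and the `start` parameter is an addition for the example.

```python
from pathlib import Path


def find_in_parent(name, start="."):
    """Return the nearest directory at or above `start` that contains `name`."""
    current = Path(start).resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (current, *current.parents):
        if (directory / name).exists():
            return directory
    return None


if __name__ == "__main__":
    # Prints the project root if run from inside a Poetry project, else None.
    print(find_in_parent("pyproject.toml"))
```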
### Poetry and dependency management
Instead of using `requirements.txt`, Poetry stores the dependencies inside
`pyproject.toml`. Poetry is stricter than `pip` when resolving version
conflicts. Dependencies and dev-dependencies are stored inside
`tool.poetry.dependencies` and `tool.poetry.dev-dependencies` respectively.
Here is an example of a `pyproject.toml` for a project I am working on.
```toml
[tool.poetry]
name = "bells"
version = "0.3.0"
description = "Bells is a program for keeping track of sound recordings."
authors = ["Ceda EI <ceda_ei@webionite.com>"]
license = "GPL-3.0"
readme = "README.md"
homepage = "https://gitlab.com/ceda_ei/bells.git"
repository = "https://gitlab.com/ceda_ei/bells.git"
[tool.poetry.dependencies]
python = ">=3.7,<3.11"
click = "^8.0.1"
questionary = "^1.10.0"
sounddevice = "^0.4.2"
SoundFile = "^0.10.3"
numpy = "^1.21.2"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
# I will talk about this section soon
[tool.poetry.scripts]
bells = "bells.__main__:main"
```
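The `^` prefix in the dependency table above is Poetry's caret constraint: it allows updates that leave the leftmost non-zero version component unchanged. Below is a minimal sketch of that rule. The `caret_range` helper is hypothetical and handles only plain numeric versions of up to three components (no pre-releases, no other operators).

```python
def caret_range(constraint):
    """Expand a caret constraint such as "^0.4.2" into (lower, upper) bounds.

    The lower bound is inclusive and the upper bound is exclusive.
    """
    if not constraint.startswith("^"):
        raise ValueError("not a caret constraint: " + constraint)
    parts = [int(p) for p in constraint[1:].split(".")]
    # Bump the leftmost non-zero component; if all are zero, bump the last one given.
    index = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: index + 1]
    upper[index] += 1
    # Pad both bounds to three components.
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


if __name__ == "__main__":
    for spec in ("^8.0.1", "^0.4.2", "^0.10.3"):
        print(spec, "->", caret_range(spec))
```

So `click = "^8.0.1"` accepts any version from 8.0.1 up to, but not including, 9.0.0, while `sounddevice = "^0.4.2"` stops before 0.5.0.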
One of the upsides of Poetry is that you don't have to manage the dependencies
in the `pyproject.toml` file yourself. Poetry adds an `npm`-like interface for
adding and removing dependencies. To add a dependency to your project, simply
run `poetry add bar`; it adds `bar` to your `pyproject.toml` file and installs
it in the virtualenv as well. To remove a dependency, just run
`poetry remove bar`. For development dependencies, just add the `--dev` flag to
the commands (Poetry 1.2 and later use `--group dev` instead).
## Packaging
Since Poetry replaces the build system, we can now configure the build using
Poetry via `pyproject.toml`. Inside `pyproject.toml`, the `tool.poetry` section
stores all the build info needed: `tool.poetry` contains the metadata,
`tool.poetry.dependencies` contains the dependencies, and `tool.poetry.source`
contains private repository details (in case you don't want to use PyPI).

One of the options is `tool.poetry.scripts`. It lists the scripts that the
project exposes. This replaces `console_scripts` in `entry_points` of
`setuptools`.

For example,
```toml
[tool.poetry.scripts]
foobar = "foo.bar:main"
```
This will add a script named `foobar` to your `PATH`. Running it is
equivalent to running the following script:
```python
from foo.bar import main

if __name__ == "__main__":
    main()
```
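The `"foo.bar:main"` string is a module path and an attribute name separated by a colon. Here is a rough sketch of how such a reference can be resolved at runtime. `load_entry_point` is an illustrative helper, not Poetry's or setuptools' actual implementation, and the demo resolves `json:dumps` from the standard library because `foo.bar` does not exist.

```python
import importlib


def load_entry_point(spec):
    """Resolve a "package.module:attribute" string to the object it names."""
    module_name, _, attribute = spec.partition(":")
    target = importlib.import_module(module_name)
    # The part after the colon may itself be dotted, e.g. "module:Class.method".
    for name in attribute.split("."):
        target = getattr(target, name)
    return target


if __name__ == "__main__":
    dumps = load_entry_point("json:dumps")
    print(dumps({"ok": True}))  # {"ok": true}
```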
For further details, check the
[reference](https://python-poetry.org/docs/pyproject/).
Poetry also removes the need for manual editable installs (`pip install -e .`).
The package is automatically installed as editable when you run
`poetry install`. Any scripts specified in `tool.poetry.scripts` are
automatically available in your `PATH` when you activate the `venv`.[^1]

To build the package, simply run `poetry build`. This generates a wheel and a
source tarball in the `dist` directory.

To publish the package to PyPI (or another repository), simply run
`poetry publish`. You can combine the build and publish steps into one command
with `poetry publish --build`.
![example of poetry build](/images/poetry_build.webp)
## Usage
This part is more user-facing than dev-facing. If you want to use packages
globally that expose scripts to the user (e.g. `awscli`, `youtube-dl`), the
general approach is to run something like `pip install --user youtube-dl`. This
installs the package at the user level and exposes the script through
`~/.local/bin/youtube-dl`. However, every package installed this way shares the
same user-level environment. If you have two packages `foo` and `bar` with
conflicting dependencies, this causes an issue. Suppose you run:
```bash
$ pip install foo
$ pip install bar
$ bar # works
$ foo # breaks because of dependency mismatch
```
While installing `bar`, `pip` installs the dependencies that `bar` needs,
which breaks `foo`; `pip` only warns you about it.[^2]

To solve this, there is [`pipx`](https://github.com/pypa/pipx). Pipx installs
each package in a separate virtualenv without requiring the user to activate
said virtualenv before using the package.[^3]
In the same scenario as before, doing the following works just fine.
```bash
$ pipx install foo
$ pipx install bar
$ bar # works
$ foo # also works
```
In this scenario, both `bar` and `foo` are installed in separate virtualenvs so
the dependency conflict doesn't matter.
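The difference between the two approaches can be illustrated with a toy model in which an environment is just a dictionary of installed versions. Everything here is hypothetical: the shared library `baz`, its version numbers, and the helper functions are invented for the example, and real installers do far more than this.

```python
# Hypothetical pins: both apps need the made-up library "baz", at different versions.
REQUIREMENTS = {
    "foo": {"baz": "1.0"},
    "bar": {"baz": "2.0"},
}


def install(site_packages, app):
    """Install an app's pinned dependencies, replacing any existing versions."""
    site_packages.update(REQUIREMENTS[app])


def works(site_packages, app):
    """An app works only if every pinned dependency is present at that version."""
    return all(
        site_packages.get(dep) == version
        for dep, version in REQUIREMENTS[app].items()
    )


def shared_install(apps):
    """`pip install --user` style: one environment shared by every app."""
    site_packages = {}
    for app in apps:
        install(site_packages, app)
    return {app: works(site_packages, app) for app in apps}


def isolated_install(apps):
    """`pipx` style: a fresh environment for each app."""
    result = {}
    for app in apps:
        site_packages = {}
        install(site_packages, app)
        result[app] = works(site_packages, app)
    return result


if __name__ == "__main__":
    print(shared_install(["foo", "bar"]))    # {'foo': False, 'bar': True}
    print(isolated_install(["foo", "bar"]))  # {'foo': True, 'bar': True}
```

Installing both into one environment leaves only `bar` working, while one environment per package keeps both usable.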
## Some more things from my bashrc
```bash
function wrapper_no_poet() {
    local last_env
    if [[ -v VIRTUAL_ENV ]]; then
        last_env="$VIRTUAL_ENV"
        deactivate
    fi
    "$@"
    ret=$?
    if [[ -v last_env ]]; then
        . "$last_env/bin/activate"
    fi
    return $ret
}
alias wnp='wrapper_no_poet'
alias pm='POET_MANUAL=1'
```
Prefixing any command with `wnp` runs it outside the virtualenv if a virtualenv
is active. Running `pm` turns off automatic virtualenv activation.

[^1]: This also allows for a nice switch between the development and production
    versions of the app. Essentially, when the virtualenv is active, you are
    using the development script, and when it is deactivated, you are using
    the global (likely production) version.
[^2]: To be precise, it will warn you that it broke `foo` but will still
    continue with the installation.
[^3]: For development, Poetry also provides `poetry run`, which runs a command
    inside the virtualenv without you having to activate it.