+++
title = "Improving Python Dependency Management With pipx and Poetry"
date = "2021-09-19"
author = "Ceda EI"
tags = ["python", "development"]
keywords = ["python", "development"]
description = "My current dev setup with python, poetry and pipx"
showFullContent = false
+++

Over time, the way I develop applications in Python has changed noticeably. I will divide the topic into three sections and show how they tie into each other at the end.

  • Development
  • Packaging
  • Usage

Development

Under development, the issues I will focus on are the following:

  • Dependency Management
  • Virtualenvs and managing them

Historically, dependency management was done through requirements.txt, which I found hard to manage. In that setup, adding a dependency and installing it took two steps:

  • Add the package bar to requirements.txt
  • Either do pip install bar or pip install -r requirements.txt

While focused on development, I would often forget one or both of these steps. The lack of a lock file was also a small downside for me (it could be a much larger one for others). And because pip and requirements.txt are separate, it is easy to accidentally depend on packages that are installed on your system or in your virtualenv but not listed in your requirements.txt.
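The drift described above can be checked mechanically. Here is a minimal sketch, not part of my actual setup, that compares a set of installed package names against the contents of a requirements.txt. The function names are made up for illustration, and the parsing only handles simple lines such as `bar==1.0`, not every form pip accepts.

```python
import importlib.metadata
import re
from typing import Set


def normalize(name: str) -> str:
    """Normalize a package name so that Foo_Bar and foo-bar compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text: str) -> Set[str]:
    """Return the normalized package names listed in requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def undeclared(installed: Set[str], requirements_text: str) -> Set[str]:
    """Return installed packages that requirements.txt does not mention."""
    return {normalize(name) for name in installed} - requirement_names(requirements_text)


def installed_names() -> Set[str]:
    """Names of all distributions visible to the running interpreter."""
    return {
        dist.metadata["Name"]
        for dist in importlib.metadata.distributions()
        if dist.metadata["Name"]
    }
```

Calling `undeclared(installed_names(), open("requirements.txt").read())` lists candidates. Transitive dependencies and pip itself show up too, so treat the result as a hint rather than a verdict.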

Managing virtualenvs was also difficult. Since nothing links a virtualenv to a project, you need a directory convention. Otherwise, you can't tell which virtualenv is being used for which project. You can use the same virtualenv for multiple projects, but that partially defeats the point of virtualenvs and makes requirements.txt more error-prone (higher chances of forgetting to add packages to it). The approach generally used is one of the following two:

foo/
├── foo_src/
└── foo_venv/

or

foo_src/
└── venv/

I preferred the second one as the first one nests the source code one directory deeper.

A new standard - pyproject.toml

PEP 518 standardized the pyproject.toml file, which allows projects to choose alternate build systems for package generation.

One such project that provides an alternate build system is Poetry. Poetry hits the nail on the head and solves my major gripes with traditional tooling.

Poetry and virtualenvs

Poetry manages the virtualenvs automatically and keeps track of which project uses which virtualenv. Working on an existing project which uses poetry is as simple as this:

$ git clone https://gitlab.com/ceda_ei/verlauf
$ cd verlauf
$ poetry install

The poetry install command sets up the virtualenv, installs all the required dependencies inside it, and sets up any commands accordingly (I will get to this soon). To activate the virtualenv, simply run:

. "$(poetry env info --path)/bin/activate"

I wrap this in a small function which lets me toggle it quickly:

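function poet() {
	# Mark this session as manually managed so auto_poet stops toggling.
	POET_MANUAL=1
	if [[ -v VIRTUAL_ENV ]]; then
		# A virtualenv is active: leave it.
		deactivate
	else
		# No virtualenv is active: activate the current project's one.
		. "$(poetry env info --path)/bin/activate"
	fi
}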

Running poet activates the virtualenv if it is not active and deactivates it if it is active. To make things even easier, I automatically activate and deactivate the virtualenv as I enter and leave the project directory. To do so, simply drop this in your .bashrc.

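function find_in_parent() {
	# Print the nearest directory at or above $PWD that contains $1.
	local path i j
	IFS="/" read -ra path <<<"$PWD"
	# Try the deepest directory first, then each parent up to /.
	for ((i=${#path[@]}; i > 0; i--)); do
		local current_path=""
		for ((j=1; j<i; j++)); do
			current_path="$current_path/${path[j]}"
		done
		if [[ -e "${current_path}/$1" ]]; then
			echo "${current_path}/"
			return
		fi
	done
	# Nothing found anywhere up to the root.
	return 1
}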

function auto_poet() {
	ret="$?"
	if [[ -v POET_MANUAL ]]; then
		return $ret
	fi
	if find_in_parent pyproject.toml &> /dev/null; then
		if [[ ! -v VIRTUAL_ENV ]]; then
		    if BASE="$(poetry env info --path)"; then
			. "$BASE/bin/activate"
			PS1=""
		    else
			POET_MANUAL=1
		    fi
		fi
	elif [[ -v VIRTUAL_ENV ]]; then
		deactivate
	fi
	return $ret
}

PROMPT_COMMAND="auto_poet;$PROMPT_COMMAND"

This ties in well with the poet function; if you use poet anytime in a bash session, activation switches from automatic to manual and changing directories no longer auto-toggles the virtualenv.
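For readers who find the bash loops hard to follow, the upward search that find_in_parent performs can be sketched in Python. This is an illustrative equivalent rather than something I use in my shell setup. The optional `start` parameter is an addition of the sketch; the bash version always starts from `$PWD`.

```python
from pathlib import Path
from typing import Optional


def find_in_parent(name: str, start: Optional[Path] = None) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `name`.

    Returns None when no directory up to the filesystem root contains it.
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        if (directory / name).exists():
            return directory
    return None
```

`find_in_parent("pyproject.toml")` returns the project root when called from anywhere inside a poetry project, which is the same check auto_poet relies on.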

[Image: auto_poet and poet in action]

Poetry and dependency management

Instead of using requirements.txt, poetry stores the dependencies inside pyproject.toml. Poetry is stricter than pip when resolving version conflicts. Dependencies and dev-dependencies are stored inside tool.poetry.dependencies and tool.poetry.dev-dependencies respectively. Here is an example of a pyproject.toml for a project I am working on.

[tool.poetry]
name = "bells"
version = "0.3.0"
description = "Bells is a program for keeping track of sound recordings."
authors = ["Ceda EI <ceda_ei@webionite.com>"]
license = "GPL-3.0"
readme = "README.md"
homepage = "https://gitlab.com/ceda_ei/bells.git"
repository = "https://gitlab.com/ceda_ei/bells.git"

[tool.poetry.dependencies]
python = ">=3.7,<3.11"
click = "^8.0.1"
questionary = "^1.10.0"
sounddevice = "^0.4.2"
SoundFile = "^0.10.3"
numpy = "^1.21.2"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

# I will talk about this section soon
[tool.poetry.scripts]
bells = "bells.__main__:main"
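The `^` prefix in the dependency versions above is Poetry's caret constraint: it allows updates that do not change the leftmost non-zero version component. Here is a rough sketch of how such a constraint expands into a range. The helper `caret_range` is hypothetical, it is not Poetry's own code, and it ignores pre-release and other non-numeric version parts.

```python
from typing import Tuple


def caret_range(constraint: str) -> Tuple[str, str]:
    """Expand "^X.Y.Z" into (inclusive lower bound, exclusive upper bound)."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(part) for part in constraint[1:].split(".")]
    # Bump the leftmost non-zero component; if all are zero, bump the last one given.
    index = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)
    # Pad to major.minor.patch so the bounds are always three components long.
    padded = parts + [0] * (3 - len(parts))
    upper = padded[:index] + [padded[index] + 1] + [0] * (len(padded) - index - 1)
    return ".".join(map(str, padded)), ".".join(map(str, upper))
```

So `caret_range("^8.0.1")` gives `("8.0.1", "9.0.0")`, while `caret_range("^0.4.2")` gives `("0.4.2", "0.5.0")` because the leading component is zero.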

One of the upsides of poetry is that you don't have to manage the dependencies in the pyproject.toml file yourself. Poetry adds an npm-like interface for adding and removing dependencies. To add a dependency to your project, simply run poetry add bar; it adds bar to your pyproject.toml file and installs it in the virtualenv as well. To remove a dependency, just run poetry remove bar. For development dependencies, just add the --dev flag to the commands.

Packaging

Since poetry replaces the build system, we can now configure the build using poetry via pyproject.toml. Inside pyproject.toml, the tool.poetry section stores all the build info needed: tool.poetry contains the metadata, tool.poetry.dependencies contains the dependencies, and tool.poetry.source contains private repository details (in case you don't want to use PyPI).

One of the options is tool.poetry.scripts. It contains scripts that the project exposes. This replaces console_scripts in entry_points of setuptools.

For example,

[tool.poetry.scripts]
foobar = "foo.bar:main"

This will add a script named foobar to your PATH. Running it is equivalent to running the following script:

from foo.bar import main

if __name__ == "__main__":
    main()

For further details, check the reference.
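To make the "module:function" syntax concrete, here is a small sketch of how such a string can be resolved to a callable. The name `resolve_entry_point` is hypothetical, and this is not the code that poetry or pip actually generates. Generated wrappers typically also pass the function's return value to sys.exit.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(spec: str) -> Callable[..., Any]:
    """Turn a "package.module:attribute" string into the object it names."""
    module_name, _, attr_path = spec.partition(":")
    if not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    target: Any = importlib.import_module(module_name)
    # The attribute part may itself be dotted, e.g. "module:Class.method".
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    return target
```

With the example above, `resolve_entry_point("foo.bar:main")()` would import foo.bar and call its main function, provided the hypothetical package foo is importable.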

Poetry also removes the need for manually doing editable installs (pip install -e .). The package is automatically installed as editable when you run poetry install. Any scripts specified in tool.poetry.scripts are automatically available in your PATH when you activate the venv.[1]

To build the package, simply run poetry build. This will generate a wheel and a tarball in the dist folder.

To publish the package to PyPI (or another repo), simply run poetry publish. You can combine the build and publish into one command with poetry publish --build.

[Image: example of poetry build]

Usage

This part is more user-facing than dev-facing. If you want to use packages globally that expose scripts to the user (e.g. awscli, youtube-dl), the general approach is to run something like pip install --user youtube-dl. This installs the package at the user level and exposes the script through ~/.local/bin/youtube-dl. However, all packages installed this way share the same user-level environment. Hypothetically, if you have two packages foo and bar which have conflicting dependencies, this causes an issue. Suppose you run:

$ pip install foo
$ pip install bar
$ bar # works
$ foo # breaks because of dependency mismatch

While installing bar, pip will install the dependencies for bar, which will break foo after warning you.[2]

To solve this, there is pipx. pipx installs each package in a separate virtualenv without requiring the user to activate said virtualenv before using the package.[3]

In the same scenario as before, doing the following works just fine.

$ pipx install foo
$ pipx install bar
$ bar # works
$ foo # also works

In this scenario, both bar and foo are installed in separate virtualenvs so the dependency conflict doesn't matter.
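The difference between the two scenarios can be shown with a toy model. This is only an illustration: environments are plain dictionaries, the packages foo, bar and lib and their versions are made up, and no real installer works this simply.

```python
from typing import Dict

# Hypothetical packages: foo needs lib 1.0, bar needs lib 2.0.
REQUIRES: Dict[str, Dict[str, str]] = {
    "foo": {"lib": "1.0"},
    "bar": {"lib": "2.0"},
}


def install(env: Dict[str, str], package: str) -> None:
    """Install `package` into `env`, overwriting dependency versions already there."""
    env[package] = "1.0"
    env.update(REQUIRES[package])


def works(env: Dict[str, str], package: str) -> bool:
    """A package works if it is installed and every dependency has the version it needs."""
    return package in env and all(
        env.get(dep) == version for dep, version in REQUIRES[package].items()
    )


# pip install --user: one shared environment, so bar's lib replaces foo's.
shared: Dict[str, str] = {}
install(shared, "foo")
install(shared, "bar")

# pipx: one environment per package, so each keeps the lib version it needs.
isolated: Dict[str, Dict[str, str]] = {"foo": {}, "bar": {}}
for name, env in isolated.items():
    install(env, name)
```

In this model `works(shared, "foo")` is false after bar is installed, while `works(isolated["foo"], "foo")` and `works(isolated["bar"], "bar")` are both true.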

Some more things from my bashrc


function wrapper_no_poet() {
	local last_env
	if [[ -v VIRTUAL_ENV ]]; then
		last_env="$VIRTUAL_ENV"
		deactivate
	fi
	"$@"
	ret=$?
	if [[ -v last_env ]]; then
		. "$last_env/bin/activate"
	fi
	return $ret
}

alias wnp='wrapper_no_poet'
alias pm='POET_MANUAL=1'

Prefixing any command with wnp runs it outside the virtualenv if a virtualenv is active. Running pm turns off automatic virtualenv activation.
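The same idea can be approximated from Python without touching the current shell: run the command with an environment from which the virtualenv has been removed. This is a sketch of a different technique, not a port of wrapper_no_poet. It assumes a POSIX layout where the virtualenv's executables live in `bin`, and it only undoes the VIRTUAL_ENV and PATH changes, not everything an activate script may set. Both function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, Mapping, Sequence


def env_without_venv(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `environ` with the active virtualenv removed."""
    env = dict(environ)
    venv = env.pop("VIRTUAL_ENV", None)
    if venv:
        # Assumption: POSIX layout, where executables live in <venv>/bin.
        venv_bin = os.path.join(venv, "bin")
        entries = env.get("PATH", "").split(os.pathsep)
        env["PATH"] = os.pathsep.join(entry for entry in entries if entry != venv_bin)
    return env


def run_outside_venv(command: Sequence[str]) -> int:
    """Run `command` as if no virtualenv were active and return its exit code."""
    return subprocess.run(command, env=env_without_venv(os.environ)).returncode
```

Because the child process gets its own environment, nothing needs to be re-activated afterwards, unlike in the shell function above.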


  1. This also allows for a nice switch between the development and production versions of the app. Essentially, when the virtualenv is active, you are using the development script, and when it is deactivated, you are using the global (likely production) version.

  2. To be precise, it will warn you that it broke foo but will still continue with the installation.

  3. For development, poetry also provides poetry run, which runs a command inside the virtualenv without having to activate it.