diff --git a/content/posts/python-poetry-pipx.md b/content/posts/python-poetry-pipx.md new file mode 100644 index 0000000..996c1ab --- /dev/null +++ b/content/posts/python-poetry-pipx.md @@ -0,0 +1,321 @@ ++++ +title = "Improving Python Dependency Management With pipx and Poetry" +date = "2021-09-19" +author = "Ceda EI" +tags = ["python", "development"] +keywords = ["python", "development"] +description = "My current dev setup with python, poetry and pipx" +showFullContent = false ++++ + +Over time, how I develop applications in python has changed noticeably. I will +divide the topic into three sections and see how they tie into each other at +the end. + +- Development +- Packaging +- Usage + +## Development + +Under development, the issues I will focus on are the following: + +- Dependency Management +- Virtualenvs and managing them + +Historically, the way to do dependency management was through +`requirements.txt`. I found `requirements.txt` hard to manage. In that setup, +adding a dependency and installing it was two steps: + +- Add the package `bar` to `requirements.txt` +- Either do `pip install bar` or `pip install -r requirements.txt` + +While focused on development, I would often forget one or both of these steps. +Also, the lack of a lock file was a small downside for me (could be a much +larger downside for others). The separation between `pip` and +`requirements.txt` can also easily lead you to accidentally depend on packages +installed on your system or in your virtualenv but not specified in your +`requirements.txt`. + +Managing virtualenvs was also difficult. As a virtualenv and a project are not +related, you need a directory structure. Otherwise, you can't tell which +virtualenv is being used for which project. You can use the same virtualenvs +for multiple projects, but that partially defeats the point of virtualenvs and +makes `requirements.txt` more error-prone (higher chances of forgetting to add +packages to it). 
The approach generally used is one of the following two: + + +``` +foo/ +├── foo_src/ +└── foo_venv/ +``` + +or + +``` +foo_src/ +└── venv/ +``` + +I preferred the second one as the first one nests the source code one +directory deeper. + +### A new standard - `pyproject.toml` + +In [PEP-518](https://www.python.org/dev/peps/pep-0518/), python standardized +the `pyproject.toml` file which allows users to choose alternate build systems +for package generation. + +One such project that provides an alternate build system is +[Poetry](https://python-poetry.org/). Poetry hits the nail on the head and +solves my major gripes with traditional tooling. + +### Poetry and virtualenvs + +Poetry manages the virtualenvs automatically and keeps track of which project +uses which virtualenv. Working on an existing project which uses +poetry is as simple as this: + +```bash +$ git clone https://gitlab.com/ceda_ei/verlauf && cd verlauf +$ poetry install +``` + +The `poetry install` command sets up the virtualenv, installs all the required +dependencies inside that, and sets up any commands accordingly (I will get to +this soon). To activate the virtualenv, simply run: + +```bash +. "$(poetry env info --path)/bin/activate" +``` + +I wrap this in a small function which lets me toggle it quickly: + +```bash +function poet() { + POET_MANUAL=1 + if [[ -v VIRTUAL_ENV ]]; then + deactivate + else + . "$(poetry env info --path)/bin/activate" + fi +} +``` + +Running `poet` activates the virtualenv if it is not active and deactivates it if +it is active. To make things even easier, I automatically activate and +deactivate the virtualenv as I enter and leave the project directory. To do +so, simply drop this in your `.bashrc`. + +```bash +function find_in_parent() { + local path + IFS="/" read -ra path <<<"$PWD" + for ((i=${#path[@]}; i > 0; i--)); do + local current_path="" + for ((j=1; j /dev/null; then + if [[ ! -v VIRTUAL_ENV ]]; then + if BASE="$(poetry env info --path)"; then + . "$BASE/bin/activate" + PS1="" + else + POET_MANUAL=1 + fi + fi + elif [[ -v VIRTUAL_ENV ]]; then + deactivate + fi + return $ret +} + +PROMPT_COMMAND="auto_poet;$PROMPT_COMMAND" +``` + +This ties in well with the `poet` function; if you use `poet` anytime in a bash +session, activation switches from automatic to manual and changing directories +no longer auto-toggles the virtualenv. + +![auto_poet and poet in action](/images/auto_poet.webp) + +### Poetry and dependency management + +Instead of using `requirements.txt`, poetry stores the dependencies inside +`pyproject.toml`. Poetry is stricter than `pip` in resolving +versioning issues. Dependencies and dev-dependencies are stored inside +`tool.poetry.dependencies` and `tool.poetry.dev-dependencies` respectively. +Here is an example of a `pyproject.toml` for a project I am working on. + +```toml +[tool.poetry] +name = "bells" +version = "0.3.0" +description = "Bells is a program for keeping track of sound recordings." +authors = ["Ceda EI "] +license = "GPL-3.0" +readme = "README.md" +homepage = "https://gitlab.com/ceda_ei/bells.git" +repository = "https://gitlab.com/ceda_ei/bells.git" + +[tool.poetry.dependencies] +python = ">=3.7,<3.11" +click = "^8.0.1" +questionary = "^1.10.0" +sounddevice = "^0.4.2" +SoundFile = "^0.10.3" +numpy = "^1.21.2" + +[tool.poetry.dev-dependencies] + +[build-system] +requires = ["poetry-core>=1.0.0"] +build-backend = "poetry.core.masonry.api" + +# I will talk about this section soon +[tool.poetry.scripts] +bells = "bells.__main__:main" +``` + +One of the upsides of poetry is that you don't have to manage the dependencies +in the `pyproject.toml` file yourself. Poetry adds an `npm`-like interface for +adding and removing dependencies. To add a dependency to your project, simply +run `poetry add bar` and it will add it to your `pyproject.toml` file and +install it in the virtualenv as well. To remove a dependency, just run `poetry +remove bar`. 
For development dependencies, just add the `--dev` flag to the +commands. + +## Packaging + +Since poetry replaces the build system, we can now configure the build using +poetry via `pyproject.toml`. Inside `pyproject.toml`, the `tool.poetry` section +stores all the build info needed; `tool.poetry` contains the metadata, +`tool.poetry.dependencies` contains the dependencies, and `tool.poetry.source` +contains private repository details (in case you don't want to use PyPI). + +One of the options is `tool.poetry.scripts`. It contains scripts that the +project exposes. This replaces `console_scripts` in `entry_points` of +`setuptools`. + +For example, + +```toml +[tool.poetry.scripts] +foobar = "foo.bar:main" +``` + +This will add a script named `foobar` in your `PATH`. Running that is +equivalent to running the following script: + +```python +from foo.bar import main + +if __name__ == "__main__": + main() +``` + +For further details, check the +[reference](https://python-poetry.org/docs/pyproject/). + +Poetry also removes the need for manually doing editable installs (`pip install +-e .`). The package is automatically installed as editable when you run +`poetry install`. Any scripts specified in `tool.poetry.scripts` are +automatically available in your `PATH` when you activate the `venv`.[^1] + +To build the package, simply run `poetry build`. This will generate a wheel and +a tarball in the `dist` folder. + +To publish the package to PyPI (or another repo), simply run `poetry publish`. +You can combine the build and publish into one command with `poetry publish +--build`. + +![example of poetry build](/images/poetry_build.webp) + +## Usage + +This part is more user-facing than dev-facing. If you want to use two +packages globally that expose some scripts to the user (e.g. `awscli`, +`youtube-dl`, etc.), the general approach is to run something like `pip +install --user youtube-dl`. 
This installs the package at the user level and +exposes the script through `~/.local/bin/youtube-dl`. However, this installs +all the packages at the same user level. Hypothetically, if you have two +packages `foo` and `bar` which have conflicting dependencies, this causes an +issue. Suppose you run: + +```bash +$ pip install foo +$ pip install bar +$ bar # works +$ foo # breaks because of dependency mismatch +``` + +While installing `bar`, `pip` will install the dependencies for `bar`, which +will break `foo` after warning you[^2]. + +To solve this, there is [`pipx`](https://github.com/pypa/pipx). Pipx installs +each package in a separate virtualenv without requiring the user to activate +said virtualenv before using the package.[^3] + +In the same scenario as before, doing the following works just fine. + +```bash +$ pipx install foo +$ pipx install bar +$ bar # works +$ foo # also works +``` + +In this scenario, both `bar` and `foo` are installed in separate virtualenvs so +the dependency conflict doesn't matter. + +## Some more things from my bashrc + +```bash + +function wrapper_no_poet() { + local last_env + if [[ -v VIRTUAL_ENV ]]; then + last_env="$VIRTUAL_ENV" + deactivate + fi + "$@" + ret=$? + if [[ -v last_env ]]; then + . "$last_env/bin/activate" + fi + return $ret +} + +alias wnp='wrapper_no_poet' +alias pm='POET_MANUAL=1' +``` + +Prefixing any command with `wnp` runs it outside the virtualenv if a virtualenv +is active. Running `pm` turns off automatic virtualenv activation. + + +[^1]: This also allows for a nice switch between the development and production + versions of the app. Essentially, when the virtualenv is active, you are + using the development script, while when it is deactivated, you are using + the global (likely production) version. 
+ +[^2]: To be precise, it will warn you that it broke `foo` but will still + continue with the installation. + +[^3]: For development, poetry also provides `poetry run` which runs a command + inside the virtualenv without having to activate it. diff --git a/static/images/auto_poet.webp b/static/images/auto_poet.webp new file mode 100644 index 0000000..feca41a Binary files /dev/null and b/static/images/auto_poet.webp differ diff --git a/static/images/poetry_build.webp b/static/images/poetry_build.webp new file mode 100644 index 0000000..01945cc Binary files /dev/null and b/static/images/poetry_build.webp differ