
Packaging a Python Codebase


Using configurations and virtual environments to create a setting for reproducing results.
Goku Mohandas


Intuition

So far, we've been working inside notebooks, which has allowed us to train a model very quickly. However, notebooks are not easy to put into production, and we don't always have control over the environment (e.g., Google Colab updates its packages periodically). When we used our notebook, we had a preloaded set of packages (run !pip list inside the notebook to see all of them). Now we want to explicitly define our environment so that we can reproduce it locally (for ourselves and our team members) and when we deploy to production. There are many recommended tools for packaging in Python, and we'll be using the tried-and-tested pip.

There are many alternative dependency management and packaging tools, such as Poetry, but many things are still in flux with these newer options. We're going to stick with pip because it works for our application and we don't want to deal with issues like long dependency-resolution times.

Terminal

Before we can start packaging, we need a way to create files and run commands. We can do this via the terminal, which lets us run commands in shells such as bash, zsh, etc. The commands we run should be largely the same regardless of your operating system or shell.

Tip

We highly recommend you use iTerm2 (Mac) or ConEmu (Windows) in lieu of the default terminal for their rich features.

Project

While we'll move the code from our notebook into scripts in the next lesson, we'll create the main project directory now so that we can save our packaging components there. We'll call our main project directory mlops, but feel free to name it anything you'd like.

# Create and change into the directory
mkdir mlops
cd mlops

Python

The first thing we'll do is set up the correct version of Python. We'll be using version 3.7.13 specifically, but any version of Python 3.7 or above should work. Though you could download different Python versions online, we highly recommend using a version manager such as pyenv.

Pyenv works on Mac and Linux; if you're on Windows, we recommend using pyenv-win.

# Install pyenv (macOS with Homebrew)
$ brew install pyenv

# Check version of Python
$ python --version
Python 3.6.9

# Check available versions
$ pyenv versions
  system
* 3.6.9

# Install new version
$ pyenv install 3.7.13

# Set new version for this directory
$ pyenv local 3.7.13
$ pyenv versions
  system
  3.6.9
* 3.7.13

# Validate
$ python --version
Python 3.7.13
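If a script should fail fast when it's launched with the wrong interpreter, a check like the one below can sit at its entry point. This is a minimal sketch using only the standard library's sys.version_info; the helper name check_python_version and the (3, 7) minimum (chosen to match the python_requires value we'll set in setup.py) are our own assumptions, not part of pyenv.

```python
import sys

# Assumed minimum; matches python_requires=">=3.7" used later in setup.py.
MIN_VERSION = (3, 7)


def check_python_version(minimum=MIN_VERSION):
    """Return the running interpreter's (major, minor, micro) version,
    raising RuntimeError if it is older than `minimum`."""
    current = tuple(sys.version_info[:3])
    if current < tuple(minimum):
        raise RuntimeError(
            f"Python {'.'.join(map(str, minimum))}+ required, "
            f"found {'.'.join(map(str, current))}"
        )
    return current


if __name__ == "__main__":
    print("Using Python %d.%d.%d" % check_python_version())
```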

We highly recommend using Python 3.7.13: other versions will generally work, but we may run into conflicts with certain package versions that would then need to be resolved.

Virtual environment

Next, we'll set up a virtual environment so we can isolate the required packages for our application. This will also keep components separated from other projects which may have different dependencies. Once we create our virtual environment, we'll activate it and install our required packages.

python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip setuptools wheel

Let's unpack what's happening here:

  1. Create a Python virtual environment named venv.
  2. Activate our virtual environment (on Windows, run venv\Scripts\activate instead). Type deactivate to exit the virtual environment.
  3. Upgrade pip, setuptools and wheel so that we can download and install the latest package wheels.
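The first step can also be scripted with the standard library's venv module, which is what python3 -m venv runs under the hood. A minimal sketch, assuming we only want to create the environment: activation still has to happen in our shell, since a Python process can't modify its parent shell. The name create_venv is a hypothetical helper of our own.

```python
import venv
from pathlib import Path


def create_venv(env_dir="venv", with_pip=True):
    """Create a virtual environment at `env_dir`, roughly what
    `python3 -m venv venv` does on the command line."""
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    # Every environment created by venv records its base interpreter in pyvenv.cfg.
    return Path(env_dir, "pyvenv.cfg")
```

Calling create_venv("venv") from the project root would produce the same venv/ directory as the shell command. Passing with_pip=False skips bootstrapping pip via ensurepip, which can help on systems whose Python ships without ensurepip.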

Our virtual environment directory venv should be visible when we list the directories in our project:

ls
mlops/
└── venv/

We'll know our virtual environment is active because we'll see its name in the terminal prompt. We can further validate this by making sure pip freeze returns nothing.

(venv) ➜  mlops: pip freeze
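The prompt prefix is one signal; from inside Python, we can also compare sys.prefix with sys.base_prefix, which differ when the interpreter is running inside an environment created with the venv module (older virtualenv releases signalled this through sys.real_prefix instead). A minimal sketch, where the function name in_virtualenv is our own:

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-created environment.

    Inside such an environment, sys.prefix points at the environment directory
    while sys.base_prefix still points at the interpreter it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv(), "| prefix:", sys.prefix)
```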

Requirements

We'll create a separate file called requirements.txt where we'll specify the packages (with their versions) that we want to install. While we could place these requirements directly inside setup.py, many applications still look for a separate requirements.txt.

touch requirements.txt

We should add packages (with their versions) to our requirements.txt as we need them for our project. It's inadvisable to install all packages and then run pip freeze > requirements.txt, because that dumps the dependencies of all our packages into the file (even the ones we didn't explicitly install). To mitigate this, there are tools such as pipreqs, pip-tools, pipchill, etc. that list only the packages that aren't dependencies of other packages. However, their dependency resolution isn't always accurate, and they don't work when we want to separate packages for different tasks (developing, testing, etc.).

Tip

If we experience conflicts between package versions, we can relax constraints by specifying that a package needs to be at or above a certain version, as opposed to an exact version. We could also specify no version for any package and let pip resolve all conflicts, then check which versions were actually installed and add that information to our requirements.txt file.

# requirements.txt
<PACKAGE>==<VERSION>  # exact version
<PACKAGE>>=<VERSION>  # at or above version
<PACKAGE>             # no version
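To make the three forms concrete, here is a deliberately simplified, hypothetical parser that splits a line into name, operator and version. It only understands ==, >=, bare names and trailing comments; real requirement specifiers (extras, environment markers, multiple constraints) are far richer, and the third-party packaging library is the usual tool for those.

```python
import re

# Simplified pattern for "pkg==1.0", "pkg>=1.0" or "pkg", plus an optional "# comment".
# This is an illustrative assumption, not pip's actual requirement grammar.
_REQ = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:(==|>=)\s*([^\s#]+))?\s*(?:#.*)?$"
)


def parse_requirement(line):
    """Return (name, operator, version) for one requirements.txt line,
    with operator and version set to None when no version is given."""
    match = _REQ.match(line)
    if match is None:
        raise ValueError(f"Unsupported requirement: {line!r}")
    name, op, version = match.groups()
    return name, op, version


if __name__ == "__main__":
    for line in ["numpy==1.19.5", "pandas>=1.0  # at or above", "tqdm"]:
        print(parse_requirement(line))
```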

Setup

Let's create a file called setup.py that provides instructions on how to install our package and its dependencies into our virtual environment.

touch setup.py
# setup.py
from pathlib import Path
from setuptools import find_namespace_packages, setup

We'll start by extracting the required packages from requirements.txt:

# Load packages from requirements.txt
BASE_DIR = Path(__file__).parent
with open(Path(BASE_DIR, "requirements.txt"), "r") as file:
    required_packages = [ln.strip() for ln in file.readlines()]
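One caveat with the list comprehension above: it keeps blank lines and comment lines as entries. A slightly more defensive loader is sketched below, under the assumption that requirements.txt contains only plain requirement strings and full-line # comments (no pip options such as -r or -e); load_requirements is a hypothetical helper name.

```python
def load_requirements(path):
    """Read requirement strings from a requirements.txt-style file,
    skipping blank lines and full-line comments."""
    requirements = []
    with open(path, "r") as file:
        for line in file:
            line = line.strip()
            if line and not line.startswith("#"):
                requirements.append(line)
    return requirements
```

In setup.py, this would replace the comprehension with required_packages = load_requirements(Path(BASE_DIR, "requirements.txt")).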

The heart of the setup.py file is the setup() call, which describes how to set up our package and its dependencies. Our package will be called tagifai, and it will encompass all the requirements needed to run it. The first several arguments cover metadata (name, description, etc.), and then we define the requirements. Here we're stating that we require Python 3.7 or above, and then passing our required packages to install_requires.

# setup.py
setup(
    name="tagifai",
    version="0.1",
    description="Classify machine learning projects.",
    author="Goku Mohandas",
    author_email="[email protected]",
    url="https://madewithml.com/",
    python_requires=">=3.7",
    install_requires=required_packages,
)
Putting it all together, here is our complete setup.py:
from pathlib import Path
from setuptools import find_namespace_packages, setup

# Load packages from requirements.txt
BASE_DIR = Path(__file__).parent
with open(Path(BASE_DIR, "requirements.txt"), "r") as file:
    required_packages = [ln.strip() for ln in file.readlines()]

# Define our package
setup(
    name="tagifai",
    version="0.1",
    description="Classify machine learning projects.",
    author="Goku Mohandas",
    author_email="[email protected]",
    url="https://madewithml.com/",
    python_requires=">=3.7",
    # Namespace discovery can pick up any subdirectory, so exclude our venv/
    packages=find_namespace_packages(exclude=["venv", "venv.*"]),
    install_requires=required_packages,
)

Usage

We don't have any packages defined in our requirements.txt file yet, but if we did, we could now use setup.py to install them like so:

python3 -m pip install -e .            # installs required packages only
Obtaining file:///Users/goku/Documents/madewithml/mlops
  Preparing metadata (setup.py) ... done
Installing collected packages: tagifai
  Running setup.py develop for tagifai
Successfully installed tagifai-0.1

The -e (or --editable) flag installs our project in development mode, so changes to our source code take effect without having to reinstall the package.

Now if we run pip freeze, we should see that tagifai is installed.

pip freeze
# Editable install with no version control (tagifai==0.1)
-e /Users/goku/Documents/madewithml/mlops
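pip freeze is one way to confirm the install; from Python, the standard library's importlib.metadata (Python 3.8+) can report an installed distribution's version. Since this lesson uses Python 3.7.13, the sketch falls back to the importlib_metadata backport from PyPI, which offers the same API. The helper name installed_version is hypothetical.

```python
try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    import importlib_metadata as metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # After `pip install -e .`, this should report the version declared in setup.py.
    print("tagifai:", installed_version("tagifai"))
```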

We should also see a tagifai.egg-info directory in our project directory:

mlops/
├── tagifai.egg-info/
├── venv/
├── requirements.txt
└── setup.py

There are many alternatives to a setup.py file, such as setup.cfg and the more recent pyproject.toml.


To cite this lesson, please use:

@article{madewithml,
    author       = {Goku Mohandas},
    title        = { Packaging - Made With ML },
    howpublished = {\url{https://madewithml.com/}},
    year         = {2021}
}