Packaging a Python Codebase
Intuition
So far, we've been working inside notebooks, which has allowed us to train a model very quickly. However, notebooks are not easy to put into production and we don't always have control over the environment (ex. Google Colab updates its packages periodically). When we used our notebook, we had a preloaded set of packages (run !pip list inside the notebook to see all of them). But now we want to explicitly define our environment so we can reproduce it locally (for us and team members) and when we deploy to production. There are many tools for packaging in Python and we'll be using the tried and tested Pip.
There are many alternative dependency management and packaging tools, such as Poetry, but there are still many things in flux with these newer options. We're going to stick with Pip because it works for our application and we don't want to deal with issues like long dependency resolution times.
Terminal
Before we can start packaging, we need a way to create files and run commands. We can do this via the terminal, which runs a shell such as bash or zsh to execute our commands. All the commands we run should be the same regardless of your operating system or shell.
Tip
We highly recommend you use iTerm2 (Mac) or ConEmu (Windows) in lieu of the default terminal for their rich features.
Project
While we'll organize our code from our notebook to scripts in the next lesson, we'll create the main project directory now so that we can save our packaging components there. We'll call our main project directory mlops but feel free to name it anything you'd like.
# Create and change into the directory
mkdir mlops
cd mlops
Python
The first thing we'll do is set up the correct version of Python. We'll be using version 3.9.1 specifically but any version of Python 3.7 or above should work. Though you could download different Python versions online, we highly recommend using a version manager such as pyenv.
Pyenv works for Mac & Linux, but if you're on Windows, we recommend using pyenv-win.
# Install pyenv
$ brew install pyenv
# Check version of python
$ python --version
Python 3.6.9
# Check available versions
$ pyenv versions
system
* 3.6.9
# Install new version
$ pyenv install 3.9.1
---> 100%
# Set new version
$ pyenv local 3.9.1
# Check that the new version is selected
$ pyenv versions
system
3.6.9
* 3.9.1
# Validate
$ python --version
Python 3.9.1
We highly recommend using Python 3.9.1 because, while using another version of Python will work, we may face some conflicts with certain package versions that may need to be resolved.
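The same validation can be done from inside Python. This is a minimal sketch, assuming we only want a message when the interpreter differs from the recommended version; RECOMMENDED and matches_recommended are illustrative names, not part of the lesson's code.

```python
import sys

# Version recommended in this lesson; change it if your project targets another one.
RECOMMENDED = (3, 9, 1)


def matches_recommended(current=None, recommended=RECOMMENDED):
    """Return True when `current` (default: the running interpreter) equals `recommended`."""
    if current is None:
        current = sys.version_info[:3]
    return tuple(current) == tuple(recommended)


# Report the running interpreter's version and whether it matches.
current = ".".join(str(part) for part in sys.version_info[:3])
if matches_recommended():
    print(f"Python {current} matches the recommended version.")
else:
    print(f"Python {current} differs from the recommended version; package conflicts are possible.")
```

A mismatch here is not an error, it only signals that some pinned package versions may need adjusting.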
Virtual environment
Next, we'll set up a virtual environment so we can isolate the required packages for our application. This will also keep components separated from other projects which may have different dependencies. Once we create our virtual environment, we'll activate it and install our required packages.
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip setuptools wheel
Let's unpack what's happening here:
- Creating a Python virtual environment named venv.
- Activating our virtual environment. Type deactivate to exit the virtual environment.
- Upgrading required packages so we download the latest package wheels.
Our virtual environment directory venv should be visible when we list the directories in our project:
ls
mlops/
├── venv/
├── requirements.txt
└── setup.py
We'll know our virtual environment is active because we will see its name in the terminal prompt. We can further validate by making sure pip freeze returns nothing.
(venv) ➜ mlops: pip freeze
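We can also check from Python itself. A minimal sketch, assuming the environment was created with the standard venv module as above; in_virtualenv is an illustrative helper name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtualenv())
print("Environment location:", sys.prefix)
```

If this reports that no environment is active, run the activate command again before installing anything.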
Requirements
We'll create a separate file called requirements.txt where we'll specify the packages (with their versions) that we want to install. While we could place these requirements directly inside setup.py, many applications still look for a separate requirements.txt.
touch requirements.txt
We should be adding packages with their versions to our requirements.txt as we require them for our project. It's inadvisable to install all packages and then do pip freeze > requirements.txt because it dumps the dependencies of all our packages into the file (even the ones we didn't explicitly install). To mitigate this, there are tools such as pipreqs, pip-tools, pip-chill, etc. that will only list the packages that are not dependencies. However, their dependency resolution is not always accurate and they don't work when you want to separate packages for different tasks (developing, testing, etc.).
Tip
If we experience conflicts between package versions, we can relax constraints by specifying that the package needs to be above a certain version, as opposed to the exact version. We could also specify no version for all packages and allow pip to resolve all conflicts. Then we can see which versions were actually installed and add that information to our requirements.txt file.
# requirements.txt
<PACKAGE>==<VERSION> # exact version
<PACKAGE>>=<VERSION> # above version
<PACKAGE> # no version
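To make the three forms concrete, here is a minimal sketch that classifies simple requirement lines. The package versions in EXAMPLE_LINES are examples only, describe is a hypothetical helper, and only the == and >= operators are handled (pip itself accepts more).

```python
import re

# Illustrative requirement lines; the versions here are examples only.
EXAMPLE_LINES = [
    "numpy==1.19.5   # exact version",
    "pandas>=1.3.0   # above version",
    "scikit-learn    # no version",
]

# <name>, then an optional == or >= operator, then an optional version.
PATTERN = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*(==|>=)?\s*(\S*)")


def describe(line):
    """Return (package, constraint kind, version) for a simple requirement line."""
    requirement = line.split("#", 1)[0].strip()  # drop trailing comments
    match = PATTERN.fullmatch(requirement)
    if match is None:
        raise ValueError(f"Not a simple requirement line: {line!r}")
    name, operator, version = match.groups()
    if operator is None and version:
        raise ValueError(f"Unsupported specifier in: {line!r}")
    kind = {"==": "exact", ">=": "minimum", None: "unpinned"}[operator]
    return name, kind, version or None


for line in EXAMPLE_LINES:
    print(describe(line))
```

Relaxing a constraint then amounts to changing == to >= on the conflicting line, or dropping the version entirely.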
Setup
Let's create a file called setup.py to provide instructions on how to set up our package and its dependencies inside our virtual environment.
touch setup.py
We'll start by extracting the required packages from requirements.txt:
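A minimal sketch of one way to do this, assuming one requirement per line. The helper name load_requirements is hypothetical, and stripping everything after # is a simplification that would break URL requirements containing a fragment.

```python
from pathlib import Path


def load_requirements(path):
    """Read a requirements file, skipping blank lines and comments."""
    packages = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            packages.append(line)
    return packages
```

Inside setup.py, this could be called as load_requirements(Path(__file__).parent / "requirements.txt") so the file is found regardless of the directory the install is run from.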
The heart of the setup.py file is the setup() call, which describes how to set up our package and its dependencies. Our package will be called tagifai and it will encompass all the requirements needed to run it. The first several lines cover metadata (name, description, etc.) and then we define the requirements. Here we're stating that we require a Python version equal to or above 3.7 and then passing in our required packages to install_requires.
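A minimal sketch of the arguments described above, written as a plain dictionary so it can be inspected without triggering an install. The description is a placeholder, and required_packages is assumed to hold the lines read from requirements.txt (empty here).

```python
# Assumption: in a real setup.py this list is read from requirements.txt.
required_packages = []

# Keyword arguments for setuptools.setup().
setup_kwargs = dict(
    name="tagifai",
    version="0.1",
    description="<placeholder description>",  # illustrative, not from the lesson
    python_requires=">=3.7",  # Python 3.7 or above
    install_requires=required_packages,
)

# In setup.py this would be followed by:
#     from setuptools import setup
#     setup(**setup_kwargs)
print(setup_kwargs)
```

Passing the arguments directly as setup(name="tagifai", ...) is equivalent; the dictionary form is only used here to keep the sketch side-effect free.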
Usage
We don't have any packages defined in our requirements.txt file yet, but once we do, we can use the setup.py file to install them like so:
python3 -m pip install -e . # installs required packages only
Obtaining file:///Users/goku/Documents/madewithml/mlops
  Preparing metadata (setup.py) ... done
Installing collected packages: tagifai
  Running setup.py develop for tagifai
Successfully installed tagifai-0.1
The -e or --editable flag installs a project in develop mode so we can make changes without having to reinstall packages.
Now if we do pip freeze we should see that tagifai is installed.
pip freeze
# Editable install with no version control (tagifai==0.1)
-e /Users/goku/Documents/madewithml/mlops
We should also see a tagifai.egg-info directory in our project directory:
mlops/
├── tagifai.egg-info/
├── venv/
├── requirements.txt
└── setup.py
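We can also confirm the install from Python. A minimal sketch, assuming Python 3.8 or above (for importlib.metadata); installed_version is an illustrative helper name.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


print("tagifai:", installed_version("tagifai"))
```

A result of None means the package is not visible to the current interpreter, which usually indicates the virtual environment is not active.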
There are alternatives to the setup.py file, such as setup.cfg and the more recent pyproject.toml.