Packaging a Python Codebase
So far, we've been working inside notebooks, which has allowed us to train a model very quickly. However, notebooks are not easy to put into production and we don't always have control over the environment (e.g. Google Colab updates its packages periodically). When we used our notebook, we had a preloaded set of packages (run !pip list inside the notebook to see all of them). But now we want to explicitly define our environment so we can reproduce it locally (for us and team members) and when we deploy to production. There are many tools for packaging in Python and we'll be using the tried and tested Pip.
There are many alternative dependency management and packaging tools, such as Poetry, but many things are still in flux with these newer options. We're going to stick with Pip because it works for our application and we don't want to deal with issues like long dependency resolution times.
Before we can start packaging, we need a way to create files and run commands. We can do this via the terminal, which gives us a shell such as bash or zsh to execute commands. All the commands we run should be the same regardless of your operating system or shell.
While we'll organize our code from our notebook to scripts in the next lesson, we'll create the main project directory now so that we can save our packaging components there. We'll call our main project directory
mlops but feel free to name it anything you'd like.
# Create and change into the directory
mkdir mlops
cd mlops
The first thing we'll do is set up the correct version of Python. We'll be using version 3.7.13 specifically, but any version of Python 3 should work. Though you could download different Python versions online, we highly recommend using a version manager such as pyenv.
Pyenv works on Mac and Linux, but if you're on Windows, we recommend using pyenv-win.
# Install pyenv
$ brew install pyenv

# Check version of python
$ python --version
Python 3.6.9

# Check available versions
$ pyenv versions
  system
* 3.6.9

# Install new version
$ pyenv install 3.7.13
---> 100%

# Set new version
$ pyenv local 3.7.13
  system
  3.6.9
* 3.7.13

# Validate
$ python --version
Python 3.7.13
We highly recommend using Python 3.7.13 because, while other versions of Python will work, we may face conflicts with certain package versions that would need to be resolved.
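To make the version expectation explicit, a small check like the one below could be run before installing anything. This is only a sketch: the helper name `meets_minimum` is hypothetical, and the 3.7 floor is taken from the Python requirement stated later in this lesson.

```python
import sys


def meets_minimum(version, minimum=(3, 7)):
    """Return True if `version` is at least `minimum`."""
    # Tuples compare element by element, so (3, 7, 13) >= (3, 7) holds
    # once we trim the version to the same length as the minimum.
    return tuple(version[: len(minimum)]) >= tuple(minimum)


# Warn (rather than fail) when the running interpreter is older than 3.7.
if not meets_minimum(sys.version_info):
    print(f"Python {sys.version.split()[0]} is older than 3.7; some packages may conflict.")
```

Passing the version in as an argument keeps the comparison easy to try against other versions, such as the 3.6.9 system interpreter shown above.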
Next, we'll set up a virtual environment so we can isolate the required packages for our application. This will also keep components separated from other projects which may have different dependencies. Once we create our virtual environment, we'll activate it and install our required packages.
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip setuptools wheel
Let's unpack what's happening here:
- Creating a Python virtual environment named venv.
- Activating our virtual environment. Type deactivate to exit the virtual environment.
- Upgrading pip, setuptools and wheel so we download the latest package wheels.
Our virtual environment directory
venv should be visible when we list the directories in our project:
mlops/
├── venv/
├── requirements.txt
└── setup.py
We'll know our virtual environment is active because we will see its name in the terminal prompt. We can further validate by making sure pip freeze returns nothing.
(venv) ➜ mlops: pip freeze
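Besides looking at the prompt, the active environment can also be checked from Python itself. The sketch below assumes the environment was created with the standard venv module as above; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```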
We'll create a separate file called requirements.txt where we'll specify the packages (with their versions) that we want to install. While we could place these requirements directly inside setup.py, many applications still look for a separate requirements.txt file.
We should add packages, with their versions, to our requirements.txt as we require them for our project. It's inadvisable to install all packages and then run pip freeze > requirements.txt because it dumps the dependencies of all our packages into the file (even the ones we didn't explicitly install). To mitigate this, there are tools such as pipreqs, pip-tools, pipchill, etc. that will only list the packages that are not dependencies. However, their dependency resolution is not always accurate and they don't work well when we want to separate packages for different tasks (developing, testing, etc.).
If we experience conflicts between package versions, we can relax constraints by specifying that the package needs to be above a certain version, as opposed to the exact version. We could also specify no version for any package and allow pip to resolve all conflicts. We can then see which versions were actually installed and add that information to our requirements.txt file.
# requirements.txt
<PACKAGE>==<VERSION>  # exact version
<PACKAGE>>=<VERSION>  # above version
<PACKAGE>             # no version
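To see how the three forms differ, here is a minimal sketch that labels a requirement line as exact, minimum or unpinned. The function name `classify_requirement` and the package names in the usage lines are illustrative assumptions, and the parsing covers only the `==` and `>=` operators shown above, not the full range of specifiers pip accepts.

```python
def classify_requirement(line):
    """Return (name, kind, version) for a simple requirements.txt line."""
    # Drop any trailing comment before inspecting the specifier.
    requirement = line.split("#", 1)[0].strip()
    for operator, kind in (("==", "exact"), (">=", "minimum")):
        if operator in requirement:
            name, version = requirement.split(operator, 1)
            return name.strip(), kind, version.strip()
    # No operator found: pip is free to pick any version.
    return requirement, "unpinned", None


# Illustrative lines (package names and versions are placeholders).
for example in ("numpy==1.19.5  # exact version", "pandas>=1.3", "requests"):
    print(classify_requirement(example))
```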
Let's create a file called
setup.py to provide instructions on how to set up our virtual environment.
We'll start by extracting the required packages from requirements.txt.
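One way this extraction step could look is sketched below. It is not necessarily the exact code used in this lesson: the helper name `load_requirements` is hypothetical, and skipping blank lines and comment lines is an added assumption.

```python
from pathlib import Path


def load_requirements(path):
    """Read requirement specifiers from a file, skipping blank lines and comments."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


# Inside setup.py this could be called as:
#     required_packages = load_requirements(Path(__file__).parent / "requirements.txt")
```

Resolving the path relative to setup.py, rather than the current working directory, keeps the install working no matter where pip is invoked from.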
The heart of the setup.py file is the setup object which describes how to set up our package and its dependencies. Our package will be called tagifai and it will encompass all the requirements needed to run it. The first several lines cover metadata (name, description, etc.) and then we define the requirements. Here we're stating that we require a Python version equal to or above 3.7 and then passing in our required packages to install_requires.
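As a rough illustration of the arguments described above, the sketch below collects them in a dictionary. The name and version match the install output shown later in this lesson, the description string is a placeholder, and `SETUP_KWARGS` is a hypothetical name used so the values can be inspected without running an install.

```python
# Requirements as read from requirements.txt (empty for now, as in this lesson).
required_packages = []

# Keyword arguments that a setup.py could pass to setuptools.setup().
SETUP_KWARGS = {
    "name": "tagifai",
    "version": "0.1",
    "description": "Placeholder description of the package.",  # assumption
    "python_requires": ">=3.7",  # Python 3.7 or above
    "install_requires": required_packages,  # packages pip installs with ours
}

# In a real setup.py the call would be:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```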
We don't have any packages defined in our requirements.txt file yet, but once we do, we can use the setup.py file to install them like so:
python3 -m pip install -e . # installs required packages only
Obtaining file:///Users/goku/Documents/madewithml/mlops
  Preparing metadata (setup.py) ... done
Installing collected packages: tagifai
  Running setup.py develop for tagifai
Successfully installed tagifai-0.1
The -e or --editable flag installs a project in develop mode so we can make changes without having to reinstall packages.
Now if we do
pip freeze we should see that
tagifai is installed.
# Editable install with no version control (tagifai==0.1)
-e /Users/goku/Documents/madewithml/mlops
and we should also see a
tagifai.egg-info directory in our project directory:
mlops/
├── tagifai.egg-info/
├── venv/
├── requirements.txt
└── setup.py
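The same check can be made from Python instead of reading the pip freeze output. Note the assumption here: `importlib.metadata` is part of the standard library only from Python 3.8 onward, so on the 3.7.13 interpreter used in this lesson the separately installed `importlib_metadata` backport would be needed. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string after the editable install, or None before it.
print("tagifai:", installed_version("tagifai"))
```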