Packaging a Python Codebase
All of the work we're doing with Python scripts is available in the main repository. However, it's difficult to follow along with that repository because all of the content we cover in this course is available there in one snapshot, so we highly recommend using the branches in the repository instead. Each branch's name matches the lesson's name (ex. this lesson's branch is called packaging), and we can clone that branch to follow along with the respective lesson.
git clone -b <BRANCH> <REMOTE_REPO_URL> <PATH_TO_PROJECT_DIR>
<REMOTE_REPO_URL> is the location of the remote repo (ex. https://github.com/GokuMohandas/follow).
<PATH_TO_PROJECT_DIR> is the name of the local directory you want to clone the project into (ex. mlops).
It's essential to be able to consistently create an environment to develop in so that we can reliably reproduce the same results. To do this, we'll need to explicitly detail all the requirements (Python version, packages, etc.) as well as create the environment that will load all of those requirements. By doing this, we'll not only be able to consistently reproduce results but also enable others to arrive at the same results.
We can set up our files below (setup.py & requirements.txt) with just the terminal and any text editor but you may use a code editor as well.
When we used our notebook, we had a preloaded set of packages (run !pip list inside the notebook to see all of them). But now we want to define our environment so we can reproduce it for our Python scripts. There are many options when it comes to packaging in Python, and we'll be using the traditional and recommended Pip.
There are many alternative dependency management and packaging tools, such as Poetry, but many things are still in flux with these newer options. We're going to stick with Pip because it works for our application and we don't want to deal with issues like long dependency resolution times.
The first thing we'll do is set up a virtual environment so we can isolate the packages (and versions) necessary for our application from our other projects, which may have different dependencies. Once we create our virtual environment, we'll activate it and install our required packages.
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
Let's unpack what's happening here:
- Creating a Python virtual environment named venv. We use Python 3.7.10 for our project.
- Activating our virtual environment. Type deactivate to exit the virtual environment.
- Upgrading required packages so we download the latest package wheels.
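The first step above can also be done from Python itself. The following is a minimal sketch using the standard library's venv module; the helper name create_virtualenv is our own, and activating the environment still happens in the shell as shown above.

```python
import venv
from pathlib import Path


def create_virtualenv(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, similar to `python3 -m venv <env_dir>`."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment, as the command line tool does.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling create_virtualenv("venv") from the project directory should leave us with the same venv folder that the terminal command creates.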
We can use pyenv to manage different Python versions.
# Using pyenv to switch between Python versions
$ python --version
Python 3.6.9
$ pyenv versions
  system
* 3.6.9
$ pyenv install 3.7.10
$ pyenv local 3.7.10
  system
  3.6.9
* 3.7.10 (set by /Users/goku/Documents/madewithml/mlops/.python-version)
$ python --version
Python 3.7.10
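If we want our scripts to confirm that the expected interpreter is active, a small check like the one below can help. This is only a sketch: the helper name meets_minimum and the (3, 7) default are our own assumptions, based on the 3.7.10 version used in this lesson.

```python
import sys


def meets_minimum(version_info, minimum=(3, 7)):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version_info[: len(minimum)]) >= tuple(minimum)


# Warn (rather than fail) when the active interpreter is older than expected.
if not meets_minimum(sys.version_info):
    print(f"Warning: running Python {sys.version.split()[0]}, expected 3.7 or newer")
```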
Let's dive into our setup.py to see what we're installing inside our virtual environment.
First, we're retrieving our required packages from our requirements.txt file. While we could place these requirements directly inside setup.py, many applications still look for a requirements.txt file so we'll keep it separate.
touch requirements.txt setup.py
And we'll call these requirements in our setup.py script like so:
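A minimal sketch of this step, assuming requirements.txt sits next to setup.py. The helper name load_requirements is our own, and skipping blank lines and comments is an added convenience rather than something the lesson specifies.

```python
from pathlib import Path


def load_requirements(path):
    """Return requirement strings from a requirements file, skipping blanks and comments."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


# Inside setup.py this would typically be used as:
# BASE_DIR = Path(__file__).parent
# required_packages = load_requirements(BASE_DIR / "requirements.txt")
```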
We should add packages (with versions) to our requirements.txt as we install them. If we haven't, we can't just run pip freeze > requirements.txt because it dumps the dependencies of all our packages into the file (even the ones we didn't explicitly install). When a certain package updates and no longer needs one of those dependencies, the stale dependency will still be listed. To mitigate this, there are tools such as pipreqs, pip-tools, pipchill, etc. that will only list the packages that are not dependencies of other packages. However, if you're separating packages for different environments, then these solutions are limited as well.
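To make the idea of "packages that are not dependencies" concrete, here is a rough sketch that finds installed packages which no other installed package requires. It is not how pipreqs, pip-tools, or pipchill are implemented; the function names are our own, the requirement parsing is a simple heuristic, and importlib.metadata needs Python 3.8 or newer (later than the 3.7.10 used in this lesson).

```python
import re
from importlib import metadata  # part of the standard library from Python 3.8


def normalize(name):
    """Normalize a distribution name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def top_level_packages(requires_map):
    """Given {package: [requirement strings]}, return packages nothing else depends on."""
    depended_on = set()
    for reqs in requires_map.values():
        for req in reqs:
            if "extra ==" in req:
                continue  # optional extras are not installed by default
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
            if match:
                depended_on.add(normalize(match.group(0)))
    return sorted(n for n in map(normalize, requires_map) if n not in depended_on)


def installed_requires_map():
    """Build the {package: requirements} map for the active environment."""
    return {
        d.metadata["Name"]: (d.requires or [])
        for d in metadata.distributions()
        if d.metadata["Name"]
    }
```

Running top_level_packages(installed_requires_map()) inside the virtual environment gives a candidate list of explicitly installed packages, which we would still want to review and pin by hand.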
The next several lines in our setup.py file include some packages required for testing (test_packages) and development (dev_packages). These will be situationally required when we're testing or developing. For example, a general user of our application won't need to test or develop, so they'll only need the required packages. A technical developer, however, will want both the test and dev packages to extend our code base.
We have test and dev packages separated because in our CI/CD lesson, we'll be using GitHub Actions workflows that only test our code, so we want a way to load only the packages required for testing.
The heart of the setup.py file is the setup object, which describes how to set up our package and its dependencies. The first several lines cover metadata (name, description, etc.) and then we define the requirements. Here we're stating that we require a Python version equal to or above 3.6 and then passing in our required packages to install_requires. Finally, we define extra requirements that different types of users may require.
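The arguments described above can be sketched as follows. The package lists are placeholders rather than the lesson's actual pinned requirements, and bundling the test packages into the dev extra is our assumption based on the description of what a developer needs.

```python
# Sketch of the arguments passed to setuptools.setup() inside setup.py.
# Package names below are placeholders, not the lesson's actual pinned lists.
required_packages = ["numpy", "pandas"]   # normally read from requirements.txt
test_packages = ["pytest", "pytest-cov"]  # hypothetical test-only tools
dev_packages = ["black", "flake8"]        # hypothetical development-only tools

setup_kwargs = dict(
    name="tagifai",
    python_requires=">=3.6",
    install_requires=required_packages,
    extras_require={
        "test": test_packages,
        "dev": test_packages + dev_packages,  # assumption: dev includes test tools
    },
)

# In setup.py itself these would be handed to setuptools:
# from setuptools import setup
# setup(**setup_kwargs)
```

Keeping the extras in plain lists makes it easy to compose them, so a single extra can pull in several groups without repeating package names.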
The final lines of the file define various entry points we can use to interact with the application. Here we define some console scripts (commands) that we can type in our terminal to execute certain actions. For example, after we install our package, we can type the command tagifai to run the app variable defined inside our package.
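As a hedged illustration of how a console script maps a command to code: each entry has the form "command = module:attribute". The module path tagifai.main below is an assumption made for the example, and the two helper functions are our own, not part of setuptools.

```python
from importlib import import_module

# Hypothetical entry point: the module path "tagifai.main" is an assumption.
entry_points = {"console_scripts": ["tagifai = tagifai.main:app"]}


def parse_console_script(spec):
    """Split 'command = package.module:attribute' into its three parts."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module_name, _, attribute = target.partition(":")
    return command, module_name.strip(), attribute.strip()


def resolve(module_name, attribute):
    """Import the module and return the named attribute.

    This is roughly what the generated command does before calling the object.
    """
    return getattr(import_module(module_name), attribute)
```

When the package is installed, the installer generates a small executable named after the command that imports the target object and calls it, which is why the command is available directly from the terminal.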
We can install our package for different situations like so:
python -m pip install -e .            # installs required packages only
python -m pip install -e ".[dev]"     # installs required + dev packages
python -m pip install -e ".[test]"    # installs required + test packages
The -e (--editable) flag installs a project in development mode so we can make changes to our code without having to reinstall the package.