Packaging a Python Codebase
It's essential to be able to consistently recreate the environment we develop in so that we can reliably reproduce the same results. To do this, we'll need to explicitly specify all the requirements (Python version, packages, etc.) and create an environment that loads them. This not only lets us reproduce our own results consistently but also enables others to arrive at the same results.
When we used our notebook, we had a preloaded set of packages (run !pip list inside the notebook to see all of them). But now we want to define our environment so we can reproduce it for our Python scripts. There are many options when it comes to packaging in Python, and we'll be using the traditional and recommended pip.
I'm a huge fan (and user) of Poetry, which is a dependency management and packaging tool, but there are still many things in flux. I'm sticking with pip because it works for our application and I don't want to deal with issues like long dependency resolution times.
The first thing we'll do is set up a virtual environment so we can isolate the packages (and versions) our application needs from our other projects, which may have different dependencies. Once we create our virtual environment, we'll activate it and install our required packages.
This setup breaks down into the following steps:
- Creating a virtual environment.
- Activating our virtual environment. Type deactivate to exit out of the virtual environment.
- Upgrading required packages so we download the latest package wheels.
- Installing from setup.py (--editable installs a project in develop mode).
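The steps above can also be scripted. The sketch below is a minimal, hedged version in Python rather than the lesson's own commands: the environment name venv, the set of upgraded packages (pip, setuptools, wheel), and the helper names are all assumptions. Because it calls the environment's own interpreter directly, no activation step is needed.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def env_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside the virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_commands(env_dir: Path) -> List[List[str]]:
    """Build the pip commands for the upgrade and editable-install steps."""
    python = str(env_python(env_dir))
    return [
        # Upgrade the packaging tools (assumed set) so the latest wheels are used.
        [python, "-m", "pip", "install", "--upgrade", "pip", "setuptools", "wheel"],
        # Install the project in the current directory in editable (develop) mode.
        [python, "-m", "pip", "install", "--editable", "."],
    ]


def create_environment(env_dir: Path = Path("venv")) -> None:
    """Create the virtual environment, then run each install command."""
    venv.create(env_dir, with_pip=True)
    for command in install_commands(env_dir):
        subprocess.run(command, check=True)
```

Calling create_environment() from the project root (the directory that holds setup.py) would carry out the creation, upgrade, and install steps in order.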
Let's dive into our setup.py to see what we're installing inside our virtual environment.
First, we're retrieving our required packages from our requirements.txt file. While we could place these requirements directly inside setup.py, many applications still look for a requirements.txt file, so we'll keep it separate.
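As a rough illustration of this step, the sketch below reads a requirements file into a list that can later be handed to install_requires. The function name load_requirements and the choice to skip blank lines and comments are assumptions made for this example.

```python
from pathlib import Path
from typing import List


def load_requirements(path: Path) -> List[str]:
    """Return requirement specifiers, skipping blank lines and comments."""
    requirements = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Inside setup.py the file would usually be located next to setup.py itself:
# required_packages = load_requirements(Path(__file__).parent / "requirements.txt")
```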
We've been adding packages to our requirements.txt as we've needed them, but if you haven't, you shouldn't just run pip freeze > requirements.txt because it dumps every installed package, including the dependencies of your packages, into the file. When a package later updates and no longer needs one of those dependencies, the stale entry will still be there. To mitigate this, there are tools such as pipreqs, pip-tools, pip-chill, etc. that will only list the packages that are not dependencies. However, if you're separating packages for different environments, then these solutions are limited as well.
The next several lines in our setup.py file include some packages required for testing (test_packages) and development (dev_packages). These are only required in certain situations, when we're testing or developing. For example, a general user of our application won't need to test or develop, so they'll only need the required packages. A fellow developer, however, will want both the test and dev packages to extend our code base.
We keep the test and dev packages separate because later on we'll be using GitHub Actions that only test our code, so we want a way to load just the packages required for testing.
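A minimal sketch of how these two lists might be laid out is shown below. The package names are hypothetical placeholders, and the extra names test and dev are assumptions; the point is only that the dev extra includes the test packages while the test extra stays small.

```python
# Hypothetical package lists; the real entries depend on the project.
test_packages = ["pytest", "pytest-cov"]
dev_packages = ["black", "flake8", "isort"]

# Map each extra name to the packages it should pull in.
extras_require = {
    "test": test_packages,                # enough for a job that only runs tests
    "dev": test_packages + dev_packages,  # developers get both sets
}
```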
The heart of the setup.py file is the setup function call, which describes how to set up our package and its dependencies. The first several lines cover metadata (name, description, etc.) and then we define the requirements. Here we're stating that we require a Python version equal to or above 3.6 and then passing in our required packages to install_requires. Finally, we define extra requirements that different types of users may require.
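The sketch below shows one way the arguments described here could fit together. The metadata values and package lists are illustrative placeholders, not the lesson's actual values; only the Python version bound (3.6 or above) and the use of install_requires come from the text.

```python
# Placeholder values standing in for the lists built earlier in setup.py.
required_packages = ["numpy", "pandas"]
test_packages = ["pytest"]
dev_packages = ["black"]

setup_kwargs = dict(
    # Metadata (illustrative values).
    name="tagifai",
    version="0.1",
    description="Example package description.",
    # Requirements.
    python_requires=">=3.6",
    install_requires=required_packages,
    extras_require={
        "test": test_packages,
        "dev": test_packages + dev_packages,
    },
)


def main() -> None:
    """Hand the arguments to setuptools (imported lazily for this sketch)."""
    from setuptools import setup

    setup(**setup_kwargs)
```

A real setup.py typically calls setup(...) directly at module level; the main() wrapper is used here only so the arguments can be inspected without triggering a build.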
The final lines of the file define various entry points we can use to interact with the application. Here we define some console scripts (commands) we can type in our terminal to execute certain actions. For example, after we install our package, we can type the command tagifai to run the app variable it points to.
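To make the shape of a console script entry concrete, here is a hedged sketch. The command name tagifai and the app variable come from the text, but the module path app.cli is a hypothetical placeholder, and parse_console_script is a helper written only for this illustration.

```python
from typing import Tuple

entry_points = {
    "console_scripts": [
        # "<command> = <module path>:<object>"; the module path is a placeholder.
        "tagifai = app.cli:app",
    ],
}


def parse_console_script(spec: str) -> Tuple[str, str, str]:
    """Split a console-script spec into (command, module, object)."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module, obj = target.split(":", 1)
    return command, module, obj
```

The entry_points dictionary is what would be passed to setup(entry_points=...); after installation, the command on the left of the equals sign invokes the object named on the right.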
We can install our package for different situations like so:
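A sketch of the install variants follows, assuming the extras are named test and dev (the names are not confirmed by the text). It builds the pip command for each situation; the comments show roughly equivalent shell commands.

```python
import sys
from typing import List, Optional


def pip_install_command(extra: Optional[str] = None) -> List[str]:
    """Build an editable-install command, optionally requesting one extra."""
    target = "." if extra is None else ".[{}]".format(extra)
    return [sys.executable, "-m", "pip", "install", "-e", target]


# Roughly equivalent shell commands are shown next to each call.
base_install = pip_install_command()        # python -m pip install -e .
test_install = pip_install_command("test")  # python -m pip install -e ".[test]"
dev_install = pip_install_command("dev")    # python -m pip install -e ".[dev]"
```

Each list can be passed to subprocess.run(..., check=True) from the project root to perform the corresponding install.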