The last step in achieving reproducibility is to deploy our versioned code and artifacts in a reproducible environment. This goes well beyond the virtual environment we configured for our Python applications because there are system-level specifications (operating system, required packages, etc.) we aren't capturing. We want to be able to encapsulate all the requirements we need so that there are no external dependencies that would prevent someone else from reproducing our exact application.
There are quite a few solutions for system-level reproducibility (VMs, container engines, etc.), but the Docker container engine is by far the most popular, thanks to several key advantages:
- reproducibility via a Dockerfile with explicit instructions to deploy our application in a specific system.
- isolation via containers, so as not to affect other applications that may also run on the same underlying operating system.
- and many more advantages including size (no separate OS needed for each application), speed, Docker Hub, etc.
We're going to use Docker to deploy our application locally in an isolated, reproducible and scalable fashion. Once we do this, any machine with the Docker engine installed can reproduce our work. However, there is so much more to Docker, which you can explore in the docs, that goes beyond what we'll need.
Before we install Docker, let's take a look at how the container engine works on top of our operating system, which can run on our local hardware or on something managed in the cloud.
The Docker container engine is responsible for spinning up configured containers, which contain our application and its dependencies (binaries, libraries, etc.). The container engine is very efficient in that it doesn't need to create a separate operating system for each containerized application. This also means that our containers can share the system's resources via the Docker engine.
Now we're ready to install Docker based on our operating system. Once installed, we can start Docker Desktop, which will allow us to create and deploy our containerized applications.
The first step is to build a Docker image which has the application and all its specified dependencies. We can create this image using a Dockerfile which outlines a set of instructions. These instructions essentially build read-only image layers on top of each other to construct our entire image. Let's take a look at our application's Dockerfile and the image layers it creates.
The first line specifies the base image we want to pull FROM. Here we want to use the base image for running Python-based applications, specifically Python 3.7 with the slim variant. Since we're only deploying a Python application, this slim variant with minimal packages satisfies our requirements while keeping the size of the image layer low.
    # Base image
    FROM python:3.7-slim
Next we're going to install our application dependencies. First, we'll COPY the required files from our local file system so we can use them for installation. Alternatively, if we were running on some remote infrastructure, we could've pulled from a remote git host. Once we have our files, we can install the packages required to install our application's dependencies using the RUN command. Once we're done using the packages, we can remove them to keep our image layer's size to a minimum.
    # Install dependencies
    COPY setup.py setup.py
    COPY requirements.txt requirements.txt
    COPY Makefile Makefile
    RUN apt-get update \
        && apt-get install -y --no-install-recommends gcc build-essential \
        && rm -rf /var/lib/apt/lists/* \
        && make install \
        && apt-get purge -y --auto-remove gcc build-essential
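Why remove the build packages inside the same RUN instruction rather than in a later one? Once a layer is committed it is stored as-is, so deleting files in a later layer hides them but does not reclaim their space. The toy model below is only a sketch of that bookkeeping, not how Docker actually measures images, and the package names and sizes (in MB) are made up for illustration.

```python
def layer_size(operations):
    """Size of one layer: only files still present when the instruction ends count."""
    added = {}
    for action, name, size_mb in operations:
        if action == "add":
            added[name] = size_mb
        elif action == "remove":
            # Removing only frees space if the file was added in this same layer.
            added.pop(name, None)
    return sum(added.values())


def image_size(layers):
    """Total image size is the sum of its layers."""
    return sum(layer_size(operations) for operations in layers)


# Hypothetical sizes in MB, for illustration only.
single_run = [[
    ("add", "build-tools", 250),
    ("add", "python-deps", 400),
    ("remove", "build-tools", 0),
]]
separate_runs = [
    [("add", "build-tools", 250)],
    [("add", "python-deps", 400)],
    [("remove", "build-tools", 0)],
]

print(image_size(single_run))     # 400
print(image_size(separate_runs))  # 650
```

In this model, chaining the install and purge steps with && in one RUN keeps the build tools out of the final layer entirely, while splitting them across instructions leaves their size baked into an earlier layer.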
Next we're ready to COPY over the required files to actually RUN our application.
    # Copy
    COPY tagifai tagifai
    COPY app app
    COPY data data
    COPY model model
    COPY config config
    COPY stores stores

    # Pull assets from S3
    RUN dvc init --no-scm
    RUN dvc remote add -d storage stores/blob
    RUN dvc pull
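Since our application (API) listens on port 5000, we declare that port in our Dockerfile with EXPOSE. Note that EXPOSE only documents the port the container listens on; the port is actually published to the host with the -p flag when we run the container.
    # Expose ports
    EXPOSE 5000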
The final step in building our image is to specify the executable to run when a container is started from our image. For our application, we want to launch our API with gunicorn. Note that we aren't using the make command here since we uninstalled it after installing our dependencies.
    # Start app
    ENTRYPOINT ["gunicorn", "-c", "config/gunicorn.py", "-k", "uvicorn.workers.UvicornWorker", "app.api:app"]
There are many more commands available for us to use in the Dockerfile, such as using environment variables (ENV) and arguments (ARG), command arguments (CMD), specifying volumes (VOLUME), setting the working directory (WORKDIR) and many more, all of which you can explore through the official docs.
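As one illustration of ENV: a value set with ENV in the Dockerfile (or overridden with `docker run -e NAME=value`) shows up in the container's process environment, where Python code can read it through `os.environ`. The sketch below assumes two hypothetical variable names, APP_PORT and APP_WORKERS; they are not taken from this application's configuration.

```python
import os


def load_settings(environ=None):
    """Read server settings from environment variables, falling back to defaults."""
    environ = os.environ if environ is None else environ
    return {
        # APP_PORT and APP_WORKERS are hypothetical names used for illustration.
        "port": int(environ.get("APP_PORT", "5000")),
        "workers": int(environ.get("APP_WORKERS", "1")),
    }


print(load_settings({"APP_PORT": "8000"}))  # {'port': 8000, 'workers': 1}
```

Passing a plain dictionary, as in the last line, makes the function easy to try without touching the real environment.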
Once we're done composing the Dockerfile, we're ready to build our image using the build command which allows us to add a tag and specify the location of the Dockerfile to use.
    # Build image
    docker build -t tagifai:latest -f Dockerfile .
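If we ever want to trigger the same build from a Python script (for example, from a test or automation job), one possible approach is to assemble the argument list and hand it to `subprocess`. This is a minimal sketch; the helper names `docker_build_command` and `build_image` are our own and not part of Docker or of this project.

```python
import shutil
import subprocess


def docker_build_command(tag, dockerfile="Dockerfile", context="."):
    """Return the argument list for `docker build -t <tag> -f <dockerfile> <context>`."""
    return ["docker", "build", "-t", tag, "-f", dockerfile, context]


def build_image(tag, dockerfile="Dockerfile", context="."):
    """Run the build; raises CalledProcessError if docker exits with an error."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    subprocess.run(docker_build_command(tag, dockerfile, context), check=True)


print(" ".join(docker_build_command("tagifai:latest")))
# docker build -t tagifai:latest -f Dockerfile .
```

Calling `build_image("tagifai:latest")` from the project root would then run the same command shown above.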
We can inspect all built images and their attributes like so:
    # Images
    docker images

    REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
    tagifai      latest   02c88c95dd4c   23 minutes ago   2.57GB
We can also remove any or all images based on their unique IDs.
    # Remove images
    docker rmi <IMAGE_ID>              # remove an image
    docker rmi $(docker images -a -q)  # remove all images
Once we've built our image, we're ready to run a container using that image with the run command which allows us to specify the image, port forwarding, etc.
    # Run container
    docker run -p 5000:5000 --name tagifai tagifai:latest
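After starting the container, a quick way to confirm from the host that the forwarded port is reachable is to poll it until a TCP connection succeeds. The helper below is a sketch using only the standard library; `wait_for_port` is our own name. A successful connection only shows that something is listening on the published port, so a request to one of the API's actual endpoints would be a stronger check.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to the published port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the container running, `wait_for_port("localhost", 5000)` from another terminal or script should return True once the port accepts connections.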
We can inspect all containers (running or stopped) like so:
    # Containers
    docker ps     # running containers
    docker ps -a  # all containers (running and stopped)

    CONTAINER ID   IMAGE            COMMAND                 CREATED          STATUS    PORTS                    NAMES
    ee5f1b08abd5   tagifai:latest   "gunicorn -c config…"   19 minutes ago   Created   0.0.0.0:5000->5000/tcp   tagifai
We can also stop and remove any or all containers based on their unique IDs.
    # Stop and remove containers
    docker stop <CONTAINER_ID>      # stop a running container
    docker rm <CONTAINER_ID>        # remove a container
    docker stop $(docker ps -a -q)  # stop all containers
    docker rm $(docker ps -a -q)    # remove all containers
If our application requires multiple containers for different services (API, database, etc.), we can bring them all up at once using Docker Compose, and scale and manage them using a container orchestration system like Kubernetes (K8s). If we're specifically deploying ML workflows, we can use a toolkit like Kubeflow to help us manage and scale them.
If we run into errors while building our image layers, an easy way to debug the issue is to run a container from the layers that have been built so far. We can do this by keeping only the Dockerfile commands that have executed successfully (plus all COPY statements), rebuilding the image, and then running a container from it with an interactive shell.
    # Run container
    docker run -p 5000:5000 -it tagifai /bin/bash
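The trimming step described above can be sketched in a few lines of Python: keep every instruction before the one that failed, plus any later COPY instructions, and drop the rest. The function names and the sample Dockerfile are hypothetical. The parser is deliberately simple and assumes continuation lines end with a backslash and contain no blank lines.

```python
def split_instructions(dockerfile_text):
    """Group a Dockerfile into logical instructions, joining backslash continuations."""
    blocks, current = [], []
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.rstrip()
        if not current and (not line.strip() or line.lstrip().startswith("#")):
            continue  # skip blank lines and comments between instructions
        current.append(line)
        if not line.endswith("\\"):
            blocks.append("\n".join(current))
            current = []
    return blocks


def debug_dockerfile(dockerfile_text, failed_index):
    """Keep instructions before the failing one, plus any later COPY instructions."""
    blocks = split_instructions(dockerfile_text)
    kept = blocks[:failed_index]
    kept += [b for b in blocks[failed_index + 1:] if b.split()[0].upper() == "COPY"]
    return "\n".join(kept) + "\n"


# Hypothetical Dockerfile in which the RUN step (index 2) fails.
DOCKERFILE = r"""
# Base image
FROM python:3.7-slim
COPY requirements.txt requirements.txt
RUN apt-get update \
    && make install
COPY app app
ENTRYPOINT ["gunicorn", "app.api:app"]
"""

print(debug_dockerfile(DOCKERFILE, failed_index=2))
```

Because the ENTRYPOINT instruction is left out of the trimmed file, the /bin/bash passed to docker run becomes the container's command instead of being appended as an argument to gunicorn, which is what gives us a shell to retry the failing step by hand.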
Once we have our container running, we can use our application as we would on our local machine but now it's reproducible on any operating system that can run the Docker container engine. We've covered just what we need from Docker to deploy our application but there is so much more to Docker, which you can explore in the official docs.