CI/CD for Machine Learning
Intuition
Continuous integration (CI) allows our team to develop, test and integrate code in a structured fashion. This allows the team to develop more confidently and frequently since their work will be properly integrated. Continuous delivery (CD) is responsible for delivering our integrated code to the various applications that depend on it. With CI/CD pipelines, we can develop and deploy knowing that our systems can quickly adapt and work as intended.
GitHub Actions
In this lesson we're going to use GitHub Actions to create CI/CD pipelines for the code we push to Git. We'll learn more about CI/CD in our orchestration lesson, where we'll apply it more generally to DataOps and MLOps.

GitHub Actions has the added advantage of integrating really well with GitHub, and since all of our work is versioned there, we can easily create workflows based on GitHub events (push, PR, release, etc.). GitHub Actions also has a rich marketplace full of actions that we can use in our own project. And, best of all, GitHub Actions is free for public repositories.
Components
We'll learn about GitHub Actions by understanding the components that compose an Action. These components abide by a specific workflow syntax which can be extended with the appropriate context and expression syntax.

Workflows
With GitHub Actions, we are creating automatic workflows to do something for us. We'll start by creating a .github/workflows directory to organize all of our workflows.
```bash
mkdir -p .github/workflows
touch .github/workflows/testing.yml
touch .github/workflows/documentation.yml
```
Each workflow file contains the specific instructions for that workflow. For example, this testing workflow is responsible for running tests on our code base. We can specify the name of our workflow at the top of the YAML file.
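A minimal sketch of the top of this file (the workflow name `testing` is our own choice):

```yaml
# .github/workflows/testing.yml
name: testing
```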
Events
Workflows are triggered by an event, which can be something that occurs on a schedule (cron), a webhook event or a manual trigger. In our application, we'll be using the push and pull_request webhook events to run the testing workflow when someone directly pushes or submits a PR to the main branch.
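A sketch of what this trigger configuration might look like, assuming our default branch is called main:

```yaml
on:
  push:
    branches:
    - main
  pull_request:
    branches:
    - main
```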
Be sure to check out the complete list of the different events that can trigger a workflow.
Jobs
Once the event is triggered, a set of jobs run on a runner, which is the application that runs the job using a specific operating system. Our first (and only) job is test-code, which runs on the latest version of Ubuntu.
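A sketch of the job definition:

```yaml
jobs:
  test-code:
    runs-on: ubuntu-latest
```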
Jobs run in parallel, but if we need to create dependent jobs, where a particular job's failure causes all of its dependent jobs to be skipped, we can use the needs key (see the sketch below). On a similar note, we can also share data between jobs.
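For example, a hypothetical deploy job that should only run once test-code succeeds might look like this (the deploy job and both sets of steps are purely illustrative):

```yaml
jobs:
  test-code:
    runs-on: ubuntu-latest
    steps:
      - run: echo "running tests..."
  deploy:  # hypothetical downstream job
    needs: test-code  # skipped if test-code fails
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploying..."
```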
Steps
Each job contains a series of steps which are executed in order. Each step has a name, as well as either an action to use from the GitHub Actions marketplace or a command we want to run. For the test-code job, the steps are to check out the repo, install the necessary dependencies and run the tests.
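A sketch of these steps, assuming our package defines its test dependencies under a [test] extra and that the relevant tests live under a tests/code directory (the action versions, Python version and paths are all illustrative):

```yaml
jobs:
  test-code:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: python -m pip install -e ".[test]" --no-cache-dir
      - name: Execute tests
        run: pytest tests/code  # only tests that don't need data/model artifacts
```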
We are only executing a subset of the tests here because we won't have access to data or model artifacts when these tests are executed on GitHub's runners. However, if our blob storage and model registry are on the cloud, we can access them and perform all the tests. This will often involve using credentials to access these resources, which we can set as Action secrets (GitHub repository page > Settings > Secrets).
View .github/workflows/testing.yml
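A sketch of what the complete workflow file might look like, including the caching step discussed below (action versions, the Python version and paths remain illustrative):

```yaml
# .github/workflows/testing.yml
name: testing
on:
  push:
    branches:
    - main
  pull_request:
    branches:
    - main
jobs:
  test-code:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Cache Python environment
        uses: actions/cache@v2
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('requirements.txt') }}
      - name: Install dependencies
        run: python -m pip install -e ".[test]" --no-cache-dir
      - name: Execute tests
        run: pytest tests/code  # only tests that don't need data/model artifacts
```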
Notice that one of our steps is to cache the entire Python environment with a specific key. This significantly speeds up subsequent runs of our Action as long as the key remains unchanged (same Python location, setup.py and requirements.txt).

Our other workflow is responsible for automatically generating and deploying our mkdocs documentation. The "Deploy documentation" step below will create/update a branch in our repository called gh-pages, which contains the generated UI files for our documentation. We can deploy this branch as a GitHub Pages website by going to Settings > Pages, setting the source branch to gh-pages and the folder to /root, and clicking Save. This will generate the public URL for our documentation, which will automatically update every time our workflow runs after each PR.
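Assuming our documentation is built with mkdocs, this step can be as simple as calling mkdocs gh-deploy, which builds the site and pushes it to the gh-pages branch:

```yaml
      - name: Deploy documentation
        run: mkdocs gh-deploy --force
```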
View .github/workflows/documentation.yml
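A sketch of what the complete workflow file might look like (the job name, action versions and documentation packages are illustrative):

```yaml
# .github/workflows/documentation.yml
name: documentation
on:
  push:
    branches:
    - main
jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: python -m pip install mkdocs mkdocstrings
      - name: Deploy documentation
        run: mkdocs gh-deploy --force
```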
We can also generate private documentation for private repositories and even host it on a custom domain.
Runs
Recall that workflows are triggered when certain events occur. For example, our testing workflow initiates on a push or PR to the main branch. We can see the workflow's runs (current and previous) on the Actions tab of our repository page, and if we click on a specific run, we can view all of its steps and their outputs as well. We can also set branch protection rules (GitHub repository page > Settings > Branches) to require that these workflow runs all succeed before we can merge to the main branch.

While there are methods, such as act, to run and test workflows locally, many of them are not stable enough for reliable use.
Serving
There are a wide variety of GitHub Actions available for deploying and serving our ML applications after all the integration tests have passed. Most of them will require that we have a Dockerfile defined that will load and launch our service with the appropriate artifacts. Read more about the required infrastructure in our systems design lesson. Depending on the application, we can deploy to:
- compute instances (AWS EC2, Google Compute Engine, Azure VM, etc.)
- container orchestration services such as AWS ECS or Google Kubernetes Engine
- serverless options such as AWS Lambda or Google Cloud Functions
If we want to deploy and serve multiple models at a time, it's highly recommended to use a purpose-built model server to seamlessly inspect, update, serve, rollback, etc. multiple versions of models.
The specific deployment method we use is entirely dependent on the application, team, existing infrastructure, etc. The key component is that we are able to update our application when all the integration tests pass, without having to manually intervene for deployment.
Note
We'll learn how to separate developing and serving in our orchestration lesson. We can still leverage CI/CD workflows for pushing our code to Git but separate workflows can then use the validated codebase to execute downstream workflows (evaluation, serving, retraining, etc.).
Marketplace
So what exactly are these actions that we're using from the marketplace? For example, the first step in the test-code job above checks out the repo using the actions/checkout GitHub Action. The Action's link contains information about how to use it, scenarios, etc. The Marketplace has actions for a variety of needs, ranging from continuous deployment for various cloud providers to code quality checks and more. Here are a couple that are especially relevant for ML:
- Great Expectations: ensure that our GE checkpoints pass when any changes are made that could affect the data engineering pipelines. This action also creates a free GE dashboard with Netlify that has the updated data docs.
- Continuous ML: train, evaluate and monitor our ML models and generate a report summarizing the workflows. If we don't want to train offline, we can manually or automatically trigger the training pipeline to run on cloud infrastructure (AWS/GCP) or self-hosted runners.
Don't restrict your workflows to only what's available on the Marketplace or to single-command operations. We can do things like include code coverage reports, deploy an updated Streamlit dashboard and attach its URL to the PR, deliver (CD) our application to AWS Lambda / EC2, etc.