CI/CD for Machine Learning


Using workflows to establish continuous integration and delivery pipelines to reliably iterate on our application.
Goku Mohandas

Intuition

Continuous integration (CI) allows our team to develop, test and integrate code in a structured fashion. This allows the team to develop more confidently and frequently, since their work will be properly integrated. Continuous delivery (CD) is responsible for delivering our integrated code to a variety of applications that are dependent on it. With CI/CD pipelines, we can develop and deploy knowing that our systems can quickly adapt and work as intended.

GitHub Actions

In this lesson we're going to use GitHub Actions to create CI/CD pipelines for the code we push to Git. We'll learn more about CI/CD in our orchestration lesson, where we'll apply it more generally to DataOps and MLOps.

[Image: CI/CD workflows]

GitHub Actions has the added advantage of integrating really well with GitHub, and since all of our work is versioned there, we can easily create workflows based on GitHub events (push, PR, release, etc.). GitHub Actions also has a rich marketplace full of actions that we can use in our own projects. And, best of all, GitHub Actions is free for public repositories.

Components

We'll learn about GitHub Actions by understanding the components that compose an Action. These components abide by a specific workflow syntax which can be extended with the appropriate context and expression syntax.

[Image: CI/CD with GitHub Actions]

Workflows

With GitHub Actions, we create automated workflows that do something for us. We'll start by creating a .github/workflows directory to organize all of our workflows.

mkdir -p .github/workflows
touch .github/workflows/testing.yml
touch .github/workflows/documentation.yml

Each workflow file will contain the specific instructions for that action. For example, this testing workflow is responsible for conducting tests on our code base. We can specify the name of our workflow at the top of our YAML file.

# .github/workflows/testing.yml
name: testing

Events

Workflows are triggered by an event, which can be a scheduled occurrence (cron), a webhook event or a manual trigger. In our application, we'll be using the push and pull request webhook events to run the testing workflow when someone directly pushes or submits a PR to the main branch.

# .github/workflows/testing.yml
on:
  push:
    branches:
    - main
    - master
  pull_request:
    branches:
    - main
    - master

Be sure to check out the complete list of the different events that can trigger a workflow.
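
For example, beyond push and pull_request, a workflow could also run on a schedule or be triggered manually from the Actions tab. The following is a minimal sketch of such triggers (not part of our application's workflows; the cron expression is just an illustrative value):

# hypothetical workflow triggers: scheduled and manual
on:
  schedule:
    - cron: "0 0 * * 1"   # every Monday at 00:00 UTC (illustrative schedule)
  workflow_dispatch:      # allows manual triggering from the Actions tab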

Jobs

Once the event is triggered, a set of jobs run on a runner, which is the application that runs the job using a specific operating system. Our first (and only) job is test-code, which runs on the latest version of Ubuntu.

# .github/workflows/testing.yml
jobs:
  test-code:
    runs-on: ubuntu-latest

Jobs run in parallel, but if we need to create dependent jobs (where, if a particular job fails, all of its dependent jobs are skipped), we can use the needs key. On a similar note, we can also share data between jobs, as sketched below.
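
As a rough sketch (the deploy job and coverage value below are hypothetical, not part of our application's workflows), a dependent job and a shared output could look like this:

# hypothetical example of dependent jobs and shared data
jobs:
  test-code:
    runs-on: ubuntu-latest
    outputs:
      coverage: ${{ steps.report.outputs.coverage }}  # expose a step output at the job level
    steps:
      - id: report
        run: echo "coverage=87" >> "$GITHUB_OUTPUT"   # hypothetical coverage value
  deploy:
    needs: test-code  # deploy is skipped if test-code fails
    runs-on: ubuntu-latest
    steps:
      - run: echo "Test coverage was ${{ needs.test-code.outputs.coverage }}%"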

Steps

Each job contains a series of steps which are executed in order. Each step has a name, as well as actions to use from the GitHub Action marketplace or commands we want to run. For the test-code job, the steps are to checkout the repo, install the necessary dependencies and run tests.

# .github/workflows/testing.yml
jobs:
  test-code:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9.1
      - name: Caching
        uses: actions/cache@v2
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('requirements.txt') }}
      - name: Install dependencies
        run: |
          python3 -m pip install -e ".[test]" --no-cache-dir
      - name: Execute tests
        run: pytest tests/tagifai --ignore tests/tagifai/test_main.py --ignore tests/tagifai/test_data.py

We are only executing a subset of the tests here because we won't have access to data or model artifacts when these tests are executed on GitHub's runners. However, if our blob storage and model registry are on the cloud, we can access them and perform all the tests. This will often involve using credentials to access these resources, which we can set as Action secrets (GitHub repository page > Settings > Secrets).
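
For instance, a step could read cloud credentials from those secrets through environment variables. The secret names below are hypothetical placeholders for whatever our blob storage and model registry require:

# hypothetical usage of Action secrets inside a step
- name: Execute tests
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}          # hypothetical secret
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}  # hypothetical secret
  run: pytest tests/tagifai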

View .github/workflows/testing.yml
name: testing
on:
  push:
    branches:
    - master
    - main
  pull_request:
    branches:
    - master
    - main
jobs:
  test-code:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repo
      uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9.1
    - name: Caching
      uses: actions/cache@v2
      with:
        path: ${{ env.pythonLocation }}
        key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('requirements.txt') }}
    - name: Install dependencies
      run: |
        python -m pip install -e ".[test]" --no-cache-dir
    - name: Execute tests
      run: pytest tests/tagifai --ignore tests/tagifai/test_main.py --ignore tests/tagifai/test_data.py

Notice that one of our steps is to cache the entire Python environment with a specific key. This significantly speeds up subsequent runs of our Action as long as the key remains unchanged (same Python location, setup.py and requirements.txt).
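
If we also want to fall back to an older cache when the exact key doesn't match, actions/cache accepts a restore-keys input. A sketch of how our caching step could be extended (an optional addition, not part of the workflow above):

# optional: fall back to a partially matching cache on a key miss
- name: Caching
  uses: actions/cache@v2
  with:
    path: ${{ env.pythonLocation }}
    key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ env.pythonLocation }}-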

[Image: caching with GitHub Actions]

Our other workflow is responsible for automatically generating and deploying our mkdocs documentation. The "Deploy documentation" step below will create/update a branch in our repository called gh-pages, which will contain the generated UI files for our documentation. We can deploy this branch as a GitHub Pages website by going to Settings > Pages, setting the source branch to gh-pages and the folder to / (root), and saving. This will generate the public URL for our documentation, which will automatically update every time our workflow runs after each PR.

# .github/workflows/documentation.yml
name: documentation
...
jobs:
  build-docs:
      ...
      - name: Deploy documentation
        run: mkdocs gh-deploy --force

View .github/workflows/documentation.yml
name: documentation
on:
  push:
    branches:
    - master
    - main
  pull_request:
    branches:
    - master
    - main
jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repo
      uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9.1
    - name: Caching
      uses: actions/cache@v2
      with:
        path: ${{ env.pythonLocation }}
        key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}-${{ hashFiles('requirements.txt') }}
    - name: Install dependencies
      run: |
        python -m pip install -e ".[docs]" --no-cache-dir
    - name: Deploy documentation
      run: mkdocs gh-deploy --force

We can also generate private documentation for private repositories and even host it on a custom domain.

Runs

Recall that workflows are triggered when certain events occur. For example, our testing workflow will initiate on a push or PR to the main branch. We can see the workflow's runs (current and previous) on the Actions tab of our repository page. And if we click on a specific run, we can view all the steps and their outputs as well. We can also set branch protection rules (GitHub repository page > Settings > Branches) to ensure that these workflow runs are all successful before we can merge to the main branch.

[Image: successful CI/CD run]

While there are methods, such as act, to run and test workflows locally, many of them are not stable enough for reliable use.
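
If you do want to experiment with act, a typical session looks something like this (a sketch assuming act and Docker are installed; exact flags may vary across versions):

# list the jobs that would run for the default (push) event
act -l

# run a specific job from our testing workflow locally
act push -j test-code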

Serving

There are a wide variety of GitHub Actions available for deploying and serving our ML applications after all the integration tests have passed. Most of them will require that we have a Dockerfile defined that will load and launch our service with the appropriate artifacts. Read more about the required infrastructure in our systems design lesson.

If we want to deploy and serve multiple models at a time, it's highly recommended to use a purpose-built model server to seamlessly inspect, update, serve, rollback, etc. multiple versions of models.

The specific deployment method we use is entirely dependent on the application, team, existing infrastructure, etc. The key component is that we are able to update our application when all the integration tests pass, without having to manually intervene for deployment.
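
As an illustration only, a delivery workflow could build and push a Docker image once changes land on the main branch. Everything below (the workflow file name, image name and registry secrets) is hypothetical and would need to be adapted to the actual serving infrastructure:

# .github/workflows/deploy.yml (hypothetical)
name: deploy
on:
  push:
    branches:
    - main
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repo
      uses: actions/checkout@v2
    - name: Build image
      run: docker build -t ${{ secrets.REGISTRY }}/tagifai:${{ github.sha }} .  # hypothetical registry and image name
    - name: Push image
      run: |
        echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login ${{ secrets.REGISTRY }} -u ${{ secrets.REGISTRY_USER }} --password-stdin
        docker push ${{ secrets.REGISTRY }}/tagifai:${{ github.sha }}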

Note

We'll learn how to separate developing and serving in our orchestration lesson. We can still leverage CI/CD workflows for pushing our code to Git but separate workflows can then use the validated codebase to execute downstream workflows (evaluation, serving, retraining, etc.).

Marketplace

So what exactly are these actions that we're using from the marketplace? For example, our first step in the test-code job above is to checkout the repo using the actions/checkout GitHub Action. The Action's page contains information about how to use it, scenarios, etc. The Marketplace has actions for a variety of needs, ranging from continuous deployment for various cloud providers to code quality checks and more. A couple that are especially relevant for ML projects:

  • Great Expectations: ensure that our GE checkpoints pass when any changes are made that could affect the data engineering pipelines. This action also creates a free GE dashboard with Netlify that has the updated data docs.
  • Continuous ML: train, evaluate and monitor your ML models and generate a report summarizing the workflows. If you don't want to train offline, you can manually or automatically trigger the training pipeline to run on cloud infrastructure (AWS/GCP) or self-hosted runners.

Don't restrict your workflows to only what's available on the Marketplace or to single-command operations. We can do things like include code coverage reports, deploy an updated Streamlit dashboard and attach its URL to the PR, deliver (CD) our application to an AWS Lambda / EC2, etc.


To cite this content, please use:

@article{madewithml,
    author       = {Goku Mohandas},
    title        = { CI/CD - Made With ML },
    howpublished = {\url{https://madewithml.com/}},
    year         = {2022}
}