Pre-commit


Using pre-commit Git hooks to run checks before committing.
Goku Mohandas

Intuition

Before committing to our local repository, there are a lot of items on our mental to-do list, ranging from styling and formatting to testing. It's very easy to forget some of these steps, especially when we want to "push a quick fix". To help us manage all these important steps, we can use pre-commit hooks, which are triggered automatically when we try to make a commit.

Though we can add these checks directly in our CI/CD pipeline (ex. via GitHub Actions), it's significantly faster to validate our commits locally than to push to our remote host, wait to see what needs to be fixed, and then submit yet another PR.

Installation

We'll be using the Pre-commit framework to help us automatically perform important checks via hooks when we make a commit.

# Install pre-commit
pip install pre-commit==2.19.0
pre-commit install

And we'll add this to our setup.py script instead of our requirements.txt file because it's not core to the machine learning operations.

# setup.py
setup(
    ...
    extras_require={
        "dev": docs_packages + style_packages + test_packages + ["pre-commit==2.19.0"],
        "docs": docs_packages,
        "test": test_packages,
    },
)
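
As a rough illustration of how the "dev" extra is assembled, the sketch below builds the same dictionary outside of setup(). The contents of docs_packages, style_packages and test_packages are placeholders assumed for this example (the real lists live in the project's setup.py); only the pre-commit pin comes from the text above.

```python
# Placeholder package lists, assumed for illustration only.
docs_packages = ["mkdocs"]
style_packages = ["black", "flake8", "isort"]
test_packages = ["pytest"]

# Same structure as the extras_require argument passed to setup().
extras_require = {
    "dev": docs_packages + style_packages + test_packages + ["pre-commit==2.19.0"],
    "docs": docs_packages,
    "test": test_packages,
}

# pre-commit is only pulled in by the "dev" extra.
for extra, packages in extras_require.items():
    print(extra, "pre-commit==2.19.0" in packages)
```

With this layout, python3 -m pip install -e ".[dev]" installs pre-commit along with the other development tools, while the "docs" and "test" extras stay lean.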

Config

We define our pre-commit hooks via a .pre-commit-config.yaml configuration file. We can either create our YAML configuration from scratch or use the pre-commit CLI to create a sample configuration that we can add to.

# Sample config
pre-commit sample-config > .pre-commit-config.yaml
cat .pre-commit-config.yaml

# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-yaml
    -   id: check-added-large-files

Hooks

When it comes to creating and using hooks, we have several options to choose from.

Built-in

Inside the sample configuration, we can see that pre-commit has added some default hooks from its repository. Each entry specifies the location of the repository, the version (rev), and the specific hook ids to use. We can read about the function of these hooks and add even more by exploring pre-commit's built-in hooks. Many of them also have additional arguments that we can configure to customize the hook.

# Inside .pre-commit-config.yaml
...
-   id: check-added-large-files
    args: ['--maxkb=1000']
    exclude: "notebooks/tagifai.ipynb"
...

Be sure to explore the many other built-in hooks because there are some really useful ones that we use in our project. For example, check-merge-conflict to see if there are any lingering merge conflict strings or detect-aws-credentials if we accidentally left our credentials exposed in a file, and so much more.

And we can also exclude certain files from being processed by the hooks by using the optional exclude key. There are many other optional keys we can configure for each hook ID.

# Inside .pre-commit-config.yaml
...
-   id: check-yaml
    exclude: "mkdocs.yml"
...
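
One detail worth keeping in mind is that the exclude value is a Python regular expression that is searched for anywhere in the file path, not a literal filename or a glob. The sketch below approximates that matching with re.search; the helper name is_excluded and the paths other than mkdocs.yml and the notebooks directory are made up for this example.

```python
import re


def is_excluded(path: str, exclude: str) -> bool:
    """Approximate the exclude check: a regex searched anywhere in the path."""
    return re.search(exclude, path) is not None


# The pattern from the snippet above excludes the intended file...
print(is_excluded("mkdocs.yml", "mkdocs.yml"))

# ...but the unescaped "." matches any character, so this also matches.
print(is_excluded("docs/mkdocs_yml", "mkdocs.yml"))

# A directory name excludes everything whose path contains it.
print(is_excluded("notebooks/tagifai.ipynb", "notebooks"))

# Anchoring and escaping makes the pattern match exactly one path.
print(is_excluded("docs/mkdocs_yml", r"^mkdocs\.yml$"))
```

For most projects the loose pattern is harmless, but an anchored and escaped pattern avoids accidentally excluding similarly named files.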

Custom

Besides pre-commit's built-in hooks, there are also many popular third-party hooks that we can choose from. For example, if we want to apply formatting checks with Black as a hook, we can leverage Black's pre-commit hook.

# Inside .pre-commit-config.yaml
...
-   repo: https://github.com/psf/black
    rev: 20.8b1
    hooks:
    -   id: black
        args: []
        files: .
...

This specific hook is defined under a .pre-commit-hooks.yaml inside Black's repository, as are other custom hooks under their respective package repositories.

Local

We can also create our own local hooks without configuring a separate .pre-commit-hooks.yaml. Here we're defining two pre-commit hooks, test and clean, to run some commands that we've defined in our Makefile. Similarly, we can run any entry command with arguments to create hooks very quickly.

# Inside .pre-commit-config.yaml
...
- repo: local
  hooks:
    - id: test
      name: test
      entry: make
      args: ["test"]
      language: system
      pass_filenames: false
    - id: clean
      name: clean
      entry: make
      args: ["clean"]
      language: system
      pass_filenames: false

Our complete .pre-commit-config.yaml:

# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
        exclude: "config/run_id.txt"
    -   id: check-yaml
        exclude: "mkdocs.yml"
    -   id: check-added-large-files
        args: ['--maxkb=1000']
        exclude: "notebooks"
    -   id: check-ast
    -   id: check-json
    -   id: check-merge-conflict
    -   id: detect-aws-credentials
    -   id: detect-private-key
-   repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
    -   id: black
        args: []
        files: .
-   repo: https://github.com/PyCQA/flake8
    rev: 3.9.2
    hooks:
    -   id: flake8
-   repo: https://github.com/PyCQA/isort
    rev: 5.10.1
    hooks:
    -   id: isort
        args: []
        files: .
-   repo: https://github.com/asottile/pyupgrade  # update python syntax
    rev: v2.34.0
    hooks:
    -   id: pyupgrade
        args: [--py36-plus]
-   repo: local
    hooks:
    -   id: test
        name: test
        entry: make
        args: ["test"]
        language: system
        pass_filenames: false
    -   id: clean
        name: clean
        entry: make
        args: ["clean"]
        language: system
        pass_filenames: false
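
The local hooks above call make and set pass_filenames: false, so they ignore which files are staged. A local hook can also be a small script: when pass_filenames is left at its default, pre-commit appends the staged file paths to the entry command, and a non-zero exit code marks the hook as failed. Below is a minimal sketch of such a script. The marker string, the script itself and its function names are hypothetical and not part of the project.

```python
import sys
from pathlib import Path

MARKER = "DO NOT COMMIT"  # hypothetical marker, chosen for this example


def find_offending_files(paths, marker=MARKER):
    """Return the paths whose text contains the marker."""
    offending = []
    for path in paths:
        try:
            text = Path(path).read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if marker in text:
            offending.append(path)
    return offending


def main(argv=None):
    """Check the given paths (by default, the ones passed on the command line)."""
    paths = sys.argv[1:] if argv is None else argv
    offending = find_offending_files(paths)
    for path in offending:
        print(f"{path}: contains {MARKER!r}")
    return 1 if offending else 0  # non-zero exit code fails the hook


if __name__ == "__main__":
    exit_code = main()
    if exit_code:
        sys.exit(exit_code)
```

Assuming the script were saved somewhere in the repository, it could be wired up as another entry under repo: local with language: system and an entry that runs it with python.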

Commit

Our pre-commit hooks will automatically execute when we try to make a commit. We'll be able to see whether each hook passed or failed and make any necessary changes. If any of the hooks fail, we have to fix the corresponding files ourselves, though in many instances (ex. formatting hooks) the files will be reformatted automatically.

...
detect private key.....................................PASSED
black..................................................FAILED
...

If any of the hooks fail, we need to stage the fixed files and commit again, repeating until all hooks pass.

git add .
git commit -m <MESSAGE>
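
The reason a second commit is needed even when a hook fixes the files itself is that a hook which modifies files is reported as failed, and the modifications still have to be staged. The toy model below illustrates this with a whitespace fixer. It is a simplified sketch of the behavior, not pre-commit's implementation, and the file path and function names are made up for this example.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Toy fixer: remove trailing spaces and tabs from every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def run_fixer_hook(files: dict, fixer) -> tuple:
    """Apply the fixer; the hook 'passes' only if nothing had to change."""
    fixed = {path: fixer(text) for path, text in files.items()}
    passed = fixed == files
    return passed, fixed


# Hypothetical staged file with trailing whitespace on the first line.
staged = {"tagifai/main.py": "x = 1   \ny = 2\n"}

# First commit attempt: the hook rewrites the file, so it fails.
first_pass, staged = run_fixer_hook(staged, strip_trailing_whitespace)

# After `git add .`, the second attempt finds nothing left to fix.
second_pass, staged = run_fixer_hook(staged, strip_trailing_whitespace)

print(first_pass, second_pass)  # False True
```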

Run

Though pre-commit hooks are meant to run before (pre) a commit, we can manually trigger all or individual hooks on all or a set of files.

# Run
pre-commit run --all-files  # run all hooks on all files
pre-commit run <HOOK_ID> --all-files # run one hook on all files
pre-commit run --files <PATH_TO_FILE>  # run all hooks on a file
pre-commit run <HOOK_ID> --files <PATH_TO_FILE> # run one hook on a file
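
If we want to trigger the same commands from a script (ex. a task runner), a thin wrapper around the CLI is enough. The sketch below only assembles the four command variants listed above and returns the exit code of pre-commit run, which is 0 when every hook passes. The function names are hypothetical, and it assumes the pre-commit executable is on the PATH.

```python
import subprocess


def build_run_command(hook_id=None, files=None):
    """Assemble a `pre-commit run` command for all files or specific files."""
    command = ["pre-commit", "run"]
    if hook_id is not None:
        command.append(hook_id)  # restrict the run to a single hook
    if files:
        command += ["--files", *files]  # run on specific files
    else:
        command.append("--all-files")  # run on every file in the repository
    return command


def run_hooks(hook_id=None, files=None):
    """Run pre-commit and return its exit code (0 means every hook passed)."""
    return subprocess.run(build_run_command(hook_id, files)).returncode


print(build_run_command())
print(build_run_command("black", ["tagifai/main.py"]))
```

Keeping the command construction separate from the subprocess call makes the wrapper easy to check without having pre-commit installed.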

Skip

Skipping any of the pre-commit hooks is strongly discouraged because they are there for a reason. But for some highly urgent, world-saving commits, we can use the --no-verify flag.

# Commit without hooks
git commit -m <MESSAGE> --no-verify

We highly recommend not doing this because no commit deserves to skip its checks, no matter how "small" the change is. If we accidentally did this, we can run pre-commit run --all-files, fix anything that fails, and then commit the fixes.

Update

In our .pre-commit-config.yaml configuration file, we've had to pin the version (rev) of each repository whose hooks we use. Pre-commit has an autoupdate CLI command which will update these versions as newer ones become available.

# Autoupdate
pre-commit autoupdate

We can also add this command to our Makefile to execute when a development environment is created so everything is up-to-date.

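# Makefile
# `source` is a bash builtin, so don't rely on the default /bin/sh
SHELL = /bin/bash
.ONESHELL:
# venv is also a directory name, so mark the target as phony
.PHONY: venv
venv:
	python3 -m venv venv
	source venv/bin/activate && \
	python3 -m pip install --upgrade pip setuptools wheel && \
	python3 -m pip install -e ".[dev]" && \
	pre-commit install && \
	pre-commit autoupdate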
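
Since autoupdate rewrites the rev values in place, it can be useful to list what is currently pinned before and after running it. The sketch below does this with a simple line-based scan, assuming the plain repo:/rev: layout used in this lesson. It is not a YAML parser, and the function name pinned_revs is made up for this example.

```python
import re

# A shortened config in the same layout as the one in this lesson.
CONFIG = """\
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
    -   id: trailing-whitespace
-   repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
    -   id: black
-   repo: local
    hooks:
    -   id: test
"""


def pinned_revs(config_text: str) -> dict:
    """Map each repo to its pinned rev (repos without a rev are omitted)."""
    revs = {}
    current_repo = None
    for line in config_text.splitlines():
        repo_match = re.match(r"\s*-?\s*repo:\s*(\S+)", line)
        rev_match = re.match(r"\s*rev:\s*(\S+)", line)
        if repo_match:
            current_repo = repo_match.group(1)
        elif rev_match and current_repo is not None:
            revs[current_repo] = rev_match.group(1)
    return revs


for repo, rev in pinned_revs(CONFIG).items():
    print(f"{repo} -> {rev}")
```

Comparing the output before and after pre-commit autoupdate (or reviewing the diff of the config file) shows exactly which hook repositories changed version.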

To cite this content, please use:

@article{madewithml,
    author       = {Goku Mohandas},
    title        = { Pre-commit - Made With ML },
    howpublished = {\url{https://madewithml.com/}},
    year         = {2022}
}