Documenting code for your users and your future self.
Code tells you how, comments tell you why. -- Jeff Atwood
Another way to organize our code is to document it. We do this to make it easier for others (and our future selves) to navigate the code base and build on it. We know our code base best the moment we finish writing it, but documenting it lets us quickly get back to that level of understanding time and time again. Documentation means many different things to developers, so let's define the most common (and required) components:
comments: Terse descriptions of why a piece of code exists.
typing: Specification of a function's inputs and outputs data types, providing insight into what a function consumes and produces at a quick glance.
docstrings: Meaningful descriptions for functions and classes that describe overall utility as well as arguments, returns, etc.
documentation: A rendered webpage that summarizes all the functions, classes, API calls, workflows, examples, etc. so we can view and traverse through the code base without actually having to look at the code just yet.
It's important to be as explicit as possible with our code. We've already discussed choosing explicit names for variables, functions, etc. but another way we can be explicit is by defining the types for our function's inputs and outputs. We want to do this so we can quickly know what data types a function expects and how we can utilize its outputs for downstream processes.
So far, our functions have looked like this:
def pad_sequences(sequences, max_seq_len):
    ...
    return padded_sequences
But we can incorporate so much more information using typing:
def pad_sequences(sequences: Sequence, max_seq_len: int = 0) -> np.ndarray:
    ...
    return padded_sequences
Here we're defining that our input argument sequences is a Sequence, max_seq_len is an integer with a default value of 0 and our output is a NumPy array. There are many data types that we can work with, including but not limited to List, Set, Dict, Tuple, Sequence and more, and of course built-in types such as int, float, etc. You can also use any of your own defined classes as types.
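Starting with Python 3.9, the built-in collection types (list, set, dict, tuple) can be used directly as generic annotations, so we no longer need from typing import List, Set, Dict, Tuple for them. Sequence is not a built-in; it is now imported from collections.abc instead of typing.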
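As a minimal sketch of the typing ideas above, here is a pure-Python variant of pad_sequences. It is an illustration under assumptions, not the lesson's implementation: it returns nested lists rather than a NumPy array so that it runs with the standard library only, and the zero-padding behavior is assumed.

```python
from collections.abc import Sequence


def pad_sequences(
    sequences: Sequence[Sequence[int]], max_seq_len: int = 0
) -> list[list[int]]:
    """Right-pad each sequence with zeros to a common length."""
    # Assumption: never truncate, so fall back to the longest sequence
    # when max_seq_len is smaller than it.
    longest = max((len(seq) for seq in sequences), default=0)
    target_len = max(max_seq_len, longest)
    return [list(seq) + [0] * (target_len - len(seq)) for seq in sequences]


if __name__ == "__main__":
    print(pad_sequences([[1, 2, 3], [4]]))  # [[1, 2, 3], [4, 0, 0]]
    # Annotations are stored on the function and can be inspected at a glance.
    print(pad_sequences.__annotations__["max_seq_len"])  # <class 'int'>
```

Note that Python does not enforce these annotations at runtime; they are there for readers, editors and static type checkers.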
We can make our code even more explicit by adding docstrings to functions and classes to describe overall utility, arguments, returns, exceptions and more. Let's take a look at an example:
[Code listing: an example function with a full docstring (45 lines in the original).]
Let's unpack the different parts of this function's docstring:
[Lines 2-3]: Summary of the overall utility of the function.
[Lines 5-16]: Example of how to use our function.
[Lines 18-19]: Insertion of a Note or other types of admonitions.
[Lines 21-23]: Description of the function's input arguments.
[Lines 25-26]: Any exceptions that may be raised in the function.
[Lines 28-29]: Description of the function's output(s).
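To make those parts concrete, here is a hedged sketch of a docstring that contains each of them: summary, usage example, note, arguments, exceptions and returns. It is not the lesson's original listing, so the line ranges above do not map onto it. The Google-style section names and the padding behavior are assumptions chosen for illustration.

```python
from collections.abc import Sequence


def pad_sequences(
    sequences: Sequence[Sequence[int]], max_seq_len: int = 0
) -> list[list[int]]:
    """Zero-pad sequences on the right to a common length.

    Example:
        >>> pad_sequences([[1, 2, 3], [4]])
        [[1, 2, 3], [4, 0, 0]]

    Note:
        Sequences are never truncated. A `max_seq_len` shorter than the
        longest sequence is ignored.

    Args:
        sequences (Sequence[Sequence[int]]): sequences of integer tokens.
        max_seq_len (int, optional): minimum padded length. Defaults to 0.

    Raises:
        ValueError: if `max_seq_len` is negative.

    Returns:
        One zero-padded list per input sequence.
    """
    if max_seq_len < 0:
        raise ValueError("max_seq_len must be non-negative")
    longest = max((len(seq) for seq in sequences), default=0)
    target_len = max(max_seq_len, longest)
    return [list(seq) + [0] * (target_len - len(seq)) for seq in sequences]


if __name__ == "__main__":
    import doctest

    # Runs the usage example embedded in the docstring as a test.
    doctest.testmod()
```

Writing the usage example in doctest form keeps it honest: the standard library's doctest module can execute it and will flag the docstring if it drifts from the code.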
If you're using Visual Studio Code (highly recommended), you should get the free Python Docstrings Generator extension so you can type """ under a function and then hit the Shift key to generate a template docstring. It will autofill parts of the docstring using the typing information and even the exceptions in your code!
So we're going through all this effort to add typing and docstrings to our functions, but it's all tucked away inside our scripts. What if we could collect all this effort and automatically surface it as documentation? That's exactly what we'll do with the following open-source packages → final result here.
- mkdocs (generates project documentation)
- mkdocs-macros-plugin (required plugins)
- mkdocs-material (styling to beautifully render documentation)
- mkdocstrings (fetch documentation automatically from docstrings)
Here are the steps we'll follow to automatically generate our documentation and serve it. You can find all the files we're talking about in our repository.
Create mkdocs.yml in the root directory.
- Fill in metadata, config, extensions and plugins (more setup options like custom styling, overrides, etc. here). I add some custom CSS inside docs/static/csc to make things look a little bit nicer :)
# Project information
site_name: TagifAI
site_url: https://madewithml.com/#applied-ml
site_description: Tag suggestions for projects on Made With ML.
site_author: Goku Mohandas

# Repository
repo_url: https://github.com/GokuMohandas/applied-ml
repo_name: GokuMohandas/applied-ml
edit_uri: "" #disables edit button
...
- Add logo image and favicon to
# Configuration
theme:
  name: material
  logo: static/images/logo.png
  favicon: static/images/favicon.ico
- Fill in navigation in
# Page tree
nav:
  - Home:
      - TagIfAI: index.md
  - Getting started:
      - Workflow: workflows.md
  - Reference:
      - CLI: tagifai/main.md
      - Configuration: tagifai/config.md
      - Data: tagifai/data.md
      - Models: tagifai/models.md
      - Training: tagifai/train.md
      - Inference: tagifai/predict.md
      - Utilities: tagifai/utils.md
  - API: api.md
- Fill in mkdocstrings plugin information inside
Run make install-dev to make sure you have the required packages for documentation.
Add ::: tagifai.data to a Markdown file to populate it with the information from function and class docstrings from tagifai/data.py. Repeat for other scripts as well. We can add our own text directly to the Markdown file as well, like we do in
Run python -m mkdocs serve to serve your docs to
# Serve documentation
$ python -m mkdocs serve
INFO - Building documentation...
INFO - Cleaning site directory
INFO - Serving on http://127.0.0.1:8000
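The step above that adds a ::: tagifai.data line to a Markdown file has to be repeated for every script, so it can be scripted. The following is a rough sketch, not part of the repository: write_reference_stubs is a hypothetical helper, the module names are taken from the Reference entries in the page tree, and the docs folder layout is an assumption.

```python
import tempfile
from pathlib import Path

# Module names taken from the "Reference" entries of the nav page tree.
MODULES = ["main", "config", "data", "models", "train", "predict", "utils"]


def write_reference_stubs(
    docs_dir: Path, package: str = "tagifai", modules: list[str] = MODULES
) -> list[Path]:
    """Create one Markdown page per module holding a mkdocstrings directive.

    Hypothetical helper for illustration. Existing pages are left untouched
    so that any hand-written text in them is preserved.
    """
    pages = []
    for name in modules:
        page = docs_dir / package / f"{name}.md"
        page.parent.mkdir(parents=True, exist_ok=True)
        if not page.exists():
            # mkdocstrings replaces this line with the rendered docstrings.
            page.write_text(f"::: {package}.{name}\n", encoding="utf-8")
        pages.append(page)
    return pages


if __name__ == "__main__":
    # Demo in a temporary folder; point docs_dir at the real docs folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        for page in write_reference_stubs(Path(tmp)):
            print(page.relative_to(tmp), "->", page.read_text(encoding="utf-8").strip())
```

Skipping pages that already exist is a deliberate choice here, since the lesson mentions adding our own text directly to these Markdown files.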
View our rendered documentation via GitHub Pages → here.
We can easily serve our documentation for free using GitHub Pages and even host it on a custom domain. All we had to do was add the file .github/workflows/documentation.yml, which GitHub Actions will use to build and deploy our documentation every time we push to the main branch (we'll learn about GitHub Actions in our CI/CD lesson soon).