Documentation

Repository · Documentation

Documenting code for your users and your future self.

Intuition

Code tells you how, comments tell you why. -- Jeff Atwood

Another way to organize our code is to document it. We want to do this so that others (and our future selves) can easily navigate the code base and build on it. We know our code base best the moment we finish writing it, but fortunately documentation lets us get back to that state quickly, time and time again. Documentation means many different things to developers, so let's define the most common (and required) components:

  • comments: Terse descriptions of why a piece of code exists.
  • typing: Specification of the data types of a function's inputs and outputs, providing insight into what a function consumes and produces at a quick glance.
  • docstrings: Meaningful descriptions for functions and classes that describe overall utility as well as arguments, returns, etc.
  • documentation: A rendered webpage that summarizes all the functions, classes, API calls, workflows, examples, etc. so we can view and traverse through the code base without actually having to look at the code just yet.

Application

Let's look at what documentation looks like for our application and be sure to check out the auto-generated documentation page for it as well.

Typing

It's important to be as explicit as possible with our code. We've already discussed choosing explicit names for variables, functions, etc., but another way to be explicit is to define the types of our functions' inputs and outputs. This way we can quickly see what data types a function expects and how we can use its outputs in downstream processes.

So far, our functions have looked like this:

def pad_sequences(sequences, max_seq_len):
    ...
    return padded_sequences

But we can incorporate so much more information using typing:

def pad_sequences(sequences: np.ndarray, max_seq_len: int = 0) -> np.ndarray:
    ...
    return padded_sequences

Here we're defining that our input argument sequences is a NumPy array, max_seq_len is an integer with a default value of 0, and our output is also a NumPy array. There are many data types we can work with, including but not limited to List, Set, Dict, Tuple and Sequence, as well as built-in types such as int, float, etc. You can also use any of your own defined classes as types (e.g. nn.Module, LabelEncoder).
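
As a minimal sketch of using both typing generics and a custom class as types: the LabelEncoder below is a hypothetical stand-in written only for this illustration (it is not the application's own class), and encode_tags is likewise an assumed helper name.

```python
from typing import Dict, List


class LabelEncoder:
    """Hypothetical minimal label encoder, defined only for this illustration."""

    def __init__(self) -> None:
        self.class_to_index: Dict[str, int] = {}

    def fit(self, labels: List[str]) -> "LabelEncoder":
        # Assign each unique label an index, in sorted order
        for label in sorted(set(labels)):
            self.class_to_index[label] = len(self.class_to_index)
        return self

    def encode(self, labels: List[str]) -> List[int]:
        return [self.class_to_index[label] for label in labels]


def encode_tags(tags: List[str], encoder: LabelEncoder) -> List[int]:
    """Encode tags with an already fitted encoder (a custom class used as a type)."""
    return encoder.encode(tags)


encoder = LabelEncoder().fit(["nlp", "cv", "nlp"])
print(encode_tags(["cv", "nlp"], encoder))  # [0, 1]
```

The annotations don't change how the function runs; they tell the reader (and tools such as editors and type checkers) that encode_tags expects a list of strings plus a LabelEncoder and returns a list of integers.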

Note

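Starting with Python 3.9, the built-in collection types can be used directly in type hints (e.g. list[int], dict[str, int]), so we no longer need from typing import List, Set, Dict, Tuple for them. Sequence can be imported from collections.abc instead of typing.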
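
Here is a small sketch of that Python 3.9+ style, assuming a hypothetical helper called tag_counts (not part of the application): the built-in list and dict are used directly as generic types, and Sequence comes from collections.abc.

```python
from collections.abc import Sequence


def tag_counts(tags_per_project: Sequence[list[str]]) -> dict[str, int]:
    """Count how often each tag appears across projects."""
    counts: dict[str, int] = {}
    for tags in tags_per_project:
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


print(tag_counts([["nlp", "cv"], ["nlp"]]))  # {'nlp': 2, 'cv': 1}
```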

Docstring

We can make our code even more explicit by adding docstrings to functions and classes to describe overall utility, arguments, returns, exceptions and more. Let's take a look at an example:

def pad_sequences(sequences: np.ndarray, max_seq_len: int = 0) -> np.ndarray:
    """Zero pad sequences to a specified `max_seq_len`
    or to the length of the largest sequence in `sequences`.

    Usage:

    ```python
    # Pad inputs
    seq = np.array([[1, 2, 3], [4, 5, 6]])
    padded_seq = pad_sequences(sequences=seq, max_seq_len=5)
    print(padded_seq)
    ```
    <pre>
    [[1. 2. 3. 0. 0.]
     [4. 5. 6. 0. 0.]]
    </pre>

    Note:
        Input `sequences` must be 2D.

    Args:
        sequences (np.ndarray): 2D array of data to be padded.
        max_seq_len (int, optional): Length to pad sequences to. Defaults to 0.

    Raises:
        ValueError: Input sequences are not two-dimensional.

    Returns:
        An array with the zero padded sequences.

    """
    # Check shape
    if sequences.ndim != 2:
        raise ValueError("Input sequences are not two-dimensional.")

    # Get max sequence length
    max_seq_len = max(
        max_seq_len, max(len(sequence) for sequence in sequences)
    )

    # Pad
    padded_sequences = np.zeros((len(sequences), max_seq_len))
    for i, sequence in enumerate(sequences):
        padded_sequences[i][: len(sequence)] = sequence
    return padded_sequences

Let's unpack the different parts of this function's docstring:

  • [Lines 2-3]: Summary of the overall utility of the function.
  • [Lines 5-16]: Example of how to use our function.
  • [Lines 18-19]: Insertion of a Note or other types of admonitions.
  • [Lines 21-23]: Description of the function's input arguments.
  • [Lines 25-26]: Any exceptions that may be raised in the function.
  • [Lines 28-29]: Description of the function's output(s).
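
Docstrings and type hints are not just comments: Python keeps them on the function object, where they can be read at runtime. The sketch below uses a hypothetical scale function (not from the application) with a docstring in the same Args/Returns layout, and reads it back with the standard library's inspect.getdoc and typing.get_type_hints. How a particular documentation tool extracts this information is not shown here.

```python
import inspect
from typing import get_type_hints


def scale(values: list, factor: float = 1.0) -> list:
    """Multiply every value by `factor`.

    Args:
        values (list): Numbers to scale.
        factor (float, optional): Multiplier. Defaults to 1.0.

    Returns:
        A new list with the scaled values.
    """
    return [value * factor for value in values]


# Docstring with the leading indentation cleaned up
doc = inspect.getdoc(scale)
print(doc.splitlines()[0])  # Multiply every value by `factor`.

# Type hints as a mapping from argument name (and "return") to type
hints = get_type_hints(scale)
print(hints["factor"], hints["return"])  # <class 'float'> <class 'list'>
```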

Tip

If you're using Visual Studio Code (highly recommended), you should get the free Python Docstrings Generator extension so you can type """ under a function and then press Enter to generate a template docstring. It will autofill parts of the docstring using the typing information and even the exceptions in your code!

[Image: docstring generation in VS Code]

MkDocs

So we're going through all this effort to include typing and docstrings in our functions, but it's all tucked away inside our scripts. What if we could collect all this effort and automatically surface it as documentation? That's exactly what we'll do with open-source packages: MkDocs, its Material theme and the mkdocstrings plugin, each of which appears in the steps below (final result here).

Here are the steps we'll follow to automatically generate our documentation and serve it. You can find all the files we're talking about in our repository.

  1. Create mkdocs.yml in root directory.
    touch mkdocs.yml
    
  2. Fill in metadata, config, extensions and plugins (more setup options like custom styling, overrides, etc. here). I add some custom CSS inside docs/static/csc to make things look a little bit nicer :)
    # Project information
    site_name: TagifAI
    site_url: https://madewithml.com/#applied-ml
    site_description: Tag suggestions for projects on Made With ML.
    site_author: Goku Mohandas
    
    # Repository
    repo_url: https://github.com/GokuMohandas/applied-ml
    repo_name: GokuMohandas/applied-ml
    edit_uri: "" #disables edit button
    ...
    
  3. Add logo image and favicon to static/images.
    # Configuration
    theme:
      name: material
      logo: static/images/logo.png
      favicon: static/images/favicon.ico
    
  4. Fill in navigation in mkdocs.yml.
    # Page tree
    nav:
      - Home:
        - TagIfAI: index.md
      - Getting started:
        - Workflow: workflows.md
      - Reference:
        - CLI: tagifai/main.md
        - Configuration: tagifai/config.md
        - Data: tagifai/data.md
        - Models: tagifai/models.md
        - Training: tagifai/train.md
        - Inference: tagifai/predict.md
        - Utilities: tagifai/utils.md
      - API: api.md
    
  5. Fill in mkdocstrings plugin information inside mkdocs.yml.
  6. Rerun make install-dev to make sure you have the required packages for documentation.
  7. Add ::: tagifai.data to the tagifai/data.md Markdown file to populate it with the information from the function and class docstrings in tagifai/data.py. Repeat for the other scripts as well. We can also add our own text directly to the Markdown file, like we do in tagifai/config.md.
  8. Run python -m mkdocs serve to serve your docs at http://localhost:8000/.
# Serve documentation
$ python -m mkdocs serve
INFO    -  Building documentation...
INFO    -  Cleaning site directory
INFO    -  Serving on http://127.0.0.1:8000
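
Step 7 above can be repetitive when there are many modules, so here is a rough sketch of automating it. The helper write_reference_stubs is hypothetical (it is not part of the repository or of MkDocs), and it assumes the Markdown files live under a docs folder laid out like the page tree. It skips files that already exist so hand-written text, such as the text in tagifai/config.md, is not overwritten.

```python
import tempfile
from pathlib import Path


def write_reference_stubs(docs_dir: Path, package: str, modules: list[str]) -> list[Path]:
    """Create `<docs_dir>/<package>/<module>.md` stubs holding a `::: package.module` line.

    Existing files are left untouched so hand-written text is not overwritten.
    """
    created = []
    for module in modules:
        path = docs_dir / package / f"{module}.md"
        if path.exists():
            continue
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"::: {package}.{module}\n")
        created.append(path)
    return created


# Try it in a throwaway directory; point docs_dir at the real docs folder to use it
with tempfile.TemporaryDirectory() as tmp:
    created = write_reference_stubs(Path(tmp), "tagifai", ["data", "models", "utils"])
    print([path.name for path in created])  # ['data.md', 'models.md', 'utils.md']
    print(created[0].read_text())  # ::: tagifai.data
```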

View our rendered documentation via GitHub Pages → here.

Note

We can easily serve our documentation for free using GitHub Pages and even host it on a custom domain. All we had to do was add the file .github/workflows/documentation.yml, which GitHub Actions will use to build and deploy our documentation every time we push to the main branch (we'll learn about GitHub Actions in our CI/CD lesson soon).