Documenting Code
Intuition
Code tells you how, comments tell you why. -- Jeff Atwood
We can further organize our code by documenting it, which makes it easier for others (and our future selves) to navigate and extend. We know our code base best the moment we finish writing it, and documentation lets us quickly get back to that familiar state of mind later. Documentation can mean many different things to developers, so let's define the most common components:
- comments: short descriptions as to why a piece of code exists.
- typing: specification of a function's inputs and outputs' data types, providing information pertaining to what a function consumes and produces.
- docstrings: meaningful descriptions for functions and classes that describe overall utility, arguments, returns, etc.
- docs: rendered webpage that summarizes all the functions, classes, workflows, examples, etc.
For now, we'll produce our documentation locally but be sure to check out the auto-generated documentation page for our application. We'll learn how to automatically create and keep our docs up-to-date in our CI/CD lesson every time we make changes to our code base.
Code collaboration
How do you currently share your code with others on your team? What can be improved?
Typing
It's important to be as explicit as possible with our code. We've already discussed choosing explicit names for variables, functions, etc. but another way we can be explicit is by defining the types for our function's inputs and outputs.
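So far, our functions have taken their inputs and returned their outputs without saying anything about types. We can convey much more information by adding type annotations to a function's signature. For a function with inputs a and b and output c, the annotations can state that:
- input parameter a is a list
- input parameter b is an integer with default value 0
- output parameter c is a NumPy array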
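The annotations listed above can be sketched as follows. The function names and the body (adding an offset to each element) are hypothetical, chosen only so that the example runs; the signature of the typed version is the part that matters.

```python
from typing import List

import numpy as np


def untyped_offset(a, b=0):
    # No hints: a reader has to guess what a, b and the result are.
    return np.array(a) + b


def typed_offset(a: List, b: int = 0) -> np.ndarray:
    # Same behavior, but the signature now documents the expected types.
    c = np.array(a) + b
    return c


c = typed_offset([1, 2, 3], b=1)
print(type(c).__name__, c.tolist())  # ndarray [2, 3, 4]
```

Python does not enforce these hints at runtime; they are there for readers, editors and static type checkers.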
There are many other data types that we can work with, including List, Set, Dict, Tuple, Sequence and more, as well as built-in types such as int, float, etc. We can also use types from packages we install (ex. np.ndarray) and even from our own defined classes (ex. LabelEncoder).
Starting with Python 3.9, the built-in collection types (list, set, dict, tuple) can be used directly in annotations, so we no longer need from typing import List, Set, Dict, Tuple for them. Sequence can be imported from collections.abc instead.
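A minimal sketch of annotations that rely on the built-in collection types, assuming Python 3.9 or newer. The two functions are hypothetical and exist only to show the syntax.

```python
from collections.abc import Sequence


def count_tags(tags: list[str]) -> dict[str, int]:
    # Built-in list and dict are subscripted directly, no typing import needed.
    counts: dict[str, int] = {}
    for tag in tags:
        counts[tag] = counts.get(tag, 0) + 1
    return counts


def first_and_last(items: Sequence[int]) -> tuple[int, int]:
    # Sequence accepts lists, tuples and other indexable containers.
    return items[0], items[-1]


print(count_tags(["nlp", "cv", "nlp"]))  # {'nlp': 2, 'cv': 1}
print(first_and_last((3, 5, 8)))  # (3, 8)
```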
Docstrings
We can make our code even more explicit by adding docstrings to describe overall utility, arguments, returns, exceptions and more. Let's take a look at an example:
Let's unpack the different parts of this function's docstring:
- [Line 3]: Summary of the overall utility of the function.
- [Lines 5-12]: Example of how to use our function.
- [Lines 14-16]: Description of the function's input arguments.
- [Lines 18-19]: Any exceptions that may be raised in the function.
- [Lines 21-22]: Description of the function's output(s).
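A minimal sketch of a docstring with these five parts, assuming the Google docstring style and a hypothetical scale function. Its line layout will not match the line references above; it only illustrates what each part looks like.

```python
from typing import List


def scale(values: List[float], factor: float = 1.0) -> List[float]:
    """Multiply every value in a list by a constant factor.

    Example:
        >>> scale([1.0, 2.0], factor=3.0)
        [3.0, 6.0]

    Args:
        values (List[float]): numbers to scale.
        factor (float, optional): multiplier applied to each value. Defaults to 1.0.

    Raises:
        ValueError: if values is empty.

    Returns:
        List[float]: the scaled values, in their original order.
    """
    if not values:
        raise ValueError("values must not be empty")
    return [value * factor for value in values]


# The docstring is stored on the function, which is what documentation tools read.
print(scale.__doc__.splitlines()[0])  # Multiply every value in a list by a constant factor.
```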
We'll render these docstrings in the Docs section below.

Take this time to update all the functions and classes in our project with docstrings, and be sure to refer to the repository as a guide. Note that you may have to explicitly import some libraries in certain scripts because a type annotation requires them. For example, we don't explicitly use the Pandas library in our data.py script; however, we do use Pandas DataFrames as input arguments.
Ideally we would add docstrings to our functions and classes as we develop them, as opposed to doing it all at once at the end.
Tip
If using Visual Studio Code, be sure to use the Python Docstrings Generator extension so you can type """ under a function and then hit the Shift key to generate a template docstring. It will autofill parts of the docstring using the typing information and even the exceptions in your code!
Docs
So we're going through all this effort of adding typing and docstrings to our functions, but it's all tucked away inside our scripts. What if we could collect all this effort and automatically surface it as documentation? That's exactly what we'll do with the open-source packages mkdocs and mkdocstrings (final result here).
- Install required packages:

  ```bash
  pip install mkdocs==1.3.0 mkdocstrings==0.18.1
  ```

  Instead of directly adding these requirements to our requirements.txt file, we're going to isolate them from our core required libraries. We want to do this because not everyone will need to create documentation, as it's not a core machine learning operation (training, inference, etc.). We'll tweak our setup.py script to make this possible. We'll define these packages under a docs_packages object:

  ```python
  # setup.py
  docs_packages = [
      "mkdocs==1.3.0",
      "mkdocstrings==0.18.1"
  ]
  ```

  and then we'll add this to the setup() object in the script:

  ```python
  # Define our package
  setup(
      ...
      install_requires=[required_packages],
      extras_require={
          "dev": docs_packages,
          "docs": docs_packages,
      },
  )
  ```

  Now we can install this package with:

  ```bash
  python3 -m pip install -e ".[docs]"
  ```

  We're also defining a dev option, which we'll update over the course so that developers can install all required and extra packages in one call, instead of installing each set of extra packages one at a time.

  ```bash
  python3 -m pip install -e ".[dev]"
  ```

  We created an explicit docs option because a user may want to download only the documentation packages to generate documentation (none of the other packages will be required). We'll see this in action when we use CI/CD workflows to autogenerate documentation via GitHub Actions.
- Initialize mkdocs:

  ```bash
  python3 -m mkdocs new .
  ```

  This will create the following files:

  ```
  .
  ├─ docs/
  │  └─ index.md
  └─ mkdocs.yml
  ```
- We'll start by overwriting the default index.md file in our docs directory with information specific to our project:

  ```markdown
  ## Documentation

  - [Workflows](tagifai/main.md): main workflows.
  - [tagifai](tagifai/data.md): documentation of functionality.

  ## MLOps Lessons

  Learn how to combine machine learning with software engineering to develop, deploy & maintain production ML applications.

  - Lessons: [https://madewithml.com/](https://madewithml.com/#mlops)
  - Code: [GokuMohandas/mlops-course](https://github.com/GokuMohandas/mlops-course)
  ```
- Next we'll create documentation files for each script in our tagifai directory:

  ```bash
  mkdir docs/tagifai
  cd docs/tagifai
  touch main.md utils.md data.md train.md evaluate.md predict.md
  cd ../../
  ```

  It's helpful to have the docs directory structure mimic our project's structure as much as possible. This becomes even more important as we document more directories in future lessons.
- Next we'll add a ::: tagifai.<SCRIPT_NAME> line to each file under docs/tagifai. This will populate the file with information about the functions and classes (using their docstrings) from tagifai/<SCRIPT_NAME>.py thanks to the mkdocstrings plugin. Be sure to check out the complete list of mkdocs plugins.

  ```
  # docs/tagifai/data.md
  ::: tagifai.data
  ```
- Finally, we'll add some configurations to the mkdocs.yml file that mkdocs automatically created:

  ```yaml
  # mkdocs.yml
  site_name: Made With ML
  site_url: https://madewithml.com/
  repo_url: https://github.com/GokuMohandas/mlops-course/
  nav:
    - Home: index.md
    - workflows:
      - main: tagifai/main.md
    - tagifai:
      - data: tagifai/data.md
      - evaluate: tagifai/evaluate.md
      - predict: tagifai/predict.md
      - train: tagifai/train.md
      - utils: tagifai/utils.md
  theme: readthedocs
  plugins:
    - mkdocstrings
  watch:
    - .  # reload docs for any file changes
  ```
- Serve our documentation locally:

  ```bash
  python3 -m mkdocs serve
  ```
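The two steps that create the per-script documentation files and fill each one with its mkdocstrings directive could also be scripted. The sketch below is one possible way to do it, not part of the project: write_doc_stubs is a hypothetical helper, and the demo writes into a temporary directory so that it does not touch a real docs folder.

```python
import tempfile
from pathlib import Path
from typing import List


def write_doc_stubs(docs_dir: Path, package: str, modules: List[str]) -> List[Path]:
    """Create <docs_dir>/<package>/<module>.md files, each holding one directive."""
    target = docs_dir / package
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for module in modules:
        path = target / f"{module}.md"
        # mkdocstrings replaces this line with the module's rendered docstrings.
        path.write_text(f"::: {package}.{module}\n")
        written.append(path)
    return written


# Demo in a throwaway directory (assumption: the same script names as above).
with tempfile.TemporaryDirectory() as tmp:
    scripts = ["main", "utils", "data", "train", "evaluate", "predict"]
    write_doc_stubs(Path(tmp) / "docs", "tagifai", scripts)
    print((Path(tmp) / "docs" / "tagifai" / "data.md").read_text())  # ::: tagifai.data
```

To use it on a real project, docs_dir would point at the project's docs directory instead of the temporary one.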
Publishing
We can easily serve our documentation for free using GitHub Pages for public repositories, as well as private documentation for private repositories. We can even host it on a custom domain (ex. a company's subdomain).
Be sure to check out the auto-generated documentation page for our application. We'll learn how to automatically create and keep our docs up-to-date in our CI/CD lesson every time we make changes to our code base.