Experiment Tracking
Repository · Notebook
Intuition
So far, we've been training and evaluating our different baselines but haven't really been tracking these experiments. We'll fix this by defining a proper process for experiment tracking, which we'll use for all future experiments (including hyperparameter optimization). Experiment tracking is the process of managing all the different experiments and their components, such as parameters, metrics, models and other artifacts, and it enables us to:
- Organize all the necessary components of a specific experiment. It's important to have everything in one place and know where it is so you can use it later.
- Reproduce past results (easily) using saved experiments.
- Log iterative improvements across time, data, ideas, teams, etc.
Tools
There are many options for experiment tracking but we're going to use MLFlow (100% free and open-source) because it has all the functionality we'll need (and growing integration support). We can run MLFlow on our own servers and databases so there are no storage costs or limitations, which makes it one of the most popular options; it's used by Microsoft, Facebook, Databricks and others. You can also set up your own tracking servers to synchronize runs among multiple team members collaborating on the same task.
There are also several other popular options, such as Comet ML (used by Google AI, HuggingFace, etc.), Neptune (used by Roche, NewYorker, etc.) and Weights and Biases (used by OpenAI, Toyota Research, etc.). These are fantastic tools that provide features like dashboards, seamless integrations, hyperparameter search, reports and even debugging!
Many platforms are leveraging their position as the source for experiment data to provide features that extend into other parts of the ML development pipeline such as versioning, debugging, monitoring, etc.
Application
We'll start by initializing all the required arguments for our experiment.
pip install mlflow==1.23.1 -q
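After installing MLFlow, a minimal set of imports for this lesson might look like this (a sketch, assuming we organize our arguments with an argparse Namespace and our filepaths with pathlib):

```python
from argparse import Namespace
from pathlib import Path

import mlflow
```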
`args` contains all the parameters needed, and it's nice to have them all organized under one variable so we can easily log them and tweak them for different experiments (we'll see this when we do hyperparameter optimization).
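A sketch of what `args` could hold for a TF-IDF + SGD baseline (the parameter names and values below are illustrative assumptions, not the lesson's exact configuration):

```python
# Illustrative hyperparameters, kept under one Namespace so they're easy to log and tweak
args = Namespace(
    analyzer="char",        # character n-grams for TF-IDF
    ngram_max_range=7,      # upper bound of the n-gram range
    alpha=1e-4,             # regularization strength for SGD
    learning_rate=1e-1,     # constant learning rate (eta0)
    num_epochs=100,         # number of training epochs
)
```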
Next, we'll set up our model registry where all the experiments and their respective runs will be stored. We'll load trained models from this registry as well using specific run IDs.
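A sketch of that setup, assuming we keep everything in a local experiments/ directory (the MODEL_REGISTRY name is an assumption):

```python
# Local directory that will act as our tracking store / model registry
MODEL_REGISTRY = Path("experiments")
MODEL_REGISTRY.mkdir(exist_ok=True)
mlflow.set_tracking_uri("file://" + str(MODEL_REGISTRY.absolute()))
```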
Tip
On Windows, the last line where we set the tracking URI should have three forward slashes:
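With the MODEL_REGISTRY path from above, that would look like:

```python
# Windows paths don't start with a slash, so add the third one explicitly
mlflow.set_tracking_uri("file:///" + str(MODEL_REGISTRY.absolute()))
```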
If we list the files in our working directory, we'll see the new experiments store alongside our data:

experiments labeled_projects.csv sample_data
When we're collaborating with other team members, this model registry will live on the cloud. Members from our team can connect to it (with authentication) to save and load trained models. If you don't want to set up and maintain a model registry, this is where platforms like Comet ML, Weights and Biases and others offload a lot of technical setup.
Training
And to make things simple, we'll encapsulate all the components for training into one function which returns all the artifacts we want to be able to track from our experiment.
Ignore the `trial` argument for now (default is `None`) as it will be used during the hyperparameter optimization lesson for pruning unpromising trials.
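Here's a rough sketch of what such a train function could look like for a TF-IDF + SGDClassifier baseline on a dataframe with text and tag columns (the exact preprocessing, model, splits and column names in the lesson may differ):

```python
import mlflow
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def train(args, df, trial=None):
    """Train a baseline model and return all the artifacts we want to track."""
    # `trial` is unused for now; it will be used for pruning during hyperparameter optimization

    # Encode labels and vectorize the input text
    label_encoder = LabelEncoder().fit(df.tag)
    y = label_encoder.transform(df.tag)
    vectorizer = TfidfVectorizer(analyzer=args.analyzer, ngram_range=(2, args.ngram_max_range))
    X = vectorizer.fit_transform(df.text)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1234)

    # Train iteratively so we can log (and later prune on) a per-epoch loss
    model = SGDClassifier(loss="log_loss", penalty="l2", alpha=args.alpha,
                          learning_rate="constant", eta0=args.learning_rate)  # use loss="log" on scikit-learn < 1.1
    classes = np.unique(y)
    for epoch in range(args.num_epochs):
        model.partial_fit(X_train, y_train, classes=classes)
        train_loss = log_loss(y_train, model.predict_proba(X_train), labels=classes)
        val_loss = log_loss(y_val, model.predict_proba(X_val), labels=classes)
        mlflow.log_metrics({"train_loss": train_loss, "val_loss": val_loss}, step=epoch)
        if not epoch % 10:
            print(f"Epoch: {epoch:02d} | train_loss: {train_loss:.5f}, val_loss: {val_loss:.5f}")

    # Evaluate on the validation split
    precision, recall, f1, _ = precision_recall_fscore_support(y_val, model.predict(X_val), average="weighted")
    performance = {"precision": precision, "recall": recall, "f1": f1}

    return {
        "args": args,
        "label_encoder": label_encoder,
        "vectorizer": vectorizer,
        "model": model,
        "performance": performance,
    }
```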
Tracking
With MLFlow, we first need to initialize an experiment, and then we can create runs under that experiment.
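A sketch of that first step (the experiment name matches the log output below):

```python
# Create (or reuse) an experiment to group all of our baseline runs
mlflow.set_experiment(experiment_name="baselines")
```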
INFO: 'baselines' does not exist. Creating a new experiment
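The tracked run below also needs to write dictionaries (like our performance metrics) to disk so they can be logged as artifacts; a minimal helper for that (an assumption, not necessarily the lesson's exact utility) could be:

```python
import json

def save_dict(d, filepath):
    """Save a dictionary to a specific JSON filepath."""
    with open(filepath, "w") as fp:
        json.dump(d, fp, indent=2, sort_keys=False)
```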
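A sketch of a tracked run, assuming the `train()` function and `save_dict()` helper from above and the labeled_projects.csv dataset from our previous lessons:

```python
import json
import tempfile
from pathlib import Path

import joblib
import pandas as pd

# Labeled dataset from previous lessons (assumed to have text/tag columns)
df = pd.read_csv("labeled_projects.csv")

# Tracking
with mlflow.start_run(run_name="sgd"):
    # Train and evaluate
    artifacts = train(args=args, df=df)
    performance = artifacts["performance"]
    print(json.dumps(performance, indent=2))

    # Log metrics and parameters
    mlflow.log_metrics({"precision": performance["precision"],
                        "recall": performance["recall"],
                        "f1": performance["f1"]})
    mlflow.log_params(vars(artifacts["args"]))

    # Log artifacts (saved to a temporary directory first)
    with tempfile.TemporaryDirectory() as dp:
        joblib.dump(artifacts["vectorizer"], Path(dp, "vectorizer.pkl"))
        joblib.dump(artifacts["label_encoder"], Path(dp, "label_encoder.pkl"))
        joblib.dump(artifacts["model"], Path(dp, "model.pkl"))
        save_dict(performance, Path(dp, "performance.json"))
        mlflow.log_artifacts(dp)
```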
Epoch: 00 | train_loss: 1.16930, val_loss: 1.21451
Epoch: 10 | train_loss: 0.46116, val_loss: 0.65903
Epoch: 20 | train_loss: 0.31565, val_loss: 0.56018
Epoch: 30 | train_loss: 0.25207, val_loss: 0.51967
Epoch: 40 | train_loss: 0.21740, val_loss: 0.49822
Epoch: 50 | train_loss: 0.19615, val_loss: 0.48529
Epoch: 60 | train_loss: 0.18249, val_loss: 0.47708
Epoch: 70 | train_loss: 0.17330, val_loss: 0.47158
Epoch: 80 | train_loss: 0.16671, val_loss: 0.46765
Epoch: 90 | train_loss: 0.16197, val_loss: 0.46488
{
  "precision": 0.8929962902778195,
  "recall": 0.8333333333333334,
  "f1": 0.8485049088497365
}
Viewing
Let's view what we've tracked from our experiment. MLFlow serves a dashboard for us to view and explore our experiments on a localhost port. If you're running this on your local computer, you can simply run the MLFlow server:
mlflow server -h 0.0.0.0 -p 8000 --backend-store-uri $PWD/experiments/
and open http://localhost:8000/ to view the dashboard. But if you're on Google Colab, we're going to use localtunnel to create a connection between this notebook and a public URL.
If localtunnel is not installed, you may need to run `!npm install -g localtunnel` in a cell first.
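On Colab, a sketch of that cell: start the MLFlow server in the background, then tunnel port 8000 to a public URL.

```bash
# Serve the MLFlow dashboard in the background, then expose it publicly
mlflow server -h 0.0.0.0 -p 8000 --backend-store-uri $PWD/experiments/ &
npx localtunnel --port 8000
```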
MLFlow creates a main dashboard with all your experiments and their respective runs. We can sort runs by clicking on the column headers.

We can click on any of our experiments on the main dashboard to further explore it (click on the timestamp link for each run). Then click on metrics on the left side to view them in a plot:

Loading
We need to be able to load our saved experiment artifacts for inference, retraining, etc.
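A sketch of connecting to the registry and grabbing our experiment (assumes the same tracking URI we set earlier):

```python
from mlflow.tracking import MlflowClient

# The client lets us query experiments, runs and artifacts programmatically
client = MlflowClient()
experiment_id = client.get_experiment_by_name("baselines").experiment_id
```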
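For example, we can pull all runs for the experiment into a dataframe, ordered by the f1 metric we logged (sketch):

```python
# Best runs first, based on the logged f1 metric
all_runs = mlflow.search_runs(experiment_ids=[experiment_id], order_by=["metrics.f1 DESC"])
print(all_runs)
```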
                             run_id  ...  tags.mlflow.runName
0  3e5327289e9c499cabfda4fe8b09c037  ...                  sgd

[1 rows x 22 columns]
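A sketch of downloading the best run's artifacts and loading them back into memory (the artifact filenames match the ones used in the tracked run above):

```python
import json
import tempfile
from pathlib import Path

import joblib

# Download all artifacts for the best run into a temporary directory and load them
best_run_id = all_runs.iloc[0].run_id
with tempfile.TemporaryDirectory() as dp:
    client.download_artifacts(run_id=best_run_id, path="", dst_path=dp)
    vectorizer = joblib.load(Path(dp, "vectorizer.pkl"))
    label_encoder = joblib.load(Path(dp, "label_encoder.pkl"))
    model = joblib.load(Path(dp, "model.pkl"))
    with open(Path(dp, "performance.json")) as fp:
        performance = json.load(fp)
print(json.dumps(performance, indent=2))
```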
{ "precision": 0.8929962902778195, "recall": 0.8333333333333334, "f1": 0.8485049088497365 }
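And we can use those loaded artifacts for inference on new text (the example input here is made up):

```python
# Vectorize the input text, predict the encoded label and decode it back to a tag
text = "Transfer learning with transformers for text classification."
label_encoder.inverse_transform(model.predict(vectorizer.transform([text])))
```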
['natural-language-processing']
Tip
We can also load a specific run's model artifacts directly from the model registry, using its run ID, without having to save them to a temporary directory.
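A sketch of that, reading the artifacts in place via the run's artifact URI (assumes our local file-based registry):

```python
from pathlib import Path
from urllib.parse import urlparse

import joblib
import mlflow

# Resolve the run's artifact location on disk and load the model directly from it
run = mlflow.get_run(run_id=best_run_id)
artifacts_dir = urlparse(run.info.artifact_uri).path
model = joblib.load(Path(artifacts_dir, "model.pkl"))
```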