Optimizing Hyperparameters
Repository · Notebook
Intuition
Hyperparameter optimization is the process of fine-tuning the hyperparameters in our experiment toward a particular objective. It can be a computationally involved process depending on the number of parameters, the search space and the model architectures. Hyperparameters don't just include the model's parameters; they also include choices made during preprocessing, splitting, etc. When we look at all the different parameters that can be tuned, it quickly becomes a very large search space. However, just because something is a hyperparameter doesn't mean we need to tune it.
- It's absolutely alright to fix some hyperparameters (e.g. `lower=True` during preprocessing) and remove them from the tuning subset. Just be sure to note which parameters you are fixing and your reasoning for doing so.
- You can initially tune just a small, yet influential, subset of hyperparameters that you believe will yield the best results.
We want to optimize our hyperparameters so that we can understand how each of them affects our objective. By running many trials across a reasonable search space, we can determine near-ideal values for our different parameters. It's also a great opportunity to determine whether smaller parameter values yield performances similar to larger ones (efficiency).
Tools
There are many options for hyperparameter tuning (Optuna, Ray Tune, Hyperopt, etc.). We'll be using Optuna for its simplicity, popularity and efficiency, though the alternatives are just as capable. It really comes down to familiarity and whether a library has a specific implementation readily tested and available.
Application
There are many factors to consider when performing hyperparameter optimization and luckily Optuna allows us to implement them with ease. We'll be conducting a small study where we'll tune a set of arguments (we'll do a much more thorough study of the parameter space when we move our code to Python scripts). Here's the process for the study:
- Define an objective (metric) and identify the direction to optimize.
- [OPTIONAL] Choose a sampler for determining parameters for subsequent trials (the default is a tree-based sampler).
- [OPTIONAL] Choose a pruner to end unpromising trials early.
- Define the parameters to tune in each trial and the distribution of values to sample.
```bash
pip install optuna==2.10.0 numpyencoder==0.3.0 -q
```
We're going to use the same training function as before since we've added the functionality to prune a specific run if the `trial` argument is not `None`.
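The pruning hook inside the training loop looks roughly like this (a minimal sketch; `val_loss`, `epoch` and the surrounding epoch loop of `train()` come from the previous lessons, and `optuna` is assumed to be imported):

```python
# Inside train()'s epoch loop: report the intermediate metric to Optuna
# so the pruner can decide whether to end this trial early.
if trial is not None:
    trial.report(val_loss, epoch)
    if trial.should_prune():
        raise optuna.TrialPruned()
```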
Objective
We need to define an objective function that will consume a trial and a set of arguments and produce the metric to optimize on (`f1` in our case).
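A sketch of what this could look like, assuming the `train()` function, `args` namespace and `df` dataframe from the previous lessons, and that `train()` returns an artifacts dictionary with overall performance metrics. The parameter names match the trials logged below, but the exact ranges here are illustrative:

```python
import json

def objective(trial, args):
    """Objective function for optimization trials."""
    # Parameters to tune (sampled anew for every trial)
    args.analyzer = trial.suggest_categorical("analyzer", ["word", "char", "char_wb"])
    args.ngram_max_range = trial.suggest_int("ngram_max_range", 3, 10)
    args.learning_rate = trial.suggest_float("learning_rate", 1e-2, 1e0, log=True)
    args.power_t = trial.suggest_float("power_t", 0.1, 0.5)

    # Train & evaluate (pass the trial in so pruning can kick in)
    artifacts = train(args=args, df=df, trial=trial)

    # Record additional metrics as user attributes on the trial
    performance = artifacts["performance"]
    print(json.dumps(performance, indent=2))
    trial.set_user_attr("precision", performance["precision"])
    trial.set_user_attr("recall", performance["recall"])
    trial.set_user_attr("f1", performance["f1"])

    # Metric to optimize
    return performance["f1"]
```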
Study
We're ready to kick off our study with the `MLflowCallback` so we can track all of the different trials.
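A sketch of the study setup, assuming the MLflow tracking URI was already configured in the experiment tracking lesson; the pruner settings and trial count are illustrative (the default sampler, TPE, is a tree-structured Parzen estimator):

```python
import mlflow
import optuna
from optuna.integration.mlflow import MLflowCallback

NUM_TRIALS = 20  # small sample for now

# Prune a trial if its intermediate result falls below the median of prior trials.
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=5)
study = optuna.create_study(study_name="optimization", direction="maximize", pruner=pruner)

# Log every trial to MLflow under the "f1" metric name.
mlflow_callback = MLflowCallback(tracking_uri=mlflow.get_tracking_uri(), metric_name="f1")

study.optimize(
    lambda trial: objective(trial, args),
    n_trials=NUM_TRIALS,
    callbacks=[mlflow_callback],
)
```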
A new study created in memory with name: optimization
Epoch: 00 | train_loss: 1.34116, val_loss: 1.35091
...
Epoch: 90 | train_loss: 0.32167, val_loss: 0.57661
Stopping early!
Trial 0 finished with value: 0.7703281822265505 and parameters: {'analyzer': 'char', 'ngram_max_range': 10, 'learning_rate': 0.025679294001785473, 'power_t': 0.15046698128066294}. Best is trial 0 with value: 0.7703281822265505.
...
Trial 10 pruned.
...
Epoch: 80 | train_loss: 0.16680, val_loss: 0.43964
Epoch: 90 | train_loss: 0.16134, val_loss: 0.43686
Trial 19 finished with value: 0.8470890576153735 and parameters: {'analyzer': 'char_wb', 'ngram_max_range': 4, 'learning_rate': 0.08452049154544644, 'power_t': 0.39657115651885855}. Best is trial 3 with value: 0.8470890576153735.

- On the comparison page, we can then view the results through various lenses (contours, parallel coordinates, etc.)
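Similar views can also be generated directly from the study object with Optuna's visualization utilities (a sketch; it requires plotly, and the parameter names assume the objective defined above):

```python
from optuna.visualization import plot_contour, plot_parallel_coordinate

# Interactive plots over the completed trials
plot_parallel_coordinate(study, params=["learning_rate", "power_t"]).show()
plot_contour(study, params=["learning_rate", "power_t"]).show()
```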


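To inspect all of the trials programmatically, we can pull them into a dataframe (a sketch; the `user_attrs_f1` column comes from the user attribute we set inside the objective):

```python
# All trials, sorted by the f1 score we recorded per trial
trials_df = study.trials_dataframe()
trials_df = trials_df.sort_values(["user_attrs_f1"], ascending=False)
trials_df.head()
```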
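The best trial's value and parameters, which produce the output below, can be read straight off the study (a minimal sketch):

```python
import json

# Best trial
print(f"Best value (f1): {study.best_trial.value}")
print(f"Best hyperparameters: {json.dumps(study.best_trial.params, indent=2)}")
```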
Best value (f1): 0.8535985582060417
Best hyperparameters: {
  "analyzer": "char_wb",
  "ngram_max_range": 4,
  "learning_rate": 0.08981103667371809,
  "power_t": 0.2583427488720579
}
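Finally, we can merge the best trial's parameters back into our arguments to arrive at the configuration shown below (a sketch, assuming `args` is the Namespace from the previous lessons; `NumpyEncoder` handles any NumPy types during serialization):

```python
import json
from numpyencoder import NumpyEncoder

# Overwrite args with the best trial's tuned parameters
args = {**args.__dict__, **study.best_trial.params}
print(json.dumps(args, indent=2, cls=NumpyEncoder))
```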
{
  "lower": true,
  "stem": false,
  "analyzer": "char_wb",
  "ngram_max_range": 4,
  "alpha": 0.0001,
  "learning_rate": 0.08833689034118489,
  "power_t": 0.1181958972675695
}
... and now we're finally ready to move from working in Jupyter notebooks to Python scripts. We'll be revisiting everything we did so far, but this time with proper software engineering principles such as object-oriented programming (OOP), styling, testing, etc. → https://madewithml.com/#mlops