# Optimization

Optimizing a subset of hyperparameters to achieve an objective.

## Intuition

### What is it?

Optimization is the process of fine-tuning the hyperparameters in our experiment to optimize towards a particular objective. It can be a computationally involved process depending on the number of parameters, the size of the search space and the model architectures involved. Hyperparameters don't just include the model's parameters; they also include choices made during preprocessing, splitting, etc. When we look at all the different parameters that can be tuned, it quickly becomes a very large search space. However, just because something is a hyperparameter doesn't mean we need to tune it.

- It's perfectly alright to fix some hyperparameters (ex. `lower=True` during preprocessing) and remove them from the current tuning subset. Just be sure to note which parameters you're fixing and your reasoning for doing so.
- You can initially tune just a small, yet influential, subset of hyperparameters that you believe will yield the best results (see the sketch below).
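
For example, a minimal sketch (with hypothetical names) of explicitly separating fixed choices from the tunable subset:

```python
# Hypothetical bookkeeping: fixed choices vs. the subset we actually tune
fixed_params = {
    "lower": True,       # fixed preprocessing choice (noted, not tuned)
    "char_level": True,  # fixed tokenization choice
}
tunable_params = {
    "embedding_dim": (128, 512),  # (low, high) search range
    "lr": (5e-5, 5e-4),           # (low, high) search range
}
```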

### Why do we need it?

We want to optimize our hyperparameters so that we can understand how each of them affects our objective. By running many trials across a reasonable search space, we can determine near-ideal values for our different parameters. It's also a great opportunity to determine whether smaller parameter values yield performances similar to larger ones (efficiency).

### How can we do it?

There are many options for hyperparameter tuning (Optuna, Ray Tune, Hyperopt, etc.). We'll be using Optuna for its simplicity, popularity and efficiency, though the alternatives are comparably capable. It really comes down to familiarity and whether a library has a specific implementation readily tested and available.
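
If you haven't used Optuna before, here's a minimal, self-contained toy example (unrelated to our study) showing its core suggest-and-optimize loop:

```python
import optuna

def toy_objective(trial):
    """Toy objective: find the x that minimizes (x - 2)^2."""
    x = trial.suggest_uniform("x", -10, 10)  # sample a candidate value
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(toy_objective, n_trials=20)
print(study.best_params)  # should be close to {"x": 2.0}
```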

## Application

There are many factors to consider when performing hyperparameter optimization and luckily Optuna allows us to implement them with ease. We'll be conducting a small study where we'll tune a set of arguments (we'll do a much more thorough study of the parameter space when we move our code to Python scripts). Here's the process for the study:

1. Define an objective (metric) and identify the direction to optimize in.
2. [OPTIONAL] Choose a sampler for determining the parameters of subsequent trials (the default is a tree-structured Parzen estimator, `TPESampler`).
3. [OPTIONAL] Choose a pruner to end unpromising trials early.
4. Define the parameters to tune in each trial and the distribution of values to sample.

Note

There are many more options (multiple objectives, storage options, etc.) to explore but this basic setup will allow us to optimize quite well (steps 1-3 are sketched below).
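
As a sketch, steps 1-3 translate to Optuna's API like this (the `storage` argument is commented out and shown purely as an illustration of the persistence option):

```python
import optuna

study = optuna.create_study(
    study_name="example",
    direction="maximize",  # 1. direction for our objective (f1)
    sampler=optuna.samplers.TPESampler(),  # 2. also the default sampler
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=5),  # 3.
    # storage="sqlite:///optuna.db",  # optional: persist trials across sessions
)
```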

```python
from argparse import Namespace
from numpyencoder import NumpyEncoder

# Specify arguments
args = Namespace(
    char_level=True,
    filter_sizes=list(range(1, 11)),
    batch_size=64,
    embedding_dim=128,
    num_filters=128,
    hidden_dim=128,
    dropout_p=0.5,
    lr=2e-4,
    num_epochs=200,
    patience=10,
)
```


We're going to modify our Trainer object to be able to prune unpromising trials based on the trial's validation loss.

```python
class Trainer(object):
    ...
    def train(self, ...):
        ...
        # Pruning based on the intermediate value
        if self.trial:  # only report/prune when running inside a study (trial=None otherwise)
            self.trial.report(val_loss, epoch)
            if self.trial.should_prune():
                raise optuna.TrialPruned()
        ...
```


Code for complete Trainer class
```python
# Trainer (modified for trial pruning)
class Trainer(object):
    def __init__(self, model, device, loss_fn=None,
                 optimizer=None, scheduler=None, trial=None):
        # Set params
        self.model = model
        self.device = device
        self.loss_fn = loss_fn
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.trial = trial

    def train_step(self, dataloader):
        """Train step."""
        # Set model to train mode
        self.model.train()
        loss = 0.0

        # Iterate over train batches
        for i, batch in enumerate(dataloader):
            # Step
            batch = [item.to(self.device) for item in batch]
            inputs, targets = batch[:-1], batch[-1]
            self.optimizer.zero_grad()  # Reset gradients
            z = self.model(inputs)  # Forward pass
            J = self.loss_fn(z, targets)  # Define loss
            J.backward()  # Backward pass
            self.optimizer.step()  # Update weights

            # Cumulative Metrics
            loss += (J.detach().item() - loss) / (i + 1)

        return loss

    def eval_step(self, dataloader):
        """Validation or test step."""
        # Set model to eval mode
        self.model.eval()
        loss = 0.0
        y_trues, y_probs = [], []

        # Iterate over val batches
        with torch.no_grad():
            for i, batch in enumerate(dataloader):
                # Step
                batch = [item.to(self.device) for item in batch]  # Set device
                inputs, y_true = batch[:-1], batch[-1]
                z = self.model(inputs)  # Forward pass
                J = self.loss_fn(z, y_true).item()

                # Cumulative Metrics
                loss += (J - loss) / (i + 1)

                # Store outputs
                y_prob = torch.sigmoid(z).cpu().numpy()
                y_probs.extend(y_prob)
                y_trues.extend(y_true.cpu().numpy())

        return loss, np.vstack(y_trues), np.vstack(y_probs)

    def predict_step(self, dataloader):
        """Prediction step."""
        # Set model to eval mode
        self.model.eval()
        y_probs = []

        # Iterate over batches
        with torch.no_grad():
            for i, batch in enumerate(dataloader):
                # Forward pass w/ inputs
                inputs, targets = batch[:-1], batch[-1]
                y_prob = self.model(inputs)

                # Store outputs
                y_probs.extend(y_prob)

        return np.vstack(y_probs)

    def train(self, num_epochs, patience, train_dataloader, val_dataloader):
        best_val_loss = np.inf
        for epoch in range(num_epochs):
            # Steps
            train_loss = self.train_step(dataloader=train_dataloader)
            val_loss, _, _ = self.eval_step(dataloader=val_dataloader)
            self.scheduler.step(val_loss)

            # Early stopping
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                best_model = self.model
                _patience = patience  # reset _patience
            else:
                _patience -= 1
            if not _patience:  # 0
                print("Stopping early!")
                break

            # Logging
            print(
                f"Epoch: {epoch+1} | "
                f"train_loss: {train_loss:.5f}, "
                f"val_loss: {val_loss:.5f}, "
                f"lr: {self.optimizer.param_groups[0]['lr']:.2E}, "
                f"_patience: {_patience}"
            )

            # Pruning based on the intermediate value
            if self.trial:  # skip when training outside a study (trial=None)
                self.trial.report(val_loss, epoch)
                if self.trial.should_prune():
                    raise optuna.TrialPruned()

        return best_model, best_val_loss
```

We'll also modify our train_cnn function to include information about the trial.

```python
def train_cnn(args, df, trial=None):
    ...
    # Trainer module
    trainer = Trainer(
        model=model, device=device, loss_fn=loss_fn,
        optimizer=optimizer, scheduler=scheduler, trial=trial)
    ...
```


Code for complete train_cnn function
```python
def train_cnn(args, df, trial=None):
    """Train a CNN using specific arguments."""
    # Set seeds
    set_seeds()

    # Get data splits
    preprocessed_df = df.copy()
    preprocessed_df.text = preprocessed_df.text.apply(preprocess, lower=True)
    X_train, X_val, X_test, y_train, y_val, y_test, label_encoder = get_data_splits(preprocessed_df)
    X_test_raw = X_test
    num_classes = len(label_encoder)

    # Set device
    cuda = True
    device = torch.device('cuda' if (
        torch.cuda.is_available() and cuda) else 'cpu')
    torch.set_default_tensor_type('torch.FloatTensor')
    if device.type == 'cuda':
        torch.set_default_tensor_type('torch.cuda.FloatTensor')

    # Tokenize
    tokenizer = Tokenizer(char_level=args.char_level)
    tokenizer.fit_on_texts(texts=X_train)
    vocab_size = len(tokenizer)

    # Convert texts to sequences of indices
    X_train = np.array(tokenizer.texts_to_sequences(X_train))
    X_val = np.array(tokenizer.texts_to_sequences(X_val))
    X_test = np.array(tokenizer.texts_to_sequences(X_test))

    # Class weights
    counts = np.bincount([label_encoder.class_to_index[class_] for class_ in all_tags])
    class_weights = {i: 1.0/count for i, count in enumerate(counts)}

    # Create datasets
    train_dataset = CNNTextDataset(
        X=X_train, y=y_train, max_filter_size=max(args.filter_sizes))
    val_dataset = CNNTextDataset(
        X=X_val, y=y_val, max_filter_size=max(args.filter_sizes))
    test_dataset = CNNTextDataset(
        X=X_test, y=y_test, max_filter_size=max(args.filter_sizes))

    # Create dataloaders
    train_dataloader = train_dataset.create_dataloader(
        batch_size=args.batch_size)
    val_dataloader = val_dataset.create_dataloader(
        batch_size=args.batch_size)
    test_dataloader = test_dataset.create_dataloader(
        batch_size=args.batch_size)

    # Initialize model
    model = CNN(
        embedding_dim=args.embedding_dim, vocab_size=vocab_size,
        num_filters=args.num_filters, filter_sizes=args.filter_sizes,
        hidden_dim=args.hidden_dim, dropout_p=args.dropout_p,
        num_classes=num_classes)
    model = model.to(device)

    # Define loss
    class_weights_tensor = torch.Tensor(np.array(list(class_weights.values())))
    loss_fn = nn.BCEWithLogitsLoss(weight=class_weights_tensor)

    # Define optimizer & scheduler
    optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.1, patience=5)

    # Trainer module
    trainer = Trainer(
        model=model, device=device, loss_fn=loss_fn,
        optimizer=optimizer, scheduler=scheduler, trial=trial)

    # Train
    best_model, best_val_loss = trainer.train(
        args.num_epochs, args.patience, train_dataloader, val_dataloader)

    # Best threshold for f1
    train_loss, y_true, y_prob = trainer.eval_step(dataloader=train_dataloader)
    precisions, recalls, thresholds = precision_recall_curve(y_true.ravel(), y_prob.ravel())
    threshold = find_best_threshold(y_true.ravel(), y_prob.ravel())

    # Determine predictions using threshold
    test_loss, y_true, y_prob = trainer.eval_step(dataloader=test_dataloader)
    y_pred = np.array([np.where(prob >= threshold, 1, 0) for prob in y_prob])

    # Evaluate
    performance = get_performance(
        y_true=y_test, y_pred=y_pred, classes=label_encoder.classes)

    return {
        "args": args,
        "tokenizer": tokenizer,
        "label_encoder": label_encoder,
        "model": best_model,
        "performance": performance,
        "best_val_loss": best_val_loss,
        "threshold": threshold,
    }
```

### Objective

We need to define an objective function that will consume a trial and a set of arguments and produce the metric we want to optimize (f1 in our case).

```python
def objective(trial, args):
    """Objective function for optimization trials."""
    # Parameters (to tune)
    args.embedding_dim = trial.suggest_int("embedding_dim", 128, 512)
    args.num_filters = trial.suggest_int("num_filters", 128, 512)
    args.hidden_dim = trial.suggest_int("hidden_dim", 128, 512)
    args.dropout_p = trial.suggest_uniform("dropout_p", 0.3, 0.8)
    args.lr = trial.suggest_loguniform("lr", 5e-5, 5e-4)

    # Train & evaluate
    artifacts = train_cnn(args=args, df=df, trial=trial)

    # Record metrics from this trial
    trial.set_user_attr("precision", artifacts["performance"]["overall"]["precision"])
    trial.set_user_attr("recall", artifacts["performance"]["overall"]["recall"])
    trial.set_user_attr("f1", artifacts["performance"]["overall"]["f1"])
    trial.set_user_attr("threshold", artifacts["threshold"])

    return artifacts["performance"]["overall"]["f1"]
```
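
Before launching a full study, it can be useful to sanity check the objective once with fixed values via Optuna's `FixedTrial` (the values below are arbitrary):

```python
from optuna.trial import FixedTrial

# report()/should_prune() are no-ops on a FixedTrial, so nothing gets pruned
fixed_trial = FixedTrial({
    "embedding_dim": 128,
    "num_filters": 128,
    "hidden_dim": 128,
    "dropout_p": 0.5,
    "lr": 1e-4,
})
print(objective(fixed_trial, args))  # f1 for this one configuration
```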


### Study

We're ready to kick off our study with the MLflowCallback so we can track all of the different trials.

```python
from optuna.integration.mlflow import MLflowCallback

NUM_TRIALS = 50  # small sample for now

# Optimize
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=5)
study = optuna.create_study(study_name="optimization", direction="maximize", pruner=pruner)
mlflow_callback = MLflowCallback(
    tracking_uri=mlflow.get_tracking_uri(), metric_name="f1")
study.optimize(lambda trial: objective(trial, args),
               n_trials=NUM_TRIALS,
               callbacks=[mlflow_callback])
```


```
A new study created in memory with name: optimization
Epoch: 1 | train_loss: 0.00645, val_loss: 0.00314, lr: 3.48E-04, _patience: 10
...
Epoch: 23 | train_loss: 0.00029, val_loss: 0.00175, lr: 3.48E-05, _patience: 1
Stopping early!
Trial 0 finished with value: 0.5999225606985846 and parameters: {'embedding_dim': 508, 'num_filters': 359, 'hidden_dim': 262, 'dropout_p': 0.6008497926241321, 'lr': 0.0003484755175747328}. Best is trial 0 with value: 0.5999225606985846.
INFO: 'optimization' does not exist. Creating a new experiment
...
Trial 10 pruned.
...
Epoch: 25 | train_loss: 0.00029, val_loss: 0.00156, lr: 2.73E-05, _patience: 2
Epoch: 26 | train_loss: 0.00028, val_loss: 0.00152, lr: 2.73E-05, _patience: 1
Stopping early!
Trial 49 finished with value: 0.6220047640997922 and parameters: {'embedding_dim': 485, 'num_filters': 420, 'hidden_dim': 477, 'dropout_p': 0.7984462152799114, 'lr': 0.0002619841505205434}. Best is trial 46 with value: 0.63900047716579.
```

```python
from pyngrok import ngrok

# MLflow dashboard (serve locally, tunnel with ngrok)
get_ipython().system_raw("mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri $PWD/experiments/ &")
ngrok.kill()  # close any existing tunnels
ngrok.set_auth_token("")
ngrok_tunnel = ngrok.connect(5000)  # tunnel to the MLflow server's port (pyngrok)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)
```

```
MLflow Tracking UI: https://d19689b7ba4e.ngrok.io
```


You can compare all (or a subset) of the trials in our experiment.

We can then view the results through various lenses (contours, parallel coordinates, etc.).
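
Optuna also provides plotly-based visualizations that operate directly on the study object; a quick sketch (the parameter names must match the ones we suggested):

```python
from optuna.visualization import plot_contour, plot_parallel_coordinate

plot_parallel_coordinate(study)  # each line is one trial across all parameters
plot_contour(study, params=["lr", "dropout_p"])  # pairwise view of two parameters
```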

```python
# All trials
trials_df = study.trials_dataframe()
trials_df = trials_df.sort_values(["value"], ascending=False)  # sort by metric
```

| | number | value | datetime_start | datetime_complete | duration | params_dropout_p | params_embedding_dim | params_hidden_dim | params_lr | params_num_filters | user_attrs_f1 | user_attrs_precision | user_attrs_recall | user_attrs_threshold | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46 | 46 | 0.639000 | 2021-01-26 21:29:09.435991 | 2021-01-26 21:30:20.637867 | 0 days 00:01:11.201876 | 0.670784 | 335 | 458 | 0.000298 | 477 | 0.639000 | 0.852947 | 0.540094 | 0.221352 | COMPLETE |
| 32 | 32 | 0.638382 | 2021-01-26 21:08:27.456865 | 2021-01-26 21:09:54.151386 | 0 days 00:01:26.694521 | 0.485060 | 322 | 329 | 0.000143 | 458 | 0.638382 | 0.860706 | 0.535624 | 0.285308 | COMPLETE |
| 33 | 33 | 0.638135 | 2021-01-26 21:09:54.182560 | 2021-01-26 21:11:14.038009 | 0 days 00:01:19.855449 | 0.567419 | 323 | 405 | 0.000163 | 482 | 0.638135 | 0.872309 | 0.537566 | 0.298093 | COMPLETE |
| 39 | 39 | 0.637652 | 2021-01-26 21:18:37.735567 | 2021-01-26 21:20:01.271413 | 0 days 00:01:23.535846 | 0.689044 | 391 | 401 | 0.000496 | 512 | 0.637652 | 0.852757 | 0.536279 | 0.258009 | COMPLETE |
| 34 | 34 | 0.634339 | 2021-01-26 21:11:14.068099 | 2021-01-26 21:12:33.645090 | 0 days 00:01:19.576991 | 0.592627 | 371 | 379 | 0.000213 | 486 | 0.634339 | 0.863092 | 0.531822 | 0.263524 | COMPLETE |
```python
# Best trial
print(f"Best value (f1): {study.best_trial.value}")
print(f"Best hyperparameters: {study.best_trial.params}")
```

```
Best value (f1): 0.63900047716579
Best hyperparameters: {'embedding_dim': 335, 'num_filters': 477, 'hidden_dim': 458, 'dropout_p': 0.6707843486583486, 'lr': 0.00029782100137454434}
```


Note

Don't forget to save learned parameters (ex. the decision threshold) during training, which you'll need later for inference.

```python
import json

# Save best parameters
params = {**args.__dict__, **study.best_trial.params}
params["threshold"] = study.best_trial.user_attrs["threshold"]
print(json.dumps(params, indent=2, cls=NumpyEncoder))
```

```json
{
  "char_level": true,
  "filter_sizes": [
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10
  ],
  "batch_size": 64,
  "embedding_dim": 335,
  "num_filters": 477,
  "hidden_dim": 458,
  "dropout_p": 0.6707843486583486,
  "lr": 0.00029782100137454434,
  "num_epochs": 200,
  "patience": 10,
  "threshold": 0.22135180234909058
}
```
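
To actually reuse these during inference, we can persist them to disk; a small sketch (the file path is hypothetical):

```python
# Save tuned parameters for later inference (hypothetical path)
with open("params.json", "w") as fp:
    json.dump(params, fp, indent=2, cls=NumpyEncoder)
```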


... and now we're finally ready to move from working in Jupyter notebooks to Python scripts. We'll be revisiting everything we've done so far, but this time with proper software engineering principles such as object-oriented programming (OOP), testing, styling, etc.