# Optimizing Hyperparameters

Optimizing a subset of hyperparameters to achieve an objective.
Goku Mohandas

## Intuition

Optimization is the process of fine-tuning the hyperparameters in our experiment to optimize towards a particular objective. It can be a computationally involved process depending on the number of parameters, search space and model architectures. Hyperparameters don't just include the model's parameters but they also include parameters (choices) from preprocessing, splitting, etc. When we look at all the different parameters that can be tuned, it quickly becomes a very large search space. However, just because something is a hyperparameter doesn't mean we need to tune it.

• It's absolutely alright to fix some hyperparameters (e.g. lower=True during preprocessing) and remove them from the tuning subset. Just be sure to note which parameters you are fixing and your reasoning for doing so.
• You can initially just tune a small, yet influential, subset of hyperparameters that you believe will yield the best results.

We want to optimize our hyperparameters so that we can understand how each of them affects our objective. By running many trials across a reasonable search space, we can determine near-ideal values for our different parameters. It's also a great opportunity to determine whether smaller parameter values yield similar performance to larger ones (efficiency).
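To get a feel for how quickly the search space grows, the sketch below counts the combinations in a small exhaustive grid. The candidate values are illustrative assumptions (loosely based on the parameters tuned later in this lesson), not a recommended grid.

```python
from math import prod

# Hypothetical candidate values for each hyperparameter (illustrative only).
search_space = {
    "analyzer": ["word", "char", "char_wb"],
    "ngram_max_range": list(range(3, 11)),  # 3..10 inclusive
    "learning_rate": [1e-2, 3e-2, 1e-1, 3e-1, 1e0],
    "power_t": [0.1, 0.2, 0.3, 0.4, 0.5],
}

# Hyperparameters we choose NOT to tune, with the value we fix them to.
fixed = {"lower": True, "stem": False}


def grid_size(space):
    """Number of distinct combinations in an exhaustive grid over `space`."""
    return prod(len(values) for values in space.values())


num_combinations = grid_size(search_space)
print(num_combinations)  # 3 * 8 * 5 * 5 = 600

# Tuning the two fixed boolean flags as well would double the grid twice.
with_flags = grid_size({**search_space, **{name: [True, False] for name in fixed}})
print(with_flags)  # 600 * 2 * 2 = 2400
```

Even this coarse grid would need hundreds of training runs, which is why fixing a few parameters and sampling the rest is usually more practical than enumerating every combination.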

## Tools

There are many options for hyperparameter tuning (Optuna, Ray Tune, Hyperopt, etc.). We'll be using Optuna for its simplicity, popularity and efficiency, though the alternatives are comparable on all three. The choice really comes down to familiarity and whether a library has a specific implementation readily tested and available.

## Application

There are many factors to consider when performing hyperparameter optimization and luckily Optuna allows us to implement them with ease. We'll be conducting a small study where we'll tune a set of arguments (we'll do a much more thorough study of the parameter space when we move our code to Python scripts). Here's the process for the study:

1. Define an objective (metric) and identify the direction to optimize.
2. [OPTIONAL] Choose a sampler for determining parameters for subsequent trials (the default is a tree-based sampler).
3. [OPTIONAL] Choose a pruner to end unpromising trials early.
4. Define the parameters to tune in each trial and the distribution of values to sample.
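
To make the moving parts of these steps explicit, here is a minimal sketch of the same loop without Optuna: a plain random sampler maximizing a toy metric. Everything in it (`toy_objective`, `sample_params`, `random_search` and the value ranges) is a hypothetical stand-in for illustration, and pruning (step 3) is left out for brevity.

```python
import random


def toy_objective(params):
    """Stand-in metric to maximize; peaks at learning_rate=0.1, power_t=0.3."""
    return 1.0 - abs(params["learning_rate"] - 0.1) - abs(params["power_t"] - 0.3)


def sample_params(rng):
    """Step 4: the parameters to tune and the distributions to sample from."""
    return {
        "learning_rate": rng.uniform(0.01, 1.0),
        "power_t": rng.uniform(0.1, 0.5),
    }


def random_search(objective, n_trials, seed=0):
    """Steps 1-2: maximize `objective` using a plain random sampler."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        value = objective(params)
        if value > best_value:  # direction: maximize
            best_params, best_value = params, value
    return best_params, best_value


best_params, best_value = random_search(toy_objective, n_trials=50)
print(best_params, round(best_value, 3))
```

A library such as Optuna replaces the random sampler with a smarter one, adds pruning and keeps a record of every trial, but the overall shape of the loop stays the same.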

```bash
pip install optuna==2.10.0 numpyencoder==0.3.0 -q
```

```python
import optuna
```

We're going to use the same training function as before since we've added the functionality to prune a specific run if the trial argument is not None.

```python
# Pruning (inside train() function)
trial.report(val_loss, epoch)
if trial.should_prune():
    raise optuna.TrialPruned()
```
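
To build intuition for what a pruner decides when `trial.should_prune()` is called, here is a simplified sketch of a median rule: stop a trial if its reported value at a given step is worse than the median of what earlier trials reported at that step. This is an illustration under stated assumptions, not Optuna's implementation; the `should_prune` helper and the `history` layout are hypothetical, and the metric is assumed to be one where lower is better.

```python
from statistics import median


def should_prune(history, current_value, step, n_startup_trials=5, n_warmup_steps=5):
    """Simplified median rule for a metric where lower is better (e.g. val_loss).

    history: list of per-trial lists, where history[i][s] is the value that
             earlier trial i reported at step s.
    """
    if len(history) < n_startup_trials:  # too few finished trials to compare against
        return False
    if step < n_warmup_steps:  # give every trial some time to warm up
        return False
    peers = [values[step] for values in history if len(values) > step]
    if not peers:  # no earlier trial reached this step
        return False
    return current_value > median(peers)  # worse than the median -> prune


history = [[1.0, 0.8, 0.6], [1.1, 0.9, 0.7], [0.9, 0.7, 0.5]]
prune_bad = should_prune(history, 0.95, step=1, n_startup_trials=3, n_warmup_steps=1)
prune_good = should_prune(history, 0.75, step=1, n_startup_trials=3, n_warmup_steps=1)
print(prune_bad, prune_good)  # True False
```

The sketch hard-codes "lower is better". Optuna's pruners instead follow the study's direction, so it is worth checking that the value passed to `trial.report()` is one for which that direction makes sense.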

## Objective

We need to define an objective function that will consume a trial and a set of arguments and produce the metric to optimize on (f1 in our case).

```python
def objective(args, trial):
    """Objective function for optimization trials."""
    # Parameters to tune
    args.analyzer = trial.suggest_categorical("analyzer", ["word", "char", "char_wb"])
    args.ngram_max_range = trial.suggest_int("ngram_max_range", 3, 10)
    args.learning_rate = trial.suggest_loguniform("learning_rate", 1e-2, 1e0)
    args.power_t = trial.suggest_uniform("power_t", 0.1, 0.5)

    # Train & evaluate
    artifacts = train(args=args, df=df, trial=trial)

    # Set additional attributes
    performance = artifacts["performance"]
    print(json.dumps(performance, indent=2))
    trial.set_user_attr("precision", performance["precision"])
    trial.set_user_attr("recall", performance["recall"])
    trial.set_user_attr("f1", performance["f1"])

    return performance["f1"]
```
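
The learning rate above is sampled log-uniformly while `power_t` is sampled uniformly. The sketch below illustrates the difference with the standard library only: a log-uniform draw is uniform in log space, so each order of magnitude gets roughly equal attention. The helper `sample_loguniform` is a hypothetical illustration, not Optuna's sampler.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Sample so that log(value) is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # seeded for reproducibility
n = 10_000
uniform_samples = [rng.uniform(1e-2, 1e0) for _ in range(n)]
log_samples = [sample_loguniform(rng, 1e-2, 1e0) for _ in range(n)]

# Fraction of samples that land in the lowest decade [0.01, 0.1).
frac_uniform = sum(x < 0.1 for x in uniform_samples) / n
frac_log = sum(x < 0.1 for x in log_samples) / n
print(f"uniform: {frac_uniform:.2f}, log-uniform: {frac_log:.2f}")
```

With uniform sampling, only about a tenth of the draws fall below 0.1, whereas log-uniform sampling puts about half of them there. That is usually what we want for scale-like parameters such as a learning rate.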

## Study

We're ready to kick off our study with our MLflowCallback so we can track all of the different trials.

```python
from numpyencoder import NumpyEncoder
from optuna.integration.mlflow import MLflowCallback
```

```python
NUM_TRIALS = 20  # small sample for now
```

```python
# Optimize
pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=5)
study = optuna.create_study(study_name="optimization", direction="maximize", pruner=pruner)
mlflow_callback = MLflowCallback(
    tracking_uri=mlflow.get_tracking_uri(), metric_name="f1")
study.optimize(lambda trial: objective(args, trial),
               n_trials=NUM_TRIALS,
               callbacks=[mlflow_callback])
```

```
A new study created in memory with name: optimization
Epoch: 00 | train_loss: 1.34116, val_loss: 1.35091
...
Epoch: 90 | train_loss: 0.32167, val_loss: 0.57661
Stopping early!
Trial 0 finished with value: 0.7703281822265505 and parameters: {'analyzer': 'char', 'ngram_max_range': 10, 'learning_rate': 0.025679294001785473, 'power_t': 0.15046698128066294}. Best is trial 0 with value: 0.7703281822265505.

...

Trial 10 pruned.

...

Epoch: 80 | train_loss: 0.16680, val_loss: 0.43964
Epoch: 90 | train_loss: 0.16134, val_loss: 0.43686
Trial 19 finished with value: 0.8470890576153735 and parameters: {'analyzer': 'char_wb', 'ngram_max_range': 4, 'learning_rate': 0.08452049154544644, 'power_t': 0.39657115651885855}. Best is trial 3 with value: 0.8470890576153735.
```


```python
# Run MLflow server and localtunnel
get_ipython().system_raw("mlflow server -h 0.0.0.0 -p 8000 --backend-store-uri $PWD/experiments/ &")
!npx localtunnel --port 8000
```

1. Click on the "optimization" experiment on the left side under Experiments.
2. Select runs to compare by clicking on the toggle box to the left of each run or by clicking on the toggle box in the header to select all runs in this experiment.
3. Click on the Compare button.
4. In the comparison page, we can then view the results through various lenses (contours, parallel coordinates, etc.).

```python
# All trials
trials_df = study.trials_dataframe()
trials_df = trials_df.sort_values(["user_attrs_f1"], ascending=False)  # sort by metric
trials_df.head()
```

| | number | value | datetime_start | datetime_complete | duration | params_analyzer | params_learning_rate | params_ngram_max_range | params_power_t | user_attrs_f1 | user_attrs_precision | user_attrs_recall | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 3 | 0.847089 | 2022-05-18 18:16:58.108105 | 2022-05-18 18:17:03.569948 | 0 days 00:00:05.461843 | char_wb | 0.088337 | 4 | 0.118196 | 0.847089 | 0.887554 | 0.833333 | COMPLETE |
| 19 | 19 | 0.847089 | 2022-05-18 18:17:58.219462 | 2022-05-18 18:18:00.642571 | 0 days 00:00:02.423109 | char_wb | 0.084520 | 4 | 0.396571 | 0.847089 | 0.887554 | 0.833333 | COMPLETE |
| 12 | 12 | 0.840491 | 2022-05-18 18:17:41.845179 | 2022-05-18 18:17:45.792068 | 0 days 00:00:03.946889 | char_wb | 0.139578 | 7 | 0.107273 | 0.840491 | 0.877431 | 0.826389 | COMPLETE |
| 13 | 13 | 0.840491 | 2022-05-18 18:17:45.862705 | 2022-05-18 18:17:49.657014 | 0 days 00:00:03.794309 | char_wb | 0.154396 | 7 | 0.433669 | 0.840491 | 0.877431 | 0.826389 | COMPLETE |
| 15 | 15 | 0.836255 | 2022-05-18 18:17:50.464948 | 2022-05-18 18:17:54.446481 | 0 days 00:00:03.981533 | char_wb | 0.083253 | 7 | 0.106982 | 0.836255 | 0.881150 | 0.819444 | COMPLETE |

```python
# Best trial
print(f"Best value (f1): {study.best_trial.value}")
print(f"Best hyperparameters: {json.dumps(study.best_trial.params, indent=2)}")
```

```
Best value (f1): 0.8535985582060417
Best hyperparameters: {
  "analyzer": "char_wb",
  "ngram_max_range": 4,
  "learning_rate": 0.08981103667371809,
  "power_t": 0.2583427488720579
}
```

```python
# Save best parameter values
args = {**args.__dict__, **study.best_trial.params}
print(json.dumps(args, indent=2, cls=NumpyEncoder))
```

```
{
  "lower": true,
  "stem": false,
  "analyzer": "char_wb",
  "ngram_max_range": 4,
  "alpha": 0.0001,
  "learning_rate": 0.08833689034118489,
  "power_t": 0.1181958972675695
}
```
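
The merge used to save the best parameters relies on dictionary unpacking, where later mappings override earlier ones. Here is a minimal sketch with made-up values; `default_args` and `best_params` are hypothetical stand-ins for the lesson's `args` and `study.best_trial.params`.

```python
from argparse import Namespace

# Hypothetical stand-ins for the lesson's `args` and `study.best_trial.params`.
default_args = Namespace(lower=True, stem=False, analyzer="word", learning_rate=0.1)
best_params = {"analyzer": "char_wb", "learning_rate": 0.088}

# Later mappings win: tuned values overwrite the defaults, untuned keys are kept.
merged = {**default_args.__dict__, **best_params}
print(merged)
# {'lower': True, 'stem': False, 'analyzer': 'char_wb', 'learning_rate': 0.088}
```

Note that the result is a plain `dict` rather than a `Namespace`, so values are read with `merged["analyzer"]` instead of attribute access. The original namespace is left unchanged.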


... and now we're finally ready to move from working in Jupyter notebooks to Python scripts. We'll be revisiting everything we did so far, but this time with proper software engineering principles such as object-oriented programming (OOP), styling, testing, etc. → https://madewithml.com/#mlops

To cite this content, please use:

```bibtex
@article{madewithml,
    author       = {Goku Mohandas},
    title        = {Optimization - Made With ML},
    howpublished = {\url{https://madewithml.com/}},
    year         = {2022}
}
```