
Hyperparameter search

Hyperparameter search finds the set of hyperparameters that produces the best model performance. Trainer supports several hyperparameter search backends (Optuna, SigOpt, Weights & Biases, and Ray Tune) through hyperparameter_search(), which can optimize a single objective or multiple objectives.

This guide will go over how to set up a hyperparameter search for each of the backends.

pip install optuna
pip install sigopt
pip install wandb
pip install "ray[tune]"

To use hyperparameter_search(), you need to create a model_init function. This function builds the model from its basic information (arguments and configuration) because the model must be reinitialized from scratch for each trial in the search.

The model_init function is incompatible with the optimizers parameter. Subclass Trainer and override the create_optimizer_and_scheduler() method to create a custom optimizer and scheduler.
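
A minimal sketch of such a subclass is shown below; the SGD optimizer and cosine schedule are illustrative choices, not the Trainer defaults.

import torch
from transformers import Trainer

class CustomOptimizerTrainer(Trainer):
    def create_optimizer_and_scheduler(self, num_training_steps):
        # illustrative: swap in SGD and a cosine schedule for the defaults
        self.optimizer = torch.optim.SGD(
            self.model.parameters(), lr=self.args.learning_rate, momentum=0.9
        )
        self.lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            self.optimizer, T_max=num_training_steps
        )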

An example model_init function is shown below.

from transformers import AutoModelForSequenceClassification

# model_args and config are defined elsewhere in the training script
def model_init(trial):
    return AutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=True if model_args.use_auth_token else None,
    )

Pass model_init to Trainer along with everything else you need for training. Then you can call hyperparameter_search() to start the search.

hyperparameter_search() accepts a direction parameter that specifies whether to minimize or maximize the objective; pass a list of directions to optimize multiple objectives at once. You also need to set the backend you’re using, an object defining the hyperparameter space to search, the number of trials to run, and a compute_objective function that returns the objective value(s).

If compute_objective isn’t defined, the default compute_objective is called, which returns the evaluation loss when no other metrics are reported, and otherwise the sum of all evaluation metrics (such as F1).
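
As a sketch, a compute_objective for a two-objective search (like the Optuna example further below) might look like this; the eval_loss and eval_f1 metric names are assumptions about what compute_metrics reports:

def compute_objective(metrics):
    # return one value per direction: minimize eval_loss, maximize eval_f1
    return [metrics["eval_loss"], metrics["eval_f1"]]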

from transformers import Trainer

trainer = Trainer(
    model=None,  # the model is built by model_init for each trial
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
    processing_class=tokenizer,
    model_init=model_init,
    data_collator=data_collator,
)
trainer.hyperparameter_search(...)

The following example demonstrates how to perform a hyperparameter search over the learning rate and training batch size.

Optuna

Optuna can optimize categorical, integer, and float hyperparameters.

def optuna_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
    }

best_trials = trainer.hyperparameter_search(
    direction=["minimize", "maximize"],
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=20,
    compute_objective=compute_objective,
)
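
Because this is a multi-objective search, hyperparameter_search() returns a list of BestRun objects rather than a single best run. A quick way to inspect them, as a sketch:

for best_run in best_trials:
    print(best_run.run_id, best_run.objective, best_run.hyperparameters)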

Distributed Data Parallel

Trainer only supports hyperparameter search for distributed data parallel (DDP) on the Optuna and SigOpt backends. Only the rank-zero process is used to generate the search trial, and the resulting parameters are passed along to the other ranks.
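
For example, a DDP hyperparameter search could be launched with torchrun; the script name here is hypothetical.

torchrun --nproc_per_node=2 run_hp_search.py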
