How Model Tuning Works

Traditionally, getting the best-performing model on a dataset involves tedious manual hyperparameter tuning. On SINGA-Auto, model hyperparameter tuning is automated by conducting multiple trials in a train job.

Over the trials, the model is initialized with different hyperparameters (knobs), trained and evaluated. A hyperparameter tuning advisor on SINGA-Auto ingests the validation scores from these trials to suggest better hyperparameters for future trials, so as to maximise the model’s performance on the dataset. At the end of the train job, SINGA-Auto can deploy the best-scoring trials for predictions.

Defining Hyperparameter Search Space

You’ll define a search space of hyperparameters (knob configuration) in a declarative manner with the static method singa_auto.model.BaseModel.get_knob_config(). The method should return a mapping of hyperparameter names (knob names) to hyperparameter specifications (knob specifications). A hyperparameter specification is an instance of a class that extends singa_auto.model.BaseKnob, and must be one of the following:

  • singa_auto.model.FixedKnob
  • singa_auto.model.CategoricalKnob
  • singa_auto.model.FloatKnob
  • singa_auto.model.IntegerKnob
  • singa_auto.model.PolicyKnob
  • singa_auto.model.ArchKnob

Refer to their documentation for more details on each type of knob specification, and refer to Sample Models to see examples of how knob configurations are declared.
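For illustration, below is a minimal sketch of a knob configuration for a hypothetical model class MyModel. The knob names and value ranges are arbitrary, and the constructor arguments follow the sample models; check each knob class’s documentation for the exact signatures.

    from singa_auto.model import BaseModel, FixedKnob, CategoricalKnob, FloatKnob, IntegerKnob

    class MyModel(BaseModel):
        # Other required methods of BaseModel (e.g. train(), evaluate(), predict()) are omitted.

        @staticmethod
        def get_knob_config():
            return {
                'max_epochs': FixedKnob(10),                          # constant across all trials
                'batch_size': CategoricalKnob([16, 32, 64]),          # categorical choice
                'learning_rate': FloatKnob(1e-5, 1e-1, is_exp=True),  # float range on an exponential scale
                'hidden_layer_units': IntegerKnob(2, 128),            # integer range
            }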

Model Policies

singa_auto.model.PolicyKnob is a special type of knob specification that allows SINGA-Auto to configure the behaviour of a model on a per-trial basis.

In a modern model hyperparameter tuning scheme, a model tends to switch between different “modes”, which we call policies. For example, when tuning your model manually, you might want the model to do early stopping for the first e.g. 100 trials, then conduct a final trial for a full e.g. 300 epochs. The concept of model policies in SINGA-Auto enables SINGA-Auto’s tuning advisor to externally configure your model to switch between these “modes”.

Your model communicates to SINGA-Auto which policies it supports by adding PolicyKnob(policy_name) to your model’s knob configuration. During training, SINGA-Auto configures the activation of the model’s policies on a per-trial basis by realising the values of PolicyKnob to either True (activated) or False (not activated).

For example, suppose SINGA-Auto’s tuning scheme for your model requires early stopping in all trials except the final one. If your model has { 'early_stop': PolicyKnob('EARLY_STOP'), ... }, SINGA-Auto will pass early_stop=False as part of the knobs for just the final trial, and early_stop=True for all other trials. Your model would then conditionally do early stopping based on the value of the knob early_stop.
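As a sketch of this pattern (assuming, as in the sample models, that realized knob values are passed to the model’s __init__ as keyword arguments; the knob name 'early_stop' and the epoch counts are illustrative):

    from singa_auto.model import BaseModel, FixedKnob, PolicyKnob

    class MyModel(BaseModel):
        @staticmethod
        def get_knob_config():
            return {
                'max_epochs': FixedKnob(300),
                'early_stop': PolicyKnob('EARLY_STOP'),  # realised to True/False by SINGA-Auto per trial
            }

        def __init__(self, **knobs):
            super().__init__(**knobs)
            self._knobs = knobs

        def train(self, dataset_path, **kwargs):
            # Train for fewer epochs when the EARLY_STOP policy is activated for this trial
            epochs = 10 if self._knobs['early_stop'] else self._knobs['max_epochs']
            # ... run the training loop for `epochs` epochs ...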

Below is the list of officially recognized model policies:

  • SHARE_PARAMS: Whether the model should load the shared parameters passed in train()
  • EARLY_STOP: Whether the model should stop training early in train(), e.g. with the use of early stopping or a reduced number of epochs
  • SKIP_TRAIN: Whether the model should skip training its parameters
  • QUICK_EVAL: Whether the model should stop evaluation early in evaluate(), e.g. by evaluating on only a subset of the validation dataset
  • DOWNSCALE: Whether a smaller version of the model should be constructed, e.g. with fewer layers

Model Tuning Schemes

At the model level, SINGA-Auto automatically selects the appropriate tuning scheme (advisor) based on the composition of the model’s knob configuration and the incoming train job’s budget.

Specifically, it employs the following rules, in the given order, to select the type of advisor to use:

  • Only PolicyKnob & FixedKnob: Conduct only a single trial.
  • Only PolicyKnob, FixedKnob, FloatKnob, IntegerKnob & CategoricalKnob, with the SHARE_PARAMS policy: Hyperparameter tuning with Bayesian Optimization & cross-trial parameter sharing. Globally best-scoring parameters are shared across workers in an epsilon-greedy manner. Early stopping (EARLY_STOP policy) is optionally employed for all trials.
  • Only PolicyKnob, FixedKnob, FloatKnob, IntegerKnob & CategoricalKnob: Hyperparameter tuning with Bayesian Optimization. Early stopping (EARLY_STOP policy) is optionally employed before the last 1h, with standard trials during the last 1h.
  • Only PolicyKnob, FixedKnob & ArchKnob, with the policies SHARE_PARAMS, EARLY_STOP, SKIP_TRAIN, QUICK_EVAL & DOWNSCALE, and a TIME_HOURS budget >= 12h: Architecture tuning with cell-based ENAS. ENAS architecture search is conducted before the last 12h, then the best architectures found are given their final training during the last 12h.
  • All others: Hyperparameter tuning with uniformly random knobs.

The following subsections briefly explain how to leverage the various model tuning schemes on SINGA-Auto.

Hyperparameter Tuning with Bayesian Optimization

To tune the hyperparameters of your model, where the hyperparameters are simply floats, integers or categorical values, use singa_auto.model.FixedKnob, singa_auto.model.CategoricalKnob, singa_auto.model.FloatKnob & singa_auto.model.IntegerKnob.

Hyperparameter Tuning with Bayesian Optimization & Early Stopping

To additionally employ early stopping during hyperparameter tuning to speed up the tuning process, declare an extra singa_auto.model.PolicyKnob of the EARLY_STOP policy (see Model Policies).

Refer to the sample model ./examples/models/image_classification/TfFeedForward.py.

Hyperparameter Tuning with Bayesian Optimization & Parameter Sharing

To additionally have best-scoring model parameters shared between trials to speed up the tuning process (as outlined in “SINGA-Auto: Machine Learning as an Analytics Service System”), declare an extra singa_auto.model.PolicyKnob of the SHARE_PARAMS policy (see Model Policies).

Refer to the sample model ./examples/models/image_classification/PyDenseNetBc.py and its corresponding usage script ./examples/scripts/image_classification/train_densenet.py to better understand how to do parameter sharing.
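As a rough sketch of the pattern (assuming, as in the PyDenseNetBc sample, that the shared parameters arrive through a shared_params argument to train(); check the sample for the exact signature. The knob name 'share_params' and the helper _load_weights() are illustrative):

    from singa_auto.model import BaseModel, PolicyKnob

    class MyModel(BaseModel):
        @staticmethod
        def get_knob_config():
            return {
                'share_params': PolicyKnob('SHARE_PARAMS'),
                # ... other knobs ...
            }

        def __init__(self, **knobs):
            super().__init__(**knobs)
            self._knobs = knobs

        def train(self, dataset_path, shared_params=None, **kwargs):
            if self._knobs['share_params'] and shared_params is not None:
                # Initialize this trial's model from the best-scoring parameters shared so far
                self._load_weights(shared_params)
            # ... continue training as usual ...

        def _load_weights(self, params):
            # Placeholder for model-specific parameter loading
            pass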

Architecture Tuning with ENAS

To tune the architecture of your model with the modern architecture search algorithm “Efficient Neural Architecture Search via Parameter Sharing” (ENAS), declare a singa_auto.model.ArchKnob and implement the policies SHARE_PARAMS, EARLY_STOP, SKIP_TRAIN, QUICK_EVAL and DOWNSCALE (see Model Policies). Specifically, you’ll need your model to support parameter sharing, stopping training early, skipping the training step, evaluating on a subset of the validation dataset, and downscaling the model, e.g. to use fewer layers. These policies are critical to the speed & performance of ENAS. See Deep Dive on ENAS to understand more about SINGA-Auto’s implementation of ENAS.

Refer to the sample model ./examples/models/image_classification/TfEnas.py and its corresponding usage script ./examples/scripts/image_classification/run_enas.py to better understand how to do architecture tuning.
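As a rough sketch of the policy knobs such a model would declare, and of how their per-trial values might be consulted (the ArchKnob declaration itself is model-specific and best studied in TfEnas.py; the knob names, layer counts and epoch counts below are illustrative):

    from singa_auto.model import BaseModel, PolicyKnob

    class MyEnasModel(BaseModel):
        @staticmethod
        def get_knob_config():
            return {
                # 'arch': ArchKnob(...),   # architecture search space, as declared in TfEnas.py
                'share_params': PolicyKnob('SHARE_PARAMS'),
                'early_stop': PolicyKnob('EARLY_STOP'),
                'skip_train': PolicyKnob('SKIP_TRAIN'),
                'quick_eval': PolicyKnob('QUICK_EVAL'),
                'downscale': PolicyKnob('DOWNSCALE'),
            }

        def __init__(self, **knobs):
            super().__init__(**knobs)
            self._knobs = knobs

        def train(self, dataset_path, **kwargs):
            if self._knobs['skip_train']:
                return                                             # SKIP_TRAIN: keep shared parameters as-is
            num_layers = 4 if self._knobs['downscale'] else 12     # DOWNSCALE: build a smaller model
            epochs = 1 if self._knobs['early_stop'] else 300       # EARLY_STOP: shorten training
            # ... build the sampled architecture with `num_layers` layers and train for `epochs` epochs ...

        def evaluate(self, dataset_path):
            use_subset = self._knobs['quick_eval']                 # QUICK_EVAL: validate on a subset only
            # ... compute and return the validation score ...
            return 0.0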

Deep Dive on ENAS

The ENAS paper outlines a new methodology for automatic neural network construction, speeding up the original Neural Architecture Search (NAS) methodology by 1000x without affecting its ability to search for a competitive architecture. The authors made the crucial observation that two different architectures share a common subgraph, and the model parameters in that subgraph can be reused across trials without having to re-train them from scratch in every trial.

The following is an overview of how ENAS works. As explained in the ENAS paper, during an ENAS search for the best CNN architecture (ENAS Search), there is an alternation between 2 phases: the training of the ENAS CNN’s shared parameters (CNN Train Phase), and the training of the ENAS controller (Controller Train Phase). While the CNN’s shared parameters are carried over across phases, they are not trained during Controller Train Phases. After ENAS Search is done, there is a final training of the best CNN architecture found (ENAS Train), this time initializing its CNN parameters from scratch.

On SINGA-Auto, we’ve replicated the Cell-Based ENAS controller for image classification as one of SINGA-Auto’s tuning schemes and as a SINGA-Auto model TfEnas, with very close reference to the authors’ code. In this specific setup for ENAS, ENAS Search is done with the construction of a single supergraph of all possible architectures, while ENAS Train is done with the construction of a fixed graph of the best architecture (with slight architectural differences from ENAS Search). Each CNN Train Phase involves training the CNN for 1 epoch, while within each Controller Train Phase, the controller is trained for 30 steps. In each controller step, 10 architectures are sampled from the controller, evaluated on the ENAS CNN by dynamically changing its architecture, and losses based on validation accuracies are back-propagated in the controller to update the controller’s parameters. Each validation accuracy is computed on only a batch of the validation dataset. The alternation between CNN Train Phase and Controller Train Phase happens for X cycles during ENAS Search; then, during ENAS Train, the architecture samples with the highest validation accuracies, this time computed on the full validation dataset, are trained from scratch to arrive at the final best models.
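To make the alternation above concrete, here is a schematic sketch of ENAS Search (not SINGA-Auto’s actual implementation); the helper functions are placeholders standing in for the real CNN and controller training code, and NUM_CYCLES is illustrative:

    NUM_CYCLES = 10        # the "X" cycles of alternation (illustrative)
    CONTROLLER_STEPS = 30  # controller steps per Controller Train Phase
    ARCHS_PER_STEP = 10    # architectures sampled per controller step

    def train_shared_cnn_for_one_epoch(shared_params):
        # Placeholder: trains the supergraph's shared CNN parameters for 1 epoch
        return shared_params

    def sample_architecture(controller_params):
        # Placeholder: samples an architecture encoding from the controller
        return ['conv3x3', 'sep5x5', 'avg_pool']

    def evaluate_on_validation_batch(arch, shared_params):
        # Placeholder: validation accuracy on a single batch, reusing the shared parameters
        return 0.0

    def update_controller(controller_params, archs, accuracies):
        # Placeholder: policy-gradient update of the controller, using accuracies as rewards
        return controller_params

    shared_params, controller_params = {}, {}

    for cycle in range(NUM_CYCLES):
        # CNN Train Phase: the shared parameters are trained; the controller is frozen
        shared_params = train_shared_cnn_for_one_epoch(shared_params)

        # Controller Train Phase: the controller is trained; the shared parameters are frozen
        for _ in range(CONTROLLER_STEPS):
            archs = [sample_architecture(controller_params) for _ in range(ARCHS_PER_STEP)]
            accs = [evaluate_on_validation_batch(a, shared_params) for a in archs]
            controller_params = update_controller(controller_params, archs, accs)

    # ENAS Train: the best architectures, re-evaluated on the full validation dataset,
    # are trained from scratch to produce the final models.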

We’ve generalized the ENAS controller, its architecture encoding scheme and its overall tuning scheme on SINGA-Auto, such that SINGA-Auto models can leverage architecture tuning with a flexible architecture encoding, and SINGA-Auto’s application developers can train with these models in an end-to-end manner.

We’ve also devised a simple, yet effective strategy to run ENAS in a distributed setting. When given multiple GPUs, SINGA-Auto performs ENAS locally at each worker in a train job, with these workers sharing a central ENAS controller.