/progressive-neural-architecture-search

Implementation of Progressive Neural Architecture Search in Keras and Tensorflow

Primary LanguagePythonMIT LicenseMIT

Progressive Neural Architecture Search with ControllerManager RNN

Basic implementation of ControllerManager RNN from Progressive Neural Architecture Search.

  • Uses tf.keras to define and train children / generated networks, which are found via sequential model-based optimization in Tensorflow, ranked by the Controller RNN.
  • Define a state space by using StateSpace, a manager which maintains input states and handles communication between the ControllerManager RNN and the user.
  • ControllerManager manages the training and evaluation of the Controller RNN
  • NetworkManager handles the training and reward computation of the children models

Usage

Training a Controller RNN

At a high level : For full training details, please see train.py.

# construct a state space (the default operators are from the paper)
state_space = StateSpace(B, # B = number of blocks in each cell
                         operators=None # whether to use custom operators or the default ones from the paper
                         input_lookback_depth=0, # limit number of combined inputs from previous cell
                         input_lookforward_depth=0, # limit number of combined inputs in same cell
                         )

# create the managers
controller = ControllerManager(state_space, B, K)  # K = number of children networks to train after initial step
manager = NetworkManager(dataset, epochs=max_epochs, batchsize=batchsize)

# For `B` number of trials
  actions = controller.get_actions(K)  # get all the children model to train in this trial

  For each `child` in action
    store reward = manager.get_reward(child) in `rewards` list

  encoder.train(rewards)  # train encoder RNN with a surrogate loss function
  encoder.update()  # build next set of children to train in next trial, and sort them

Evaluating the Controller RNN on unseen model combinations

Once the RNN Controller has been trained above the above approach, we can then score all possible model combinations. This might take a little while due to exponentially growing number of model configurations. This scoring procedure can be done simply in score_architectures.py.

score_architectures.py has a similar setup to the train.py script, but you will notice that B parameter is larger (5) as compare to the B parameter in train.py (3). Any number of B can be provided, which will increase the maximum width of the Cells generated.

In addition, if the search space is small enough, we can pass K (the maximum number of child models we want to compute) to be None. In doing so, all possible child models will be produced and scored by the Controller RNN.

Note: There is an additional parameter INPUT_B. This is the B parameter with which the RNN was trained. Without this, the Controller RNN cannot know the size of the Input Embedding to create, and defaults to the current B. This in turn causes an issue when loading the weights (as the original embedding would have dimensions [B, EMBEDDING_DIM].

python score_architectures.py

Visualizing the results

Finally, we can visualize the results obtained by the Controller RNN and scored by the score_architecture.py script.

We do so by using the rank_architectures.py script, which accepts an argument -f. -f is a path(s) to the csv files that you want to rank and visualize.

Another argument is -sort, which will sort all the possible model combinations according to their predicted scores prior to plotting them. In doing so, if you have mplcursors setup, you can quickly glance at the top performing model architectures and their predicted scores.

There are many ways of calling this script :

  • When you want to just visualize the history of the training procedure : Call it without any arguments.
python rank_architectures.py  # optional -sort
  • When you want to visualize a specific score file (to see the Controller RNN's predictions or actual evaluated model scores from training. These score files correspond to the B parameter in the paper, i.e. the width of the Cell generated.
python rank_architectures.py -f score_2.csv

# Here we assume we want to rank the `score_2.csv` file.
  • When you want to visualize multiple score files at once: pass them one after another. Note: The file names are sorted before display, so it will always show you scores in ascending order.
python rank_architectures.py -f score_5.csv score_3.csv score_2.csv
  • When you want to visualize all score files at once: Pass the file name as score_*.csv. It uses glob internally, so all of its semantics will work here as well.
python rank_architectures.py -f scores_*.csv
  • When you want to visualize not just the scored files, but also the training history - i.e. visualize everything at once: Simply pass * to the -f argument.
python rank_architectures.py -f *.csv

Implementation details

This is a very limited project.

  • It is not a faithful re-implementation of the original paper. There are several small details not incorporated (like bias initialization, actually using the Hc-2 - Hcb-1 values etc)
  • It doesnt have support for skip connections via 'anchor points' etc. (though it may not be that hard to implement it as a special state)
  • Learning rate, number of epochs to train per B_i, regularization strength etc are all random values (which make somewhat sense to me)
  • Single GPU model only. There would need to be a lot of modifications to this for multi GPU training (and I have just 1)

Result

I tried a toy CNN model with 2 CNN cells the a custom search space, train for just 5 epoch of training on CIFAR-10.

All models configuration strings can be ranked using rank_architectures.py script to parse train_history.csv, or you can use score_architectures.py to pseudo-score all combinations of models for all values of B, and then pass these onto rank_architectures.py to approximate the scores that they would obtain.

After sorting using the -sort argument in rank_architectures.py, we get the following of the same data as above.

Requirements

  • Tensorflow-gpu >= 1.12
  • Scikit-learn (most recent available from pip)
  • (Optional) matplotlib - to visualize using rank_architectures.py
  • (Optional) mplcursors - to have annotated models when using rank_architectures.py.

Acknowledgements

Code somewhat inspired by wallarm/nascell-automl