zen_nas

Zen-NAS: a lightning-fast, training-free neural architecture search algorithm

Introduction

Our code is forked from ZenNAS. We modified it to make it easier to use and, following the paper Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition, added some features that the original ZenNAS does not have.

We mainly made the following changes:

  • support Horovod and PyTorch distributed training
  • refactor the evolutionary search code to speed up searching
  • fix the "THCudaTensor sizes too large" error when searching latency-constrained models

In addition, because some functions do not work correctly under distributed training, we provide distributed versions of them.
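
For reference, below is a minimal sketch of the standard PyTorch DistributedDataParallel setup (stock PyTorch APIs only; our scripts layer apex or Horovod options on top of this):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model):
    # Expects the environment variables set by torchrun (RANK, LOCAL_RANK, ...).
    dist.init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])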

Experimental results

We tested the modified code to verify its correctness. Using apex with mixed precision, training completed within 5 days on 8 Tesla V100 GPUs, and the results are consistent with the paper:

model          paper accuracy   distributed training accuracy
ZenNet-0.1ms   77.8%            77.922%
ZenNet-0.8ms   83.0%            83.214%

Note that the same results can be obtained with Horovod, but training took longer, so we recommend apex for distributed training.
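
For reference, a minimal apex mixed-precision sketch (the toy model, optimizer, and opt_level below are illustrative, not the exact settings of our training scripts):

import torch
from apex import amp

model = torch.nn.Linear(128, 10).cuda()              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Patch model and optimizer for mixed-precision training.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

loss = model(torch.randn(4, 128, device='cuda')).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # loss scaling for FP16
    scaled_loss.backward()
optimizer.step()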

Before the code was released, we reproduced the paper's algorithm and searched for models under the paper's conditions. The following table compares the searched models with the paper's models.

model          paper accuracy   searched model accuracy
latency01ms    77.8%            78.622%
latency05ms    82.7%            82.752%
latency12ms    83.6%            83.466%

To demonstrate the effectiveness of the algorithm, we ran several model searches and obtained the following results. Each search evolves the population for 50,000 iterations on a single Tesla V100 GPU; the "model score" column reports the zero-shot proxy score (a simplified Zen-Score sketch follows the table).

method            model         search time (hours)   model score
ZenNAS            latency01ms   98.4274               126.038
ZenNAS            latency05ms   22.0189               243.101
ZenNAS            latency08ms   28.5952               304.323
ZenNAS            latency12ms   44.6237               375.027
modified ZenNAS   latency01ms   64.988                134.896
modified ZenNAS   latency05ms   20.9895               245.712
modified ZenNAS   latency08ms   25.0358               310.629
modified ZenNAS   latency12ms   43.239                386.669
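
For intuition, here is a loose Zen-Score sketch following the paper's definition (our actual implementation lives in "ZeroShotProxy/*.py"; unlike the paper, this toy version scores the raw network output with BN running statistics rather than the pre-GAP feature map with batch BN statistics):

import torch

def zen_score_sketch(model, resolution=224, batch=16, alpha=0.01):
    # Expected output perturbation under a small Gaussian input perturbation.
    model = model.cuda().eval()
    x = torch.randn(batch, 3, resolution, resolution, device='cuda')
    eps = torch.randn_like(x)
    with torch.no_grad():
        delta = (model(x) - model(x + alpha * eps)).norm()
    score = torch.log(delta)
    # BN rescaling correction: add the log of each BN layer's average std.
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            score += torch.log(torch.sqrt(m.running_var + m.eps).mean())
    return score.item()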

Reproduce Paper Experiments

System Requirements

  • PyTorch >= 1.6, Python >= 3.7
  • By default, ImageNet dataset is stored under ~/data/imagenet; CIFAR-10/CIFAR-100 is stored under ~/data/pytorch_cifar10 or ~/data/pytorch_cifar100
  • Pre-trained parameters are cached under ~/.cache/pytorch/checkpoints/zennet_pretrained

Package Requirements

  • ptflops
  • tensorboard >= 1.15 (optional)
  • apex

Pre-trained model download

If you want to evaluate the pre-trained models, please download them from the original ZenNAS repository.

Evaluate pre-trained models on ImageNet and CIFAR-10/100

To evaluate the pre-trained model on ImageNet using GPU 0:

python val.py --fp16 --gpu 0 --arch ${zennet_model_name}

where ${zennet_model_name} should be replaced by a valid ZenNet model name. The complete list of model names can be found in the original ZenNAS repository linked above.

To evaluate the pre-trained model on CIFAR-10 or CIFAR-100 using GPU 0:

python val_cifar.py --dataset cifar10 --gpu 0 --arch ${zennet_model_name}

To create a ZenNet in your Python code:

import torch
import ZenNet

gpu = 0
# opt.arch holds a valid ZenNet model name, e.g. parsed from the command line
model = ZenNet.get_ZenNet(opt.arch, pretrained=True)
torch.cuda.set_device(gpu)
torch.backends.cudnn.benchmark = True
model = model.cuda(gpu)
model = model.half()   # use FP16 for inference
model.eval()
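
A quick inference sketch with the model created above (the 3x224x224 input size is an assumption; use the resolution the chosen model expects):

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224, device='cuda', dtype=torch.half)
    logits = model(x)
    print(logits.argmax(dim=1))   # predicted class index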

Usage

We supply apex and Horovod distributed training scripts; you can adapt the other original scripts based on them. apex script:

scripts/Zen_NAS_ImageNet_latency0.1ms_train_apex.sh

Horovod script:

scripts/Zen_NAS_ImageNet_latency0.1ms_train.sh

If you want to search for a model, pay attention to the options "--fix_initialize" and "--origin". "--fix_initialize" decides how the population is initialized; by default the algorithm uses random initialization, while "--fix_initialize" starts from a fixed model. "--origin" determines how mutated models are generated: when "--origin" is specified, mutated models are produced using the original ZenNAS method.
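
For illustration, a command skeleton combining these options (the remaining flags are elided; copy the full argument list from one of the provided search scripts):

python evolution_search.py --fix_initialize --origin \
    --search_space SearchSpace/search_space_XXBL.py ...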

Searching on CIFAR-10/100

Searching for CIFAR-10/100 models with budget params < 1M, using different zero-shot proxies:

scripts/Flops_NAS_cifar_params1M.sh
scripts/GradNorm_NAS_cifar_params1M.sh
scripts/NASWOT_NAS_cifar_params1M.sh
scripts/Params_NAS_cifar_params1M.sh
scripts/Random_NAS_cifar_params1M.sh
scripts/Syncflow_NAS_cifar_params1M.sh
scripts/TE_NAS_cifar_params1M.sh
scripts/Zen_NAS_cifar_params1M.sh

Searching on ImageNet

Searching for ImageNet models, with latency budget on NVIDIA V100 from 0.1 ms/image to 1.2 ms/image at batch size 64 FP16:

scripts/Zen_NAS_ImageNet_latency0.1ms.sh
scripts/Zen_NAS_ImageNet_latency0.2ms.sh
scripts/Zen_NAS_ImageNet_latency0.3ms.sh
scripts/Zen_NAS_ImageNet_latency0.5ms.sh
scripts/Zen_NAS_ImageNet_latency0.8ms.sh
scripts/Zen_NAS_ImageNet_latency1.2ms.sh

Searching for ImageNet models, with FLOPs budget from 400M to 800M:

scripts/Zen_NAS_ImageNet_flops400M.sh
scripts/Zen_NAS_ImageNet_flops600M.sh
scripts/Zen_NAS_ImageNet_flops800M.sh

Customize Your Own Search Space and Zero-Shot Proxy

The masternet definition is stored in "Masternet.py". The masternet takes a structure string and parses it into a PyTorch nn.Module object. The structure string defines the layer structure, which is implemented in the "PlainNet/*.py" files. For example, in "PlainNet/SuperResK1KXK1.py" we define the SuperResK1K3K1 block, which consists of multiple layers of ResNet blocks. To define your own block, e.g. ABC_Block, first implement "PlainNet/ABC_Block.py". Then, in "PlainNet/__init__.py", append the following lines after the last line to register the new block definition:

from PlainNet import ABC_Block
_all_netblocks_dict_ = ABC_Block.register_netblocks_dict(_all_netblocks_dict_)

After the above registration call, the PlainNet module can parse your customized block from the structure string.
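
A minimal sketch of what "PlainNet/ABC_Block.py" could look like (the block body and the registry convention below are assumptions; mirror an existing block such as "PlainNet/SuperResK1KXK1.py" for the real base-class API):

import torch.nn as nn

class ABC_Block(nn.Module):
    # A toy conv-BN-ReLU block standing in for a real PlainNet block.
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.in_channels, self.out_channels = in_channels, out_channels
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

def register_netblocks_dict(netblocks_dict):
    # Add this module's blocks to the registry used by the structure-string parser.
    netblocks_dict['ABC_Block'] = ABC_Block
    return netblocks_dict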

The search space definitions are stored in SearchSpace/*.py. The important function is

gen_search_space(block_list, block_id)

block_list is the list of super-blocks parsed by the masternet; block_id is the index of the block in block_list that will later be replaced by a mutated block. This function must return a list of mutated candidate blocks.
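
A schematic sketch of such a function (the mutation below is purely illustrative, re-using the toy ABC_Block above; see "SearchSpace/search_space_XXBL.py" for a real one):

def gen_search_space(block_list, block_id):
    # Return a list of candidate replacements for block_list[block_id].
    the_block = block_list[block_id]
    candidates = []
    # Hypothetical mutation: same block type with different output widths;
    # real search spaces also mutate kernel sizes, depth, and block type.
    for out_channels in (32, 64, 128, 256):
        candidates.append(type(the_block)(the_block.in_channels, out_channels))
    return candidates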

Directly specify the search space

"PlainNet/AABC_Block.py" has defined the candidate blocks, you can directly specify candidate blocks in the search spaces by passing parameters "--search_space_list". So you have two methods to specify search spaces. Taking ResNet-like search space as an example, you can use "--search_space SearchSpace/search_space_XXBL.py" or "--search_space_list PlainNet/SuperResK1KXK1.py PlainNet/SuperResKXKX.py" to specify search space. Both of them are equivalent.

In the scripts, when you use the first method to specify the search space, you should also add the other two parameters "--fix_initialize" and "--origin", so that the algorithm initializes with a fixed model.

The zero-shot proxies are implemented in "ZeroShotProxy/*.py". The evolutionary algorithm is implemented in "evolution_search.py". "analyze_model.py" prints the FLOPs and model size of a given network. "benchmark_network_latency.py" measures network inference latency. "train_image_classification.py" implements SGD training, and "ts_train_image_classification.py" implements teacher-student distillation.
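
For intuition, a minimal FP16 batch-latency measurement in the spirit of "benchmark_network_latency.py" (the function name and defaults here are illustrative, not the script's actual interface):

import time
import torch

def measure_latency_ms(model, batch_size=64, resolution=224, repeat=30):
    # Per-image FP16 inference latency in milliseconds on the current GPU.
    model = model.cuda().half().eval()
    x = torch.randn(batch_size, 3, resolution, resolution,
                    device='cuda', dtype=torch.half)
    with torch.no_grad():
        for _ in range(5):                  # warm-up
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(repeat):
            model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / repeat / batch_size * 1000.0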

Open Source

A few files in this repository are modified from the following open-source implementations:

https://github.com/DeepVoltaire/AutoAugment/blob/master/autoaugment.py
https://github.com/VITA-Group/TENAS
https://github.com/SamsungLabs/zero-cost-nas
https://github.com/BayesWatch/nas-without-training
https://github.com/rwightman/gen-efficientnet-pytorch
https://pytorch.org/vision/0.8/_modules/torchvision/models/resnet.html

Copyright

Copyright 2021 ZTE corporation. All Rights Reserved.