reproducing CIFAR10 results for AutoSlim
RudyChin opened this issue · 8 comments
Hi Jiahui,
Thanks for the great work. I'm trying to reproduce AutoSlim for CIFAR-10 (Table 2).
Could you please provide the detailed hyperparameters you used for it?
I'm able to train the baseline MobileNetV2 1.0x to 7.9% Top-1 error using the following hyperparameters:
- 0.1 initial learning rate
- linear learning rate decay
- 128 batch size
- 300 epochs of training
- 5e-4 weight decay
- 0.9 nesterov momentum
- no label smoothing
- no weight decay for bias and gamma
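For reference, the schedule and weight-decay rule in the list above can be written down as a small pure-Python sketch. The names `linear_decay_lr` and `weight_decay_for` are made up for this example. Decaying once per epoch (rather than per iteration) and treating every 1-D parameter as "bias or gamma" are assumptions, not details confirmed in this thread.

```python
BASE_LR = 0.1        # initial learning rate from the list above
NUM_EPOCHS = 300     # total training epochs from the list above
WEIGHT_DECAY = 5e-4  # weight decay from the list above


def linear_decay_lr(epoch, base_lr=BASE_LR, num_epochs=NUM_EPOCHS):
    """Learning rate that falls linearly from base_lr to 0 (assumed per epoch)."""
    if not 0 <= epoch <= num_epochs:
        raise ValueError("epoch out of range")
    return base_lr * (1.0 - epoch / num_epochs)


def weight_decay_for(param_shape, weight_decay=WEIGHT_DECAY):
    """Return 0 for 1-D parameters, else the regular weight decay.

    Assumption: a 1-D shape is used as a stand-in for "bias and gamma";
    this also exempts BatchNorm beta, which is itself a bias term.
    """
    return 0.0 if len(param_shape) <= 1 else weight_decay


if __name__ == "__main__":
    for epoch in (0, 150, 300):
        print(epoch, linear_decay_lr(epoch))
    print(weight_decay_for((32, 3, 3, 3)))  # conv weight
    print(weight_decay_for((32,)))          # bias or BatchNorm gamma
```

In a real training loop these two rules would typically become optimizer parameter groups and a learning-rate scheduler; the sketch only pins down the arithmetic.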
To train AutoSlim, I use MobileNetV2 1.5x with exactly the same hyperparameters, but train for only 50 epochs on 80% of the real training set. Then, during greedy slimming, I use the remaining 20% of the training set as a validation set to decide channel counts. For greedy slimming, I shrink each layer in steps of 10%, which gives the 10 groups mentioned in the paper.
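The greedy slimming loop described above can be sketched roughly as follows. This is an illustration of the procedure as described in this post, not the repository's implementation: `greedy_slim`, `toy_flops`, and `toy_accuracy` are hypothetical names, and the two toy functions stand in for a real FLOPs counter and for evaluating the slimmed network on the held-out 20% split.

```python
def greedy_slim(base_channels, evaluate, count_flops, flops_budget, num_groups=10):
    """Greedily shrink layers one group (1/num_groups of the base width) at a time.

    evaluate(channels)    -> validation accuracy of the slimmed net (higher is better)
    count_flops(channels) -> cost of that net
    """
    step = [max(1, c // num_groups) for c in base_channels]
    channels = list(base_channels)
    while count_flops(channels) > flops_budget:
        best_acc, best_layer = None, None
        for i in range(len(channels)):
            if channels[i] - step[i] < step[i]:
                continue  # keep at least one group in every layer
            trial = list(channels)
            trial[i] -= step[i]
            acc = evaluate(trial)
            if best_acc is None or acc > best_acc:
                best_acc, best_layer = acc, i
        if best_layer is None:
            raise RuntimeError("budget not reachable with one group left per layer")
        # Commit the single shrink that hurt validation accuracy the least.
        channels[best_layer] -= step[best_layer]
    return channels


def toy_flops(channels):
    # Stand-in cost: product of adjacent layer widths, as for 1x1 convolutions.
    return sum(a * b for a, b in zip(channels[:-1], channels[1:]))


def toy_accuracy(channels):
    # Stand-in for validation accuracy: pretends later layers matter more.
    importance = [1.0, 2.0, 4.0]
    return sum(w * c for w, c in zip(importance, channels))


if __name__ == "__main__":
    base = [40, 80, 160]
    slim = greedy_slim(base, toy_accuracy, toy_flops, flops_budget=0.5 * toy_flops(base))
    print(slim, toy_flops(slim))
```

Each outer iteration costs one evaluation per layer, so the size of the validation split and the step size directly control how long the search takes.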
The final architecture is trained with the same hyperparameters listed above, but I failed to reach the 6.8% Top-1 error reported in the paper; I'm getting around 7.8%.
Could you please share the final architecture for AutoSlim-MobileNetV2 on CIFAR-10 with 88 MFLOPs? It would also be great if you could let me know the hyperparameters you used for the CIFAR experiments.
Thanks,
Rudy
Hi Rudy, when I ran greedy slimming on the network, I found that the output_channels of SlimmableConv2d didn't change. Did you encounter the same problem?
Hi dada,
I actually implemented AutoSlim myself and cross-referenced this code.
I could be wrong, but I noticed some lines of code that I believe to be bugs:
- In train.py line 559, it tries to initialize the BN calibration process with the full model as the input argument, while the definition of bn_calibration_init takes a BN module instead of the full model.
- In train.py line 592, it uses the divisor attribute of each layer, but I couldn't locate the definition of layers[i].divisor in SlimmableConv2d.
Hi Rudy, thank you for your reply!
I did encounter some problems when running the code at v3.0.0. I run
python -m torch.distributed.launch train.py app:apps/autoslim_resnet_train_val.yml
with autoslim: True set in autoslim_resnet_train_val.yml.
However, SlimmableConv2d has no definition of us, so in the function get_conv_layers the length of layers is zero,
and it prints "Totally 0 layers to slim."
Do I need to replace SlimmableConv2d with USConv2d in the network?
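The symptom described above can be reproduced in isolation. The sketch below uses plain-Python stand-ins rather than the repository's classes: `FakeSlimmableConv2d`, `FakeUSConv2d`, and this version of `get_conv_layers` are hypothetical, and the only assumption carried over from the report is that layers are selected by a `us` attribute.

```python
class FakeSlimmableConv2d:
    """Stand-in for a layer class that never defines a `us` attribute."""

    def __init__(self, out_channels):
        self.out_channels = out_channels


class FakeUSConv2d:
    """Stand-in for a layer class that does carry a `us` attribute."""

    def __init__(self, out_channels, us=True):
        self.out_channels = out_channels
        self.us = us


def get_conv_layers(modules):
    """Keep only layers flagged as slimmable through a truthy `us` attribute."""
    return [m for m in modules if getattr(m, "us", None)]


if __name__ == "__main__":
    slimmable_net = [FakeSlimmableConv2d(32), FakeSlimmableConv2d(64)]
    us_net = [FakeUSConv2d(32), FakeUSConv2d(64)]
    print("Totally {} layers to slim.".format(len(get_conv_layers(slimmable_net))))
    print("Totally {} layers to slim.".format(len(get_conv_layers(us_net))))
```

If the real filter works along these lines, a network built only from layers without `us` would leave the slimming loop with nothing to iterate over, which matches the reported message.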
Hi Rudy,
Could you share your MobileNetV2 code for CIFAR-10?
Hi All,
Sorry for the late reply. While I fully understand that ImageNet requires more compute than many researchers have, results on CIFAR are usually misleading for Neural Architecture Search, especially for efficient neural networks. That's part of the reason why I didn't include the CIFAR config in this code. But I can post the configs here for your reference:
num_hosts_per_job: 1 # number of hosts each job needs
num_cpus_per_host: 36 # number of cpus each job needs
memory_per_host: 380 # memory each job needs
gpu_type: 'nvidia-tesla-p100'
app:
# data
dataset: cifar10
dataset_id: 0
dataset_dir: /home/jiahuiyu/.git/mobile/data
data_transforms: cifar10_basic
data_loader: cifar10_basic
data_loader_workers: 36
drop_last: False
# info
num_classes: 10
test_resize_image_size: 32
image_size: 32
topk: [1]
num_epochs: 100
# optimizer
optimizer: sgd
momentum: 0.9
weight_decay: 0.0001
nesterov: True
# lr
lr: 0.1
lr_scheduler: multistep
multistep_lr_milestones: [30, 60, 90]
multistep_lr_gamma: 0.1
# model profiling
profiling: [gpu]
# pretrain, resume, test_only
test_only: False
# seed
random_seed: 1995
# model
reset_parameters: True
# app defaults
optimizer: mobile_sgd
num_gpus_per_host: 8
batch_size_per_gpu: 128
distributed: True
distributed_all_reduce: True
num_epochs: 250
slimmable_training: True
calibrate_bn: True
inplace_distill: True
cumulative_bn_stats: True
bn_cal_batch_num: 32 # effective batch num is batch_num/gpu_num
num_sample_training: 4
lr: 0.5
lr_scheduler: linear_decaying
lr_warmup: True
lr_warmup_epochs: 5
run:
shell_command: "'python -m torch.distributed.launch --nproc_per_node={} --nnodes={} --node_rank={} --master_addr={} --master_port=2234 train.py'.format(nproc_per_node, nnodes, rank, master_addr)"
jobs:
# - name: mobilenet_v1_0.2_1.1_nonuniform_50epochs_dynamic_divisor12
# app_override:
# model: models.us_mobilenet_v1
# width_mult_list_test: [0.2, 1.1]
# width_mult_range: [0.2, 1.1]
# universally_slimmable_training: True
# nonuniform: True
# num_epochs: 50
# dataset: cifar10_val5k
# inplace_distill: True
# dynamic_divisor: 12
# nonuniform_diff_seed: True
# # lr: 1.5
# # batch_size_per_gpu: 48
# # num_hosts_per_job: 8
# lr: 0.125
# batch_size_per_gpu: 32
# num_hosts_per_job: 1
# data_loader_workers: 4
# # num_gpus_per_host: 1
# - name: mobilenet_v2_0.15_1.5_nonuniform_50epochs_dynamic_divisor12
# app_override:
# model: models.us_mobilenet_v2
# width_mult_list_test: [0.15, 1.5]
# width_mult_range: [0.15, 1.5]
# universally_slimmable_training: True
# nonuniform: True
# num_epochs: 50
# dataset: cifar10_val5k
# inplace_distill: True
# dynamic_divisor: 12
# nonuniform_diff_seed: True
# lr: 0.5
# batch_size_per_gpu: 128
# num_hosts_per_job: 1
# data_loader_workers: 4
# - name: mnasnet_0.15_1.5_nonuniform_50epochs_dynamic_divisor12_ngc
# app_override:
# model: models.us_mnasnet
# width_mult_list_test: [0.15, 1.5]
# width_mult_range: [0.15, 1.5]
# universally_slimmable_training: True
# nonuniform: True
# batch_size_per_gpu: 32
# num_epochs: 50
# dataset: imagenet1k_val50k_lmdb
# inplace_distill: True
# dynamic_divisor: 12
# nonuniform_diff_seed: True
# # lr: 2.0
# # batch_size_per_gpu: 64
# # lr: 1.
# # num_hosts_per_job: 8
# lr: 0.125
# num_hosts_per_job: 1
# dataset_dir: /data/imagenet
# data_loader_workers: 4
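Several keys in the config above appear more than once (optimizer, lr, lr_scheduler, num_epochs), and the commented-out jobs override some of them again. The sketch below resolves a few of them for the commented-out mobilenet_v2_0.15_1.5_nonuniform_50epochs_dynamic_divisor12 job. It assumes that a repeated key takes its later value, that app_override replaces the app-level value, and that each GPU runs one process with batch_size_per_gpu samples. None of these rules is confirmed in this thread, and `resolve` is a hypothetical helper, not part of the repository.

```python
def resolve(*layers):
    """Merge config layers left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Values copied from the posted config (first occurrence of each repeated key).
basic = {"optimizer": "sgd", "lr": 0.1, "lr_scheduler": "multistep", "num_epochs": 100}

# Values listed under "# app defaults" (later occurrence of the same keys).
app_defaults = {
    "optimizer": "mobile_sgd",
    "lr": 0.5,
    "lr_scheduler": "linear_decaying",
    "num_epochs": 250,
    "num_gpus_per_host": 8,
    "batch_size_per_gpu": 128,
}

# app_override of the commented-out MobileNetV2 job.
job_override = {
    "num_epochs": 50,
    "lr": 0.5,
    "batch_size_per_gpu": 128,
    "num_hosts_per_job": 1,
    "dataset": "cifar10_val5k",
}

effective = resolve(basic, app_defaults, job_override)

# Assumed: global batch = hosts x GPUs per host x per-GPU batch size.
global_batch = (
    effective["num_hosts_per_job"]
    * effective["num_gpus_per_host"]
    * effective["batch_size_per_gpu"]
)

if __name__ == "__main__":
    print(effective["optimizer"], effective["lr_scheduler"])
    print(effective["num_epochs"], effective["lr"])
    print("global batch size:", global_batch)
```

Under these assumptions the slimmable MobileNetV2 run would use a global batch of 1024 with lr 0.5 and linear decay for 50 epochs, which is a different regime from the batch size 128 and lr 0.1 setup in the original post.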
Please also note that the latest version is released under branch v3.0.0, instead of master branch.
(I am keeping this issue open and marking it as good first issue)
Hi, I also encountered the same problem (get_conv_layers reporting 0 layers to slim with SlimmableConv2d). How did you solve it?