idstcv/ZenNAS

Deeper and wider network has higher accuracy?

buaabai opened this issue · 6 comments

Figure 2 in the paper shows that deeper and wider networks have higher Zen-Scores, and that Zen-Score correlates positively with model accuracy. So deeper and wider networks have higher accuracy, which is a well-known principle. What, then, is the meaning of Zen-Score?

Thank you for your answer. I tested Zen-Score on CIFAR10 using the RegNet search space (3 stages; the number of blocks, width, bottleneck ratio and groups can be searched). I found that deeper and wider networks have higher Zen-Scores. I sampled 128 models from the search space and chose 10 of them to verify the effectiveness of Zen-Score. But under the same training settings (SGD, lr 0.1, cosine decay, 120 epochs), models with higher Zen-Scores do not reach higher accuracy. The 10 models have very different Zen-Scores but nearly the same accuracy, and small models with lower Zen-Scores can perform better than deeper and wider ones. Do you have any suggestions for verifying Zen-Score? How did you modify the ResNet50 to prove the effectiveness of Zen-Score?

Dear Buaabai,

Thank you for your feedback! A higher Zen-score only suggests greater flexibility in fitting functions. The final accuracy depends on many other factors, such as the amount of training data and the optimizer. For example, a very deep but narrow network has a large Zen-score (which is expected, since the number of linear regions of such a network is extremely large), but we all know it might not achieve high accuracy after training. This is mostly because the optimizer we use is not able to train such deep models and because the training set is finite. From the generalization error bound, when the training data is very limited (compared to the model capacity), the test error is dominated by the model capacity, so Zen-score cannot align well with accuracy.

In practice, we suggest specifying a reasonable maximal depth for the model and then maximizing the Zen-score. This often gives good models. Another way to keep Zen-score from producing over-deep models is to take the NTK score into account during the search, since NTK measures how difficult a model is to train. However, we did not discuss this approach in the paper, and we would greatly appreciate it if you shared your results.
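To make the depth-cap suggestion concrete, here is a minimal sketch in plain Python. It assumes the Zen-scores have already been computed for each candidate; the `Candidate` class, the `pick_best` helper, and all the numbers are hypothetical and only serve as illustration, they are not taken from the ZenNAS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    depth: int        # number of layers (hypothetical field)
    zen_score: float  # precomputed score (made-up values below)


def pick_best(candidates, max_depth):
    """Return the highest-scoring candidate whose depth does not exceed max_depth."""
    feasible = [c for c in candidates if c.depth <= max_depth]
    if not feasible:
        raise ValueError("no candidate satisfies the depth cap")
    return max(feasible, key=lambda c: c.zen_score)


# Illustrative data only: the very deep, narrow model has the largest score.
candidates = [
    Candidate("deep_narrow", depth=200, zen_score=310.0),
    Candidate("medium", depth=50, zen_score=120.0),
    Candidate("shallow_wide", depth=20, zen_score=95.0),
]

best = pick_best(candidates, max_depth=60)
print(best.name, best.depth, best.zen_score)
```

Without the cap, the deep and narrow candidate would win on score alone; with the cap it is filtered out before the score is maximized.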

By the way, 120 epochs of training on CIFAR is not enough in most cases. We suggest at least 1440 epochs to ensure stable convergence. Deeper and narrower models need more training epochs. If you train the same model several times, you will find that the variance of the accuracy after 120 epochs is very large. Also, two models are considered (significantly) different only if their accuracies differ by more than 2%.
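One way to apply this when comparing models is to train each one with several seeds and compare the mean accuracies. The sketch below assumes the 2% threshold means 2 absolute percentage points of top-1 accuracy; the helper names and the accuracy values are made up for illustration.

```python
from statistics import mean, stdev


def summarize(accs):
    """Return (mean, sample standard deviation) of accuracies from repeated runs."""
    return mean(accs), stdev(accs)


def significantly_different(accs_a, accs_b, threshold=2.0):
    """Treat two models as different only if their mean accuracies differ
    by more than `threshold` percentage points (an assumed reading of the 2% rule)."""
    return abs(mean(accs_a) - mean(accs_b)) > threshold


# Hypothetical top-1 accuracies (%) from three seeds per model.
runs_a = [93.1, 94.0, 92.6]
runs_b = [93.8, 92.9, 94.3]
runs_c = [96.2, 96.0, 96.5]

for name, runs in [("a", runs_a), ("b", runs_b), ("c", runs_c)]:
    m, s = summarize(runs)
    print(f"model {name}: mean={m:.2f} std={s:.2f}")

print(significantly_different(runs_a, runs_b))
print(significantly_different(runs_a, runs_c))
```

With these numbers, models a and b are within run-to-run noise of each other, while model c is separated from a by about 3 points.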

"How did you modify the ResNet50 to prove the effectiveness of Zen-Score"

We select a small network as the initial structure and then use an evolutionary algorithm (EA) to maximize the Zen-score. The budget is around 1M~2M parameters. We randomly sample structures during the EA and train them to check their accuracy. The accuracy increases along with the Zen-score.
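A rough sketch of such a search loop is below. It is not the repository's implementation: the network is reduced to a list of layer widths, `count_params` is a crude parameter estimate, and `proxy_score` is a toy stand-in that merely grows with depth and width, not the real Zen-score. The depth cap, population size, and mutation operators are assumptions as well.

```python
import math
import random


def count_params(widths):
    """Crude parameter count for a chain of dense layers: sum of width[i] * width[i+1]."""
    return sum(a * b for a, b in zip(widths, widths[1:]))


def proxy_score(widths):
    """Toy stand-in for Zen-score (NOT the real score): grows with depth and width."""
    return sum(math.log(w) for w in widths)


def mutate(widths, rng):
    """Return a copy of `widths` with one random change applied."""
    child = list(widths)
    op = rng.choice(["widen", "narrow", "add", "remove"])
    i = rng.randrange(len(child))
    if op == "widen":
        child[i] *= 2
    elif op == "narrow":
        child[i] = max(8, child[i] // 2)
    elif op == "add":
        child.insert(i, child[i])
    elif op == "remove" and len(child) > 2:
        del child[i]
    return child


def evolve(initial, budget, max_depth, steps=500, pop_size=16, seed=0):
    """Mutate random population members, keep children that fit the budget
    and depth cap, and drop the lowest-scoring member when the population is full."""
    rng = random.Random(seed)
    population = [list(initial)]
    for _ in range(steps):
        child = mutate(rng.choice(population), rng)
        if count_params(child) <= budget and len(child) <= max_depth:
            population.append(child)
        if len(population) > pop_size:
            population.remove(min(population, key=proxy_score))
    return max(population, key=proxy_score)


initial = [32, 32, 32]
best = evolve(initial, budget=2_000_000, max_depth=20)
print(best, count_params(best), round(proxy_score(best), 2))
```

To check the trend described above, one would train a random sample of the structures visited during the search and compare their accuracies against their scores.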

Thank you for your comprehensive answer. I will do more experiments. Thank you very much!