Adamdad/DeRy

Whether loading pre-trained weight when adopting NASWOT?

pprp opened this issue · 2 comments

pprp commented

Thank you for your great work; it makes a lot of sense to me.

I am curious about the role of NASWOT in DeRy. Generally, the NASWOT score can be obtained from a randomly initialized network, and it reflects the expressive ability of a neural network architecture.

However, I am unsure whether the model in the code you provided

new_value = indicator.get_score(model)[args.zero_proxy]

has loaded pre-trained weights or is just randomly initialized.

If the model is randomly initialized, I am curious whether you have tried loading pre-trained weights and comparing the results. Did you encounter any drawbacks to using pre-trained weights in your experiments?

Dear @pprp,

Thank you for your question. To clarify: when computing the NASWOT score, we do load pre-trained weights. However, in my experiments I found very little difference in the NASWOT scores between pre-trained and randomly initialized weights (a difference of less than 1). This suggests that the NASWOT score correlates with the network architecture itself rather than with the weights.

This is an interesting and relatively unexplored phenomenon, and there may be mathematical properties of the NASWOT score that relate to coding theory. The score essentially measures the information capacity of a network, which can be expressed as the number of binary partitions of the features (proportional to the exponential of the entropy).
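For intuition, the partition-counting view above can be sketched numerically. The following is a minimal NumPy illustration (not the DeRy code, and the helper name `naswot_score` is hypothetical): it assumes the per-input binary activation codes of a network's ReLU units have already been extracted for a mini-batch, and scores them via the log-determinant of the pairwise agreement kernel, in the spirit of the NASWOT paper.

```python
import numpy as np

def naswot_score(binary_codes):
    """Score a batch of binary activation codes, NASWOT-style.

    binary_codes: (N, D) array of 0/1 values, one row per input in the
    mini-batch, one column per ReLU unit (1 = unit active).
    The kernel entry K[i, j] counts the units on which inputs i and j
    agree (both active or both inactive); the score is log|det K|.
    """
    c = binary_codes.astype(np.float64)
    # Agreement kernel: matches on active units + matches on inactive units
    K = c @ c.T + (1.0 - c) @ (1.0 - c).T
    sign, logdet = np.linalg.slogdet(K)
    return logdet

# Toy example: random "activations" for a batch of 8 inputs, 64 units.
rng = np.random.default_rng(0)
codes = (rng.random((8, 64)) > 0.5).astype(np.float64)
print(naswot_score(codes))
```

Intuitively, the more distinct the binary codes across inputs (i.e., the more ways the network partitions its inputs), the better conditioned the kernel and the higher the score, which is why the score tracks architecture more than the specific weight values.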

While I have not yet delved deeper into this phenomenon, I hope that my answer has addressed your question. If you are interested, you can replicate my experiments using my code, and I believe that you will observe similar results.

Best regards,
Xingyi Yang

pprp commented

@Adamdad Thanks for your clarification, I will try it later.