Questions about one-shot LaNAS
Hi, when I try to replicate one-shot LaNAS with the source code, I cannot reproduce the curve from the paper.
I then checked the code and found a few things that look strange; could you please help me work them out?
(1) In `one-shot LaNAS/LaNAS/Classifier.py`, the learning rate seems too small, and the learned linear model ends up far from optimal. I think the learning rate should be larger (such as 0.01); is that right?
(2) Once (1) is fixed, in the function `search_samples_under_constraints` in `one-shot LaNAS/LaNAS/MCTS.py`, why is only a single pair of W and b considered? When I change the code to consider all Ws and bs, it seems hard to retrieve a sample that satisfies all the constraints. How can this be solved?
(3) In the `search` function in `one-shot LaNAS/LaNAS/MCTS.py`, the tree is updated after each sample is evaluated, but in the other scripts (like Distributed_LaNAS and LaNAS_NASBench101) the tree is updated only after evaluating 20/50 samples. How often should the "learning phase" be reconducted, or in other words, how many samples should be retrieved in each "searching phase"? (See the sketch below these questions for the loop I mean.)
(4) Which Cp should be chosen to replicate the one-shot LaNAS result? It is set very large in the code (Cp=10), which seems a bit unreasonable.
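To make question (3) concrete, here is a minimal sketch of the loop I have in mind; all names (`propose_sample`, `evaluate`, `update`, `samples_per_update`) are placeholders, not the repo's API:

```python
# A minimal sketch of the search loop in question. samples_per_update
# controls how many samples are drawn in each "searching phase" before
# the next "learning phase" rebuilds the tree.

def search(tree, supernet, total_samples, samples_per_update=1):
    history = []                              # (architecture, accuracy) pairs
    while len(history) < total_samples:
        # Searching phase: draw a batch of samples from the current tree.
        for _ in range(samples_per_update):
            arch = tree.propose_sample()      # UCT selection + sampling
            acc = supernet.evaluate(arch)     # one-shot evaluation
            history.append((arch, acc))
        # Learning phase: refit the per-node linear models on all data so far.
        tree.update(history)
    return max(history, key=lambda pair: pair[1])
```

With `samples_per_update=1` this matches one-shot LaNAS; with 20/50 it matches the other scripts.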
Hello Jun,
Thanks for your interest.
- You should watch the loss to see training progress: you can start with a small lr and train for more epochs.
- W and b are arrays of constraints; take a look at the relevant code (a screenshot was attached here).
- You can update the tree whenever you receive a new sample, to keep the tree up to date.
- If the accuracy is in the range [0, 100], then Cp is 10; if the range is [0, 1], it is 0.1 (see the sketch below).
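To see why Cp tracks the reward range: it weights the exploration term against the mean accuracy in the UCT score, so it has to live on the same scale as the rewards. A minimal sketch (illustrative names only, not the repo's code):

```python
import math

# UCT-style selection score: exploitation (mean accuracy of the samples in
# a node) plus Cp-weighted exploration. If accuracy lives in [0, 100], the
# exploitation term is ~100x larger than for [0, 1], so Cp must scale too.

def uct_score(mean_acc, n_node, n_parent, cp):
    exploration = 2.0 * cp * math.sqrt(2.0 * math.log(n_parent) / n_node)
    return mean_acc + exploration

print(uct_score(81.7,  n_node=50, n_parent=1000, cp=10.0))  # accuracies in [0, 100]
print(uct_score(0.817, n_node=50, n_parent=1000, cp=0.1))   # accuracies in [0, 1]
```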
Hope that helps.
I admit this version is not well implemented, but here is a better version if you want to play with it:
https://github.com/facebookresearch/LaMCTS/tree/master/LA-MCTS
Thank you so much. It is great to have another implementation of LA-MCTS!
My goal is to reproduce the one-shot LaNAS results and then try to make improvements based on your code, but I still have some questions.
(1) I understand the meaning of W and b and how they work, but in the sampling process, `rand_arch` is returned as soon as it meets just one of the constraints, according to your code.
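What I would expect instead is rejection sampling against all the constraints along the selected path, roughly as in this minimal sketch (all names are hypothetical, not from the repo):

```python
import numpy as np

# Require a candidate to satisfy every (W, b) constraint along the path,
# not just one. signs[i] encodes which side of the i-th linear boundary
# the selected path takes (+1 for W @ x + b >= 0, -1 for the other side).

def satisfies_all(arch, Ws, bs, signs):
    return all(s * (W @ arch + b) >= 0 for W, b, s in zip(Ws, bs, signs))

def sample_under_all_constraints(Ws, bs, signs, dim, max_tries=10000):
    for _ in range(max_tries):
        rand_arch = np.random.rand(dim)       # random candidate encoding
        if satisfies_all(rand_arch, Ws, bs, signs):
            return rand_arch
    # Deep regions can be tiny, so rejection sampling may fail; one could
    # then relax the deepest constraints or perturb a known in-region sample.
    return None
```

This also matches what I observed: the feasible region shrinks with tree depth, so sampling under all constraints can take many tries.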
(2) In one-shot LaNAS, `init_train` in MCTS is omitted (while it appears in the other two MCTS implementations of LaNAS). Should it be implemented, and if so, how?
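By analogy with the other implementations, I imagine a warm-up roughly like this minimal sketch (the names and `dim` are placeholders, not the repo's API):

```python
import numpy as np

# Evaluate a set of random architectures up front so the tree has data
# before its first learning phase, as in the other implementations.

def init_train(tree, supernet, num_random=200, dim=64):
    history = []
    for _ in range(num_random):
        arch = np.random.rand(dim)        # random architecture encoding
        acc = supernet.evaluate(arch)     # accuracy of the sampled subnet
        history.append((arch, acc))
    tree.update(history)                  # first learning phase on warm-up data
    return history
```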
(3) Are the hyper-parameters kept the same across all the experiments mentioned in the paper? I see the tree height is set to 5 by default in the code.
It would be very kind of you to provide the code, or more details, for reproducing the one-shot LaNAS results.
Since you are claiming the results cannot be reproduced, can you please first let me know the results on your side? Please note that you are expected to run the search multiple times to reproduce the figure.
Once you have initial results, let's discuss the differences.
After fixing the learning-rate issue, I ran the algorithm for 5 trials.
Currently, I set the tree height to 5 and used 200 samples for initialization. The tree is updated whenever a new sample is received, i.e. #select=1. The current test_acc vs. UVs curve is attached below; each color indicates a different trial.
Two trials have evaluated over 2000 samples and reached best supernet test accuracies of 81.69% (found at sample #2306) and 81.45% (found at sample #1524).
The other three trials are still running; their best supernet test accuracies so far are 81.47% (found at sample #38), 80.29% (found at sample #779), and 81.59% (found at sample #705), respectively.
I suspect some of the parameters I used are incorrect. Could you please help me find the correct combination of parameters used in one-shot LaNAS?
Can you add a legend to your figure?
Thanks. Can you please send them by email to wangnan318@gmail.com? These results look interesting; let me also check my logs here.
I have sent the email.
If you did not receive it, please let me know :)