facebookresearch/LaMCTS

Questions about one-shot LaNAS

Closed this issue · 10 comments

Hi, when I try to replicate one-shot LaNAS with the source code, I cannot reproduce the curve shown in the paper.
I then checked the code and found a few things that look strange; could you please help me work them out?
(1) In one-shot LaNAS/LaNAS/Classifier.py, the learning rate seems too small, and the learned linear model ends up far from optimal. I think the learning rate should be larger (e.g., 0.01); is that correct? (See the sketch at the end of this comment for what I mean.)
(2) After fixing (1), I find that the function search_samples_under_constraints in one-shot LaNAS/LaNAS/MCTS.py only considers a single pair of W and b. Why is that? When I change the code to consider all Ws and bs, it becomes hard to retrieve a sample that satisfies all the constraints. How should this be handled?
(3) In the search function in one-shot LaNAS/LaNAS/MCTS.py, the tree is updated after each sample is evaluated, but in the other scripts (like Distributed_LaNAS and LaNAS_NASBench101) the tree is updated only after evaluating 20/50 samples. How often should the "learning phase" be re-run (in other words, how many samples should be drawn in each "searching phase")?
(4) Which Cp should be chosen to replicate the one-shot LaNAS result? I find it set to a very large value (Cp=10) in the code, which seems a bit unreasonable.
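
To make (1) concrete, here is a simplified, standalone sketch of what I mean: fitting a linear regressor on (architecture, accuracy) pairs and watching how the final loss depends on the learning rate. This is only an illustration, not the actual code in Classifier.py.

```python
# Toy illustration only -- not the actual one-shot LaNAS Classifier.py.
# It fits a linear model acc ~= X @ w + b and shows how the learning rate
# affects how close the fit gets within a fixed number of epochs.
import torch

def fit_linear(X, y, lr, epochs=1000):
    model = torch.nn.Linear(X.shape[1], 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return loss.item()

X = torch.rand(200, 10)            # 200 fake architecture encodings
y = X @ torch.rand(10) + 0.5       # fake supernet accuracies
for lr in (1e-4, 1e-3, 1e-2):
    print(f"lr={lr:g}  final MSE={fit_linear(X, y, lr):.4f}")
```

With a learning rate that is too small, the loss barely decreases within the epoch budget, which is why the learned boundary looks far from optimal.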

Hello Jun,

Thanks for your interest.

  1. You should experiment and watch how the loss progresses: you can start with a small learning rate and train for more epochs.

  2. W and b are arrays of constraints, one pair per node on the path from the root to the selected leaf; a sample should satisfy all of them. Take a look at how they are collected and applied in MCTS.py.
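Roughly, the idea is the following: a sample is only accepted if it satisfies every (W, b) pair along the path. This is a simplified sketch with hypothetical helper names, not the exact code in MCTS.py.

```python
import numpy as np

def satisfies_all(arch, Ws, bs):
    """Check a candidate against every (W, b) constraint on the path.

    Each W (a vector) with its bias b defines one half-space,
    W @ arch + b >= 0; in the actual tree the direction of the
    inequality depends on which child the path follows at that node.
    """
    return all(np.dot(W, arch) + b >= 0 for W, b in zip(Ws, bs))

def sample_under_constraints(Ws, bs, dim, max_trials=10000):
    # Rejection sampling: draw random architectures until one satisfies
    # every constraint, or give up after max_trials attempts.
    for _ in range(max_trials):
        rand_arch = np.random.rand(dim)
        if satisfies_all(rand_arch, Ws, bs):
            return rand_arch
    return None  # the selected region may be (nearly) empty
```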

  3. You can update the tree whenever you receive a new sample, to keep it up to date.

  4. If the accuracy is in the range [0, 100], then Cp is 10; if the range is [0, 1], it is 0.1.
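
On why Cp tracks the accuracy range: Cp weighs the exploration bonus against a node's average value in the UCB score, so it should be on the same scale as the values. Below is a rough sketch of the standard UCB form; the exact constants used in the repo may differ.

```python
import math

def ucb_score(node_value_sum, node_visits, parent_visits, Cp):
    # Mean value plus an exploration bonus. Cp should be on the same
    # scale as the values (roughly 10 for accuracies in [0, 100],
    # roughly 0.1 for accuracies in [0, 1]) so neither term dominates.
    mean = node_value_sum / node_visits
    explore = 2.0 * Cp * math.sqrt(2.0 * math.log(parent_visits) / node_visits)
    return mean + explore
```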

Hope that helps.

I admit this version is not well implemented, but here is a better version if you want to play with it:

https://github.com/facebookresearch/LaMCTS/tree/master/LA-MCTS

Thank you so much. It is great to have another implementation of LA-MCTS!
My goal is to reproduce the one-shot LaNAS results and try to make improvements based on your code, and I still have some questions.
(1) I understand the meaning of W and b and how they work, but in the sampling process, according to your code rand_arch is returned as soon as it satisfies just one of the constraints.
(2) In one-shot LaNAS, init_train in MCTS is omitted (while it appears in the other two MCTS implementations of LaNAS). Should it be implemented, and if so, how? (I sketch my guess at the end of this comment.)
(3) Are the hyper-parameters kept the same across all experiments mentioned in the paper? I find the tree height is set to 5 by default in the code.
It would be very kind of you to provide the code or more details about reproducing the one-shot LaNAS result.
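
Regarding (2), to make my question concrete, here is my rough guess at what init_train could look like, based on the other two implementations: evaluate some random architectures first to seed the tree before the first learning phase. The class and helper names below are my own, not the repo's actual API.

```python
class MCTSInitSketch:
    """Illustration only: how a warm-up phase might seed the tree."""

    def __init__(self, evaluate_fn, random_arch_fn, learn_tree_fn):
        self.evaluate = evaluate_fn        # arch -> accuracy on the supernet
        self.random_arch = random_arch_fn  # () -> a random architecture encoding
        self.learn_tree = learn_tree_fn    # samples dict -> re-learned tree
        self.samples = {}

    def init_train(self, num_random=200):
        # Evaluate random architectures so the root node already has data
        # before the first learning/splitting phase.
        for _ in range(num_random):
            arch = self.random_arch()
            self.samples[tuple(arch)] = self.evaluate(arch)
        self.learn_tree(self.samples)
```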

Since you say you cannot reproduce the results, can you please first let me know the results on your side? Please note that you are expected to run multiple trials to reproduce the figure.

After having the initial results, let's talk about the difference then.

After fixing the learning-rate issue, I have run the algorithm for 5 trials.
Currently, I set the tree height to 5 and use 200 samples for initialization. The tree is updated whenever a new sample is received, i.e., #select=1. The current test_acc curve is attached below; each color indicates a different trial.
[figure: test accuracy on the supernet vs. evaluated samples, one curve per trial]
Two of the trials have evaluated over 2000 samples and obtained best test accuracies on the supernet of 81.69% (found at sample #2306) and 81.45% (found at sample #1524).
The other three trials are still running; in the latest results, their best test accuracies on the supernet are 81.47% (found at sample #38), 80.29% (found at sample #779), and 81.59% (found at sample #705), respectively.
I suspect some of the parameters I used are incorrect. Could you please help me find the correct combination of parameters used in one-shot LaNAS? The settings I used are summarized below.
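
For reference, a plain summary of the settings behind the curves above (the key names are my own descriptions, not the actual argument names in the code):

```python
# Settings used in the five trials above (descriptive key names only).
trial_settings = {
    "tree_height": 5,         # depth of the search tree
    "init_samples": 200,      # random samples used for initialization
    "samples_per_update": 1,  # the tree is re-learned after every new sample (#select=1)
}
```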

Can you add a legend to your figure?

[updated figure: test accuracy curves, one color per trial]
The lines of different colors just indicate different trials. The figure differs slightly from the one above only because it includes the results of more samples.
Also, I have sent you an email in which I describe my problems in more detail. :)
Thanks for your patience!

Thanks. Can you please send the email to wangnan318@gmail.com? These results look interesting; let me also check back over my logs here.

I have sent the email.
If you did not receive it, please let me know :)

Closing this issue, as the problem was resolved via email.
