changlin31/BossNAS

How to select architectures from the trained supernet?

ranery opened this issue · 7 comments

Hi, thanks for your great work!

I tried using the searching code you provided to train the supernet, but I could not figure out how to search for candidate architectures from the trained supernet.

I guess the validation hook serves this purpose, but I did not find the saved path information after training for one epoch. Are there other files I need to look at, or should I just wait for more epochs to be trained?

Could you advise me about that, thanks in advance for your time and help!

Best,
Haoran

Hi Haoran @ranery

Evaluation, BatchNorm recalibration and architecture ranking are performed automatically after every supernet training stage (6 epochs per stage by default). The best architecture of the current stage is the top-1 encoding saved in the path rank file.

For example, our saved encodings are provided here: https://github.com/changlin31/BossNAS/tree/main/ranking_mbconv/path_rank
The best architecture's encoding is the combination of the first entry in each file, i.e. 11 0110 0111 1100 0110 (0 and 1 denote the first and second candidate operations).
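
As a rough illustration (assuming each path_rank file lists path encodings best-first, one per line; the exact file format may differ), the best architecture can be assembled like this:

import os

rank_dir = "ranking_mbconv/path_rank"  # directory with one rank file per stage

best_encoding = []
for fname in sorted(os.listdir(rank_dir)):
    with open(os.path.join(rank_dir, fname)) as f:
        top1 = f.readline().strip()  # first line = top-1 path of this stage
    best_encoding.append(top1)

print(" ".join(best_encoding))  # e.g. "11 0110 0111 1100 0110"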

Got it, thanks for your response!

I encountered another problem when resuming from a checkpoint to continue training.

RuntimeError: Given groups=1, weight of size 1024 512 3 3, expected input[64, 1024, 28, 28] to have 512 channels, but got 1024 channels instead

It occurs when execution reaches:

if fmap_size > self.fmap_size:
    # the incoming feature map is larger than this block's target size,
    # so take the downsampling branch
    residual = self.downsample_d(x)
    x = self.conv1_d(x)
    x = self.peg_d(x)
    x = self.bn1_d(x)

Do you have any thoughts regarding this issue?

I need more information to find out the problem, but I guess this issue happens because you resumed from a different stage.

Which code are you running? Are you searching on the MBConv search space or the HyTra search space? Are you resuming from the first stage or not?

You can try modifying start_block here to match your checkpoint.
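
For example (a hypothetical config fragment; the actual location of start_block in your config or code may differ), if your checkpoint was saved at the end of the second stage:

model = dict(
    # ... keep your existing model settings ...
    start_block=2,  # stage index to resume supernet training from (assumed 0-indexed)
)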

Thanks for your timely response, I figured this issue out! But I find that my searched architecture is quite regular:
[1, 1, 1, 1] → [1, 1, 1, 1] → [0, 0, 0, 0] → [0, 0, 0, 0]

I am wondering whether you have an explanation for this structure, since it is quite different from your retrained ones.

  • Why are there always four modules in one block?
  • Why are the first eight modules all ResConv while the last eight are all ResAttn?

Could you elaborate more on the rationale behind your search method? Specifically, how do you decide whether to downsample at the current stage?

  • Why are there always four modules in one block?

This block partition is only used for supernet training; a different partition does not change the architecture. Your searched architecture is actually [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0].
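
In other words, flattening the per-block lists gives the full encoding (a trivial sketch):

blocks = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
architecture = [op for block in blocks for op in block]
print(architecture)  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]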

  • Why are the first eight modules all ResConv while the last eight are all ResAttn?

This case is actually very likely to happen, as conv is better than attn in early stages and attn is better than conv in later stages.

  • How do you decide whether to downsample at the current stage?

Your searched architecture is actually downsampled to the smallest scale from the very beginning. There are 6 candidate operations in the HyTra search space:

0: ResAttn @ 7x7
1: ResConv @ 7x7
2: ResAttn @ 14x14
3: ResConv @ 14x14
4: ResConv @ 28x28
5: ResConv @ 56x56

The numbers after @ are the feature map resolutions.
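
For reference, a small lookup along these lines (the table name and layout here are just for illustration) decodes an encoding into operations and resolutions:

HYTRA_OPS = {
    0: ("ResAttn", 7),
    1: ("ResConv", 7),
    2: ("ResAttn", 14),
    3: ("ResConv", 14),
    4: ("ResConv", 28),
    5: ("ResConv", 56),
}

encoding = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for i, op in enumerate(encoding):
    name, res = HYTRA_OPS[op]
    print(f"layer {i:2d}: {name} @ {res}x{res}")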

We have now added a restriction to avoid downsampling across multiple scales. Hope this solves the problem.
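
Roughly, the restriction rejects paths whose resolution drops by more than one scale between consecutive layers (a sketch of the idea, not the exact implementation):

RESOLUTION = {0: 7, 1: 7, 2: 14, 3: 14, 4: 28, 5: 56}  # op index -> feature map size
SCALES = [56, 28, 14, 7]                                # ordered from largest to smallest

def downsamples_across_scales(encoding):
    """Return True if any step in the path skips a scale when downsampling."""
    for prev_op, cur_op in zip(encoding, encoding[1:]):
        prev_idx = SCALES.index(RESOLUTION[prev_op])
        cur_idx = SCALES.index(RESOLUTION[cur_op])
        if cur_idx - prev_idx > 1:  # e.g. 56x56 followed directly by 14x14
            return True
    return False

print(downsamples_across_scales([5, 4, 3, 2, 1, 0]))  # False: one scale step at a time
print(downsamples_across_scales([5, 3, 1, 0]))        # True: 56x56 -> 14x14 skips 28x28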

Thanks for your clarification!