Official Review
micronet-challenge-submissions opened this issue · 11 comments
Hi! Thanks for the updates!
Our only outstanding question is about the counting of the mask overhead for sparse weight matrices (1 bit per parameter, including zero-valued parameters). Unless I'm missing something, it doesn't look like this is taken into account in your counting script.
Thanks!
Trevor
Thanks for your feedback. It helped us find and fix our mistake.
As you can see in the revised Score_MicroNet.ipynb, we have changed the scoring method, and we believe this resolves the overhead issue you mentioned.
The main counting logic is in the 'Counting' directory, and 'count_hooks.py' is the main file. The issue you raised may be related to the conv counting:
def count_convNd(m, x, y):
    # m: conv module, x: input tuple, y: output tensor
    x = x[0]
    # multiplications per output element: spatial kernel size times input channels per group
    kernel_ops = m.weight.size()[2:].numel() * m.in_channels // m.groups
    # always count one bias add: conv has no bias here, but the batchnorm bias is folded into the conv term
    bias_ops = 1
    # non_sparsity(w) is the fraction of non-zero weights in w
    total_add_ops = y.nelement() * (kernel_ops * non_sparsity(m.weight) - 1) + y.nelement() * bias_ops
    total_mul_ops = y.nelement() * kernel_ops * non_sparsity(m.weight)
    # non-zero weights plus one (unpruned) bias per output channel
    total_params = m.weight.numel() * non_sparsity(m.weight) + m.weight.shape[0]
    m.total_add_ops += torch.Tensor([total_add_ops])
    m.total_mul_ops += torch.Tensor([total_mul_ops])
    m.total_params += torch.Tensor([total_params])
The only place overhead could occur is the bias operation. We do not use a bias in the conv layers, only in batchnorm, so we fold the bias count into the conv term: the 'y.nelement() * bias_ops' and 'm.weight.shape[0]' terms above account for it.
We do not apply sparsity to this bias part. In detail, during the pruning process we did not prune the biases (including their 1-bit parameters), so we believe there is no sparsity in the 1-bit parameter terms.
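For reference, non_sparsity is not shown above; a minimal sketch, assuming it simply returns the fraction of weights that survived pruning (non-zero):

def non_sparsity(weight):
    # fraction of non-zero entries in a (pruned) weight tensor
    return float((weight != 0).sum().item()) / weight.numel()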
Thanks,
Taehyeon Kim
Thanks for the quick reply.
One point of confusion: we trained the network parameters in FP32. Because all parameters have the same precision, we had decided a bitmask was not needed here.
For the freebie quantization, we apply it in the Jupyter notebook file.
Do you mean we should also apply a bitmask in the counting file even though this is a freebie?
We are not sure we understand this correctly, but we have uploaded a new Jupyter notebook file for scoring.
In this code, we add the following term:
def bitmask(net):
    # count weight tensors only (skip 1-D tensors: biases and batchnorm params),
    # charging 1 bit per parameter, expressed in 32-bit parameter equivalents
    num = 0
    for module in net.parameters():
        if module.ndimension() != 1:
            num += module.numel()
    return num / 32
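As a quick sanity check of the arithmetic: a hypothetical 32x16x3x3 conv weight (4,608 entries) would contribute 4608/32 = 144 parameter-equivalents to the count.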
This function computes the bitmask overhead, and the score function becomes:
def micro_score(net, precision='Freebie'):
    input = torch.randn(1, 3, 32, 32).to(net.device)
    addflops, multflops, params = count(net, inputs=(input,))
    # freebie quantization: 16-bit storage and multiplies count at half cost
    if precision == 'Freebie':
        multflops = multflops / 2
        params = params / 2
    # add the 1-bit-per-parameter mask overhead
    params += bitmask(net)
    # normalize by the challenge baseline: 36.5M parameters and 10.49B math ops
    score = params / 36500000 + (addflops + multflops) / 10490000000
    print('Score: {}, flops: {}, params: {}'.format(score, addflops + multflops, params))
    return score
With the bitmask included, the score function changes as shown above.
The new score is 0.0054.
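A minimal usage sketch (assuming `net` is the trained model loaded in the notebook, exposing the `.device` attribute that micro_score expects):

score = micro_score(net, precision='Freebie')
# prints e.g. 'Score: 0.0054, flops: ..., params: ...'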
Looks good! Thanks for the fix! Two quick questions:
- Do you still want to submit your "ver2" model? I ran it and checked the score in your revised colab and got 0.0056, which is an excellent score.
- When I run your updated colab I get an error passing "expansion = 3" to the MicroNet class. When I remove this, everything appears to work fine. Just want to make sure this isn't important.
Thanks!
Trevor
Also, what name would you like your entries posted under when the results are revealed?
Trevor
Thanks for the reply.
First, if the ver2 network can also be accepted, we would like to submit it. However, ver1 has the better score, so if only one of ver1 and ver2 may be submitted, we will submit ver1.
Otherwise, we would like to submit both.
Second, the expansion argument isn't important. Sorry for the confusion.
Do you mean the team name?
Our team name is 'KAIST AI', and we would prefer the results to be posted under that name.
If you don't mind, could you give an approximate current ranking for CIFAR-100?
Taehyeon Kim