Choice of datapoints to which RSC is applied
AhmedFrikha opened this issue · 2 comments
In the paper, it is mentioned that RSC is applied to a random subset of the current batch. However, lines 126 to 146 of resnet.py seem to do something more sophisticated.
a) Can you explain what is done in that part of the code, especially the meaning of the variables used in lines 142 to 146?
b) Why is the mask a variable that requires grad? (line 149 in resnet.py)
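To make the masking step behind both questions concrete, here is a minimal pure-Python sketch of spatial-wise RSC masking as described in the paper. It is not the code in resnet.py, which works on tensors. It assumes the gradient of the ground-truth class score with respect to the feature map has already been computed and averaged over channels into an H x W map. The function names `spatial_rsc_mask` and `apply_mask` and the default `drop_fraction = 1/3` are hypothetical choices for illustration.

```python
import math
from typing import List


def spatial_rsc_mask(grad_map: List[List[float]], drop_fraction: float = 1 / 3) -> List[List[float]]:
    """Return a 0/1 mask that zeroes the spatial locations with the largest gradient.

    grad_map: H x W channel-averaged gradient of the ground-truth class score
    with respect to the feature map (assumed to be precomputed).
    drop_fraction: hypothetical fraction of locations to drop.
    """
    h, w = len(grad_map), len(grad_map[0])
    flat = [grad_map[i][j] for i in range(h) for j in range(w)]
    num_drop = math.ceil(drop_fraction * h * w)
    # Rank locations by gradient magnitude; the top ones are the most "important".
    ranked = sorted(range(h * w), key=lambda k: flat[k], reverse=True)
    dropped = set(ranked[:num_drop])
    # 0.0 where a dominant location is dropped, 1.0 everywhere else.
    return [[0.0 if i * w + j in dropped else 1.0 for j in range(w)] for i in range(h)]


def apply_mask(feature: List[List[float]], mask: List[List[float]]) -> List[List[float]]:
    # Element-wise product: the classifier only sees the non-dropped locations.
    return [[f * m for f, m in zip(frow, mrow)] for frow, mrow in zip(feature, mask)]


if __name__ == "__main__":
    grad = [[0.9, 0.1, 0.2], [0.05, 0.8, 0.3], [0.4, 0.0, 0.7]]
    mask = spatial_rsc_mask(grad)
    print(mask)
    print(apply_mask([[2.0] * 3 for _ in range(3)], mask))
```

Because the mask is only a multiplicative 0/1 input to the forward pass, the masked features are computed the same way whether or not the mask itself is tracked by autograd.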
I got an answer to question (a) from issue #10.
But I still don't understand why you turn the mask into a trainable variable.
We mention in the paper that applying RSC to the top percentage of batch samples, ranked by cross-entropy loss, is slightly better than selecting them at random.
It doesn't matter that the mask is turned into a trainable variable, because it is only an extra input to the forward pass, not a model parameter, so the optimizer never updates it.
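The batch-selection rule from the reply can be sketched in plain Python. This assumes "top percentage" means the samples with the highest per-sample cross-entropy loss; the function names `cross_entropy` and `select_rsc_samples` and the parameter `batch_percentage` are hypothetical and do not come from resnet.py.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def select_rsc_samples(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    batch_percentage: float = 1 / 3,
) -> List[int]:
    """Return the batch indices (ascending) that RSC masking would be applied to.

    Assumption: the selected samples are those with the highest loss.
    """
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    num_selected = math.ceil(batch_percentage * len(losses))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:num_selected])


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 0.0]]
    print(select_rsc_samples(logits, [0, 0, 0, 0], batch_percentage=0.5))
```

Samples that are not selected would keep an all-ones mask, so their features pass through unchanged. Replacing the ranking with a random choice of `num_selected` indices gives the random-subset variant described in the question.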