Question about code
GewelsJI opened this issue · 4 comments
Hey, authors,
Thanks for open-sourcing such nice work. I have a small question about your code:
Why do you use F.gumbel_softmax during training, but torch.argmin during inference?
Hope to receive your response. :)
Best,
Daniel.
During training you need some random behavior, so that even when the mask probability is below 0.5 the mask can still sometimes be True (or 1). During inference, deterministic predictions are preferred, so a probability above 0.5 produces a True mask and anything else produces a False mask.
You can actually use F.gumbel_softmax at inference time as well, with no noticeable impact on accuracy.
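To make the difference concrete, here is a minimal sketch of the two modes. It is not the repository's actual code: the function name `sample_mask`, the `tau` default, and the convention that index 1 of a two-class logit vector means "mask is True" are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def sample_mask(logits: torch.Tensor, training: bool, tau: float = 1.0) -> torch.Tensor:
    """Turn two-class logits of shape (..., 2) into a binary mask of shape (...).

    Assumed convention for this sketch: index 1 means "mask is True".
    """
    if training:
        # Stochastic, straight-through sample: the forward pass is one-hot,
        # while gradients flow through the soft Gumbel-softmax relaxation.
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        return one_hot[..., 1]
    # Deterministic: pick the more probable class, i.e. True when p(True) > 0.5.
    return torch.argmax(logits, dim=-1).to(logits.dtype)


if __name__ == "__main__":
    # p(False) = 0.7, p(True) = 0.3 for every one of the 10,000 positions.
    logits = torch.log(torch.tensor([0.7, 0.3])).expand(10_000, 2)
    print("inference, fraction True:", sample_mask(logits, training=False).mean().item())
    print("training,  fraction True:", sample_mask(logits, training=True).mean().item())
```

With these logits the inference path never sets the mask, while the training path should set it for roughly 30% of the positions, so low-probability positions still get explored and receive gradient.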
Thanks for your quick reply. That's great.
Best,
Daniel.
I want to ask why argmin is used instead of argmax. I think a True mask should correspond to the larger probability, so shouldn't it use argmax?
Hope to get your response, thanks!