Question about code

Question

GewelsJI opened this issue 8 months ago · 4 comments

GewelsJI commented 8 months ago

Hey, authors,

Thanks for open-sourcing such nice work. I have a small question about your code:

Why do you use F.gumbel_softmax during training but torch.argmin during inference?

Hope to receive your response. :)

Best,
Daniel.

Answer 1 · 2024-05-09T08:55:41.000Z

During training you need some random behavior, so that even when the mask probability is less than 0.5 the mask can still sometimes be True (or 1). During inference, deterministic predictions are preferred, so a probability above 0.5 produces a True mask and anything else produces a False mask.

You can actually use F.gumbel_softmax at inference time as well, with no noticeable impact on accuracy.
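To make the two modes concrete, here is a minimal sketch rather than the repository's actual code. The helper name `sample_mask`, the `tau` default, and the assumption that the last dimension holds two scores ordered `[keep, drop]` are all illustrative.

```python
import torch
import torch.nn.functional as F


def sample_mask(logits: torch.Tensor, training: bool, tau: float = 1.0) -> torch.Tensor:
    """Return a 0/1 mask from two-way logits.

    Assumption (not confirmed by the thread): logits has shape (..., 2)
    with the last dimension ordered [keep, drop]; 1 in the output means keep.
    """
    if training:
        # Stochastic: hard=True yields one-hot samples in the forward pass
        # while gradients flow through the soft relaxation (straight-through).
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        return one_hot[..., 0]
    # Deterministic: argmin returns 1 exactly when the drop score (index 1)
    # is the smaller one, i.e. when the keep probability exceeds 0.5.
    return torch.argmin(logits, dim=-1).to(logits.dtype)
```

With this layout, a token whose keep probability is below 0.5 is still kept occasionally during training, while the inference branch always returns the same mask for the same logits.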

Answer 2 · 2024-05-10T00:14:02.000Z

Thanks for your quick reply. That's great.

Best,
Daniel.

Answer 3 · 2024-05-14T05:03:56.000Z

> During training you need some random behavior, so that even when the mask probability is less than 0.5 the mask can still sometimes be True (or 1). During inference, deterministic predictions are preferred, so a probability above 0.5 produces a True mask and anything else produces a False mask.
>
> You can actually use F.gumbel_softmax at inference time as well, with no noticeable impact on accuracy.

Why is argmin used instead of argmax? I would expect a True mask to correspond to the larger probability, so shouldn't it be argmax?

Hope to get your response, thanks!
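One possible reading, offered as a sketch and not as the authors' answer: for a two-way score, argmin and argmax only differ in which index they report. If the keep score is assumed to sit at index 0 (a hypothetical layout, as are the example values below), argmin already returns 1 for the tokens with the larger keep probability.

```python
import torch

# Hypothetical two-way scores, assumed to be ordered [keep, drop].
logits = torch.tensor([[1.5, -0.3], [-0.7, 0.2], [0.4, 0.9]])

# argmin is 1 exactly when index 1 (drop) is smaller, i.e. keep is larger.
keep_via_argmin = torch.argmin(logits, dim=-1)

# Same mask expressed with argmax: flip the index of the larger score.
keep_via_argmax = 1 - torch.argmax(logits, dim=-1)

# Same mask expressed as "keep probability above 0.5".
keep_via_prob = (torch.softmax(logits, dim=-1)[..., 0] > 0.5).long()

print(keep_via_argmin.tolist(), keep_via_argmax.tolist(), keep_via_prob.tolist())
```

If a model stored the keep score at index 1 instead, argmin would invert the mask, so the channel order is worth checking before swapping the call.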

Answer 4 · 2024-06-21T05:18:55.000Z

@kaikai23 Hi, in my experiment, if I use argmin at the inference stage, the number of kept tokens at the final ViT layer is zero. Do you have any suggestions? Hope to get your reply, thanks!