YanjingLi0202/Q-ViT

why use torch.clip in Q-MLP

Pang-Yatian opened this issue · 2 comments

Hi, thanks for your great work.

May I ask why this line is added in Q_MLP?

        x = torch.clip(x, -10., 10.)

Is there a specific reason for it? Is it to make training more stable, or does this trick improve performance?

    def forward(self, x):
        x = self.fc1(x)
        # print(torch.max(x), torch.min(x))
        x = self.act(x)
        x = torch.clip(x, -10., 10.)
        # print(torch.clip(x, -10., 10.))
        x = self.drop1(x)
        x = self.fc2(x)
        x = self.drop2(x)
        return x

We use torch.clip() here to limit activation outliers, but the specific bounds, i.e. "-10., 10.", have not yet been explored sufficiently. We also ran experiments with the torch.clip() removed and found that performance on the DeiT-tiny model decreased by about 0.1%.
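
For context, here is a minimal sketch of how this kind of clip could be exposed as a configurable option for such an ablation. This is an illustration, not the repository's actual code: plain nn.Linear stands in for Q-ViT's quantized linear layers, and the `clip_range` argument is a hypothetical addition (the original hard-codes -10., 10.).

    import torch
    import torch.nn as nn

    class MlpWithClip(nn.Module):
        """MLP block with an optional activation clip.

        Illustrative sketch only: plain nn.Linear stands in for Q-ViT's
        quantized linear layers, and `clip_range` is a hypothetical knob
        (the original code hard-codes -10., 10.).
        """
        def __init__(self, in_features, hidden_features, drop=0.,
                     clip_range=(-10., 10.)):
            super().__init__()
            self.fc1 = nn.Linear(in_features, hidden_features)
            self.act = nn.GELU()
            self.fc2 = nn.Linear(hidden_features, in_features)
            self.drop1 = nn.Dropout(drop)
            self.drop2 = nn.Dropout(drop)
            self.clip_range = clip_range  # set to None to ablate the clip

        def forward(self, x):
            x = self.fc1(x)
            x = self.act(x)
            if self.clip_range is not None:
                # Clamp activation outliers before the second projection.
                x = torch.clip(x, *self.clip_range)
            x = self.drop1(x)
            x = self.fc2(x)
            x = self.drop2(x)
            return x

Setting `clip_range=None` corresponds to the ablation described above, where removing the clip cost roughly 0.1% on DeiT-tiny.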

Thanks.