YanjingLi0202/Q-ViT

why use torch.clip in Q-MLP

Pang-Yatian opened this issue · 2 comments

Hi, thanks for your great work.

May I ask why this line is added in Q_MLP?

        x = torch.clip(x, -10., 10.)

Is there a specific reason for it? Is it to make training more stable, or does this trick improve performance?

    def forward(self, x):
        x = self.fc1(x)
        # print(torch.max(x), torch.min(x))
        x = self.act(x)
        x = torch.clip(x, -10., 10.)
        # print(torch.clip(x, -10., 10.))
        x = self.drop1(x)
        x = self.fc2(x)
        x = self.drop2(x)
        return x

We use torch.clip() here to limit activation outliers, but the specific bounds, i.e. "-10., 10.", have not yet been explored sufficiently. We also ran experiments with the torch.clip() removed and found that performance on the DeiT-tiny model decreased by about 0.1%.
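
For context, here is a minimal sketch of how this kind of clip could be exposed as a configurable option for such an ablation. This is an illustration, not the repository's actual code: plain nn.Linear stands in for Q-ViT's quantized linear layers, and the `clip_range` argument is a hypothetical addition (the original hard-codes -10., 10.).

    import torch
    import torch.nn as nn

    class MlpWithClip(nn.Module):
        """MLP block with an optional activation clip.

        Illustrative sketch only: plain nn.Linear stands in for Q-ViT's
        quantized linear layers, and `clip_range` is a hypothetical knob
        (the original code hard-codes -10., 10.).
        """
        def __init__(self, in_features, hidden_features, drop=0.,
                     clip_range=(-10., 10.)):
            super().__init__()
            self.fc1 = nn.Linear(in_features, hidden_features)
            self.act = nn.GELU()
            self.fc2 = nn.Linear(hidden_features, in_features)
            self.drop1 = nn.Dropout(drop)
            self.drop2 = nn.Dropout(drop)
            self.clip_range = clip_range  # set to None to ablate the clip

        def forward(self, x):
            x = self.fc1(x)
            x = self.act(x)
            if self.clip_range is not None:
                # Clamp activation outliers before the second projection.
                x = torch.clip(x, *self.clip_range)
            x = self.drop1(x)
            x = self.fc2(x)
            x = self.drop2(x)
            return x

Setting `clip_range=None` corresponds to the ablation described above, where removing the clip cost roughly 0.1% on DeiT-tiny.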

Thanks.