kssteven418/I-BERT

Possible bug in IntSoftmax

Opened this issue · 0 comments

๐Ÿ› Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd '....'
  2. See error

Code sample

Expected behavior

Environment

  • fairseq Version (e.g., 1.0 or master):
  • PyTorch Version (e.g., 1.0)
  • OS (e.g., Linux):
  • How you installed fairseq (pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

I've been trying to add the I-BERT quantization modules to DistilBERT and ran into this issue.

scaling_factor = 1 / 2 ** self.output_bit

is a Python float and is returned as-is. I believe it should be converted to a tensor on the appropriate device before being returned, e.g. scaling_factor = torch.tensor([1 / 2 ** self.output_bit], device=exp_int.device).
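To illustrate, here is a minimal sketch of the proposed change, assuming the names from the issue (exp_int, self.output_bit); the surrounding IntSoftmax logic is omitted and the helper name is hypothetical:

```python
import torch

def make_scaling_factor(exp_int: torch.Tensor, output_bit: int) -> torch.Tensor:
    # 1 / 2 ** output_bit is a plain Python float; wrapping it in a
    # tensor on exp_int's device keeps device placement consistent with
    # the activation tensor for downstream quantized modules.
    return torch.tensor([1 / 2 ** output_bit], device=exp_int.device)

# Usage: with output_bit = 8, the scaling factor is 1/256.
sf = make_scaling_factor(torch.zeros(2, 4), 8)
```

Returning a tensor rather than a float also means callers can rely on tensor attributes such as .device and .dtype without special-casing the softmax output.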

Please let me know your thoughts on this. Thanks!