kssteven418/I-BERT

(huggingface) The output of I-BERT is float. Am I doing something wrong?

kyoungrok0517 opened this issue · 0 comments

❓ Questions and Help

What is your question?

I'm using the Hugging Face implementation. Even though I set quant_mode=True, the output of IBertModel is still float32. Am I using the model incorrectly, or is this expected?

Code

self.bert = AutoModel.from_pretrained(
    base_model, quant_mode=quant_mode, add_pooling_layer=False
)

...


def forward(
        self,
        input_ids: Tensor,
        attention_mask: Tensor,
        k: int = None,
        return_layers: List[int] = None,
        return_orig: bool = False,
    ):
        bert_out = self.bert(
            input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
            return_dict=True,
        )

        # the output dtype is float32!
        print(bert_out.hidden_states[0])
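For context, integer-only pipelines are commonly *simulated* in floating point ("fake quantization"): tensors are snapped to an integer grid but stored as float32, which would produce exactly the behavior observed above. A minimal NumPy sketch of that idea (`fake_quantize` is a hypothetical helper for illustration, not part of transformers):

```python
import numpy as np

def fake_quantize(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    # Symmetric uniform quantization simulated in floating point:
    # values are rounded onto an integer grid, then immediately
    # rescaled back, so the dtype stays float32 throughout.
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = np.abs(x).max() / qmax          # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)

x = np.random.randn(4, 8).astype(np.float32)
y = fake_quantize(x)
print(y.dtype)  # float32, even though the values lie on an 8-bit grid
```

Deploying with real INT8 arithmetic would additionally require integer kernels; the fake-quantized float tensors only emulate their numerics.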

What's your environment?

  • PyTorch Version: 1.7.1
  • OS (e.g., Linux): Ubuntu 20
  • How you installed fairseq (pip, source): not installed
  • Python version: 3.8.5
  • CUDA/cuDNN version: 11.0
  • GPU models and configuration: RTX 3090