kamalkraj/BERT-NER

Model training does not work on CPU

saurabhhssaurabh opened this issue · 1 comment

I have cloned the code from the dev branch and am executing the following command to fine-tune the model on CPU:
python run_ner.py --cache_dir=path_to_cache --data_dir=path_to_data --bert_model=bert-base-uncased --task_name=ner --output_dir=path_to_output --no_cuda --do_train --do_eval --warmup_proportion=0.1

But I am facing the following error:
Traceback (most recent call last):
File "run_ner.py", line 611, in <module>
main()
File "run_ner.py", line 503, in main
loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "run_ner.py", line 43, in forward
logits = self.classifier(sequence_output)
File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
return F.linear(input, self.weight, self.bias)
File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
output = input.matmul(weight.t())
RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

What I don't understand is: when I pass the --no_cuda flag, why is it still expecting a tensor to be on the GPU?

--no_cuda fails for the NER task because the device is still hardcoded to the GPU here:

class Ner(BertForTokenClassification):

    def forward(self, input_ids,
                token_type_ids=None,
                attention_mask=None,
                labels=None,
                valid_ids=None,
                attention_mask_label=None):
        # ... skipping to line 47
        # device is hardcoded, so this tensor lands on the GPU
        # even when --no_cuda is passed
        valid_output = torch.zeros(batch_size,
                                   max_len,
                                   feat_dim,
                                   dtype=torch.float32,
                                   device='cuda')

I changed the device argument to 'cpu' when not using CUDA, and everything worked as expected.
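A more general fix (a sketch, not the repo's exact patch; the helper name `make_valid_output` is mine) is to allocate the buffer on the same device as the incoming tensor instead of hardcoding either 'cuda' or 'cpu', so the same code path works with and without --no_cuda:

```python
import torch

def make_valid_output(sequence_output: torch.Tensor) -> torch.Tensor:
    """Allocate the zeros buffer on the same device as its input,
    so it follows the model to CPU or GPU automatically."""
    batch_size, max_len, feat_dim = sequence_output.shape
    # device=sequence_output.device tracks the input tensor's device
    # (cpu, cuda:0, ...) instead of pinning the buffer to one device.
    return torch.zeros(batch_size, max_len, feat_dim,
                       dtype=torch.float32,
                       device=sequence_output.device)

# CPU input yields a CPU buffer, so the later matmul in the
# classifier sees tensors on one device and no RuntimeError occurs.
seq = torch.randn(2, 8, 768)   # (batch, seq_len, hidden_size) on CPU
out = make_valid_output(seq)
print(out.device)              # cpu
```

This mirrors the common PyTorch convention of deriving the device from an existing tensor rather than from a global flag.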