Error while evaluating model
kishorepv opened this issue · 9 comments
Hi,
I tried evaluating using the provided checkpoint. I get the following error:
root@jetson:/nlp/lite-transformer/lite-transformer# configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 valid
Traceback (most recent call last):
File "generate.py", line 192, in
cli_main()
File "generate.py", line 188, in cli_main
main(args)
File "generate.py", line 32, in main
task = tasks.setup_task(args)
File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/init.py", line 17, in setup_task
return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/translation.py", line 166, in setup_task
args.source_lang, args.target_lang = data_utils.infer_language_pair(paths[0])
File "/nlp/lite-transformer/lite-transformer/fairseq/data/data_utils.py", line 24, in infer_language_pair
for filename in os.listdir(path):
FileNotFoundError: [Errno 2] No such file or directory: 'data/binary/wmt14_en_fr'
Namespace(ignore_case=False, order=4, ref='/data/nlp/embed200//exp/valid_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/data/nlp/embed200//exp/valid_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 88, in
main()
File "score.py", line 84, in main
score(f)
File "score.py", line 78, in score
print(scorer.result_string(args.order))
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
Do we need the contents of the data/binary/wmt14_en_fr directory for evaluation?
Hi Kishore,
Thank you for asking! We have already included the preprocessed binary files of the test dataset in the provided checkpoint tar. You can test the checkpoint on the test set by moving the test* and dict* files to data/binary/wmt14_en_fr (mkdir it if you do not have it) and calling test.sh. If you would like to test the checkpoint on the validation set, please run configs/wmt14.en-fr/prepare.sh to get the preprocessed valid dataset.
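A minimal sketch of those steps, assuming the test* and dict* files sit at the top level of the extracted checkpoint tar (the paths follow the command in the original report; adjust them to your setup):
# create the binary data directory that test.sh expects
mkdir -p data/binary/wmt14_en_fr
# copy the preprocessed test data and dictionaries shipped with the checkpoint
cp /data/nlp/embed200/test.* /data/nlp/embed200/dict.* data/binary/wmt14_en_fr/
# evaluate on the test set (arguments: checkpoint dir, GPU id, subset)
configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 test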
I am closing this issue. If you have any follow-up questions, please feel free to re-open it.
I am getting the same issue while testing the model, even though the required test* and dict* files are already in place.
Could you (@Michaelvll) please help me test the trained checkpoint by resolving the error mentioned in the original issue by @kishorepv?
Hi @tomshalini, could you please provide the command you used for testing?
Hello @Michaelvll ,
I am using the command below for testing:
configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/checkpoint_best.pt' 0 test
Could you try configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/' 0 test instead? test.sh automatically appends checkpoint_best.pt to the checkpoint directory you pass.
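For reference, a hypothetical sketch of how test.sh resolves its arguments (not copied from the repo; the flags shown are standard fairseq generate.py options):
CKPT_DIR=$1    # checkpoint directory (first argument)
GPU=$2         # GPU id (second argument)
SUBSET=$3      # data subset, e.g. test or valid (third argument)
CUDA_VISIBLE_DEVICES=$GPU python generate.py data/binary/wmt14_en_fr \
    --path $CKPT_DIR/checkpoint_best.pt \
    --gen-subset $SUBSET --beam 5 --remove-bpe \
    > $CKPT_DIR/exp/${SUBSET}_gen.out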
Thank you @Michaelvll for your help. Now I am getting the error below, even though I am running on 2 GPUs.
Traceback (most recent call last):
File "generate.py", line 192, in
cli_main()
File "generate.py", line 188, in cli_main
main(args)
File "generate.py", line 106, in main
hypos = task.inference_step(generator, models, sample, prefix_tokens)
File "/home/shalinis/lite-transformer/fairseq/tasks/fairseq_task.py", line 246, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 146, in generate
encoder_outs = model.forward_encoder(encoder_input)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in forward_encoder
return [model.encoder(**encoder_input) for model in self.models]
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in
return [model.encoder(**encoder_input) for model in self.models]
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 314, in forward
x = layer(x, encoder_padding_mask)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 693, in forward
x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/modules/multibranch.py", line 37, in forward
x = branch(q.contiguous(), incremental_state=incremental_state)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py", line 131, in forward
output = self.linear2(output)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
return F.linear(input, self.weight, self.bias)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
Namespace(ignore_case=False, order=4, ref='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 88, in
main()
File "score.py", line 84, in main
score(f)
File "score.py", line 78, in score
print(scorer.result_string(args.order))
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
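For context, "no kernel image is available for execution on the device" usually means a compiled CUDA extension (here the dynamicconv kernel) was built for a GPU compute capability different from the device it is running on. A hedged sketch of a rebuild, assuming the extension follows fairseq's layout with a setup.py under fairseq/modules/dynamicconv_layer:
# rebuild the dynamicconv CUDA extension for the local GPU architecture
# (replace 7.0 with your GPU's compute capability, e.g. 7.5 for RTX 20xx, 8.0 for A100)
cd fairseq/modules/dynamicconv_layer
TORCH_CUDA_ARCH_LIST="7.0" python setup.py install
cd -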
@Michaelvll could you please help me in resolving the above issue?