Failed to run the code

Question

Closed this issue 4 years ago · 2 comments

jiminHuang commented 4 years ago

Thanks for sharing!

However, I came across several issues when I tried to replicate the experimental results with the code:

(1) pyrogue in the dependencies is a misspelling of pyrouge.
(2) In clean.py, line 44, post_json should be pdf_json.
(3) fname is not found in the preprocessed data, so I removed it from the data loader. Is it necessary?

After I fixed the issues above, the code still did not run. It raised a CUDA error:

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)

I wonder if you can check out the error and help me fix it.

Answer 1 · 2021-03-14T16:15:19.000Z

Thank you for the comment, Jimin! I will fix (1) and (2) as soon as possible. For (3), fname is unnecessary so you can just take it out.

As for the CUDA error, would you mind sharing your GPU model, NVIDIA driver version, and CUDA version? The output of nvidia-smi should give that information.

I suspect that it's either a PyTorch/CUDA incompatibility or a training configuration error (e.g. the batch size is too large). Let me know if you have any more questions.
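
The PyTorch side of that information can also be collected from Python. Below is a minimal sketch, assuming PyTorch is importable in the environment; the helper name collect_env is hypothetical and not part of this repository. It reports the PyTorch version, the CUDA version PyTorch was built against, and the visible GPU, which can then be compared with what nvidia-smi prints.

```python
import platform


def collect_env():
    """Gather version details relevant to a PyTorch/CUDA mismatch.

    collect_env is a hypothetical helper written for illustration.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report that instead of failing.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against (None on CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If cuda_build is newer than the CUDA version the installed driver supports, that would point to an incompatibility rather than a configuration problem.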

Answer 2 · 2021-03-15T03:38:44.000Z

Thanks for your help! I figured out that it is related to huggingface/transformers#10592, and I solved it by downgrading PyTorch to 1.7.1. I suggest specifying the required PyTorch version in the dependencies.
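
One way to express such a pin is a line like torch==1.7.1 in the requirements file. As a rough sketch of a matching runtime check, the snippet below compares an installed version string with a pinned one. The names base_version and check_pin are hypothetical, and 1.7.1 is only the version that worked in this thread, not a requirement confirmed by the author.

```python
import re


def base_version(version):
    """Return (major, minor, patch), ignoring a local tag such as '+cu110'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_pin(installed, required="1.7.1"):
    """True if the installed release matches the pinned release.

    The default of 1.7.1 is an assumption taken from this thread.
    """
    return base_version(installed) == base_version(required)
```

In a training script this could be called as check_pin(torch.__version__) before any model is moved to the GPU, so that a mismatch is reported early instead of surfacing as a cuBLAS error.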