memray/seq2seq-keyphrase-pytorch

Getting "RuntimeError: pinned memory requires CUDA" on CPU-only Python 3, Ubuntu 16.04.3 64-bit

Closed this issue · 6 comments

I am using Ubuntu 16.04.3 64-bit (CPU only) with Python 3.

I tried running it as follows:

cd /home/user/seq2seq-keyphrase-pytorch

  1. preprocess.py

I have traindata.training.json, traindata.testing.json and traindata.validating.json.
The preprocessing completed successfully.
#python3 preprocess.py -dataset traindata -save_data PreprocessedData

  2. train.py

python3 train.py -words_min_frequency 2 -max_src_seq_length 10000 -min_src_seq_length 20 -max_trg_seq_length 4000 -min_trg_seq_length 2 -encoder_type rnn -decoder_type rnn -enc_layers 3 -dec_layers 3 -data data/PreprocessedData/traindata -save_model Iter001Model -vocab data/PreprocessedData/PreprocessedData.vocab.pt

While executing train.py, I am getting the error below:


```
INFO:root:====================== Checking GPU Availability =========================
07/24/2018 17:48:45 [INFO] train: ====================== Checking GPU Availability =========================
INFO:root:Running on CPU!
07/24/2018 17:48:45 [INFO] train: Running on CPU!
INFO:root:====================== Start Training =========================
07/24/2018 17:48:45 [INFO] train: ====================== Start Training =========================
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=70 error=35 : CUDA driver version is insufficient for CUDA runtime version

ERROR:root:message
Traceback (most recent call last):
File "train.py", line 640, in main
train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, test_data_loader, opt)
File "train.py", line 321, in train_model
for batch_i, batch in enumerate(train_data_loader):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 223, in next
return self._process_next_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 243, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 77, in _pin_memory_loop
batch = pin_memory_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 139, in pin_memory_batch
return batch.pin_memory()
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:70

07/24/2018 17:48:45 [ERROR] train: message
Traceback (most recent call last):
File "train.py", line 640, in main
train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, test_data_loader, opt)
File "train.py", line 321, in train_model
for batch_i, batch in enumerate(train_data_loader):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 223, in next
return self._process_next_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 243, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 77, in _pin_memory_loop
batch = pin_memory_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 139, in pin_memory_batch
return batch.pin_memory()
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:70

Segmentation fault (core dumped)
```


Could you please suggest steps to resolve this error?

It says

RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:70

so it might be a CUDA version issue?
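
If it helps to confirm, a generic check (not specific to this repo) is to compare which CUDA runtime the installed wheel was built against with whether a driver is usable at all:

```python
# Generic diagnostic for "CUDA driver version is insufficient for CUDA runtime
# version": it usually means the installed wheel was built against some CUDA
# runtime (shown by torch.version.cuda) but the machine has no driver, or an
# older one than that runtime needs.
import torch

print("torch:", torch.__version__)
print("built for CUDA runtime:", torch.version.cuda)   # None for CPU-only wheels
print("CUDA usable:", torch.cuda.is_available())
```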

Thanks for the quick update.

I removed torch from my system and then reinstalled it as described at https://pytorch.org:

pip3 install http://download.pytorch.org/whl/cpu/torch-0.4.0-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
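
As a quick sanity check that the CPU-only wheel is the one actually being imported (this assumes pip3 and python3 point at the same environment), the same probe should now report no CUDA runtime at all:

```python
# With the CPU-only wheel installed, torch.version.cuda should be None, so the
# earlier driver/runtime mismatch can no longer be the cause of any error.
import torch

print(torch.__version__)    # expected: 0.4.0
print(torch.version.cuda)   # expected: None for the CPU-only build
```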

After that, I tried to execute the program as before:

python3 train.py -words_min_frequency 2 -max_src_seq_length 10000 -min_src_seq_length 20 -max_trg_seq_length 4000 -min_trg_seq_length 2 -encoder_type rnn -decoder_type rnn -enc_layers 3 -dec_layers 3 -data data/PreprocessedData/traindata -save_model Iter001Model -vocab data/PreprocessedData/PreprocessedData.vocab.pt

Now I am getting a different error.


```
INFO:root:====================== Checking GPU Availability =========================
07/25/2018 11:25:44 [INFO] train: ====================== Checking GPU Availability =========================
INFO:root:Running on CPU!
07/25/2018 11:25:44 [INFO] train: Running on CPU!
INFO:root:====================== Start Training =========================
07/25/2018 11:25:44 [INFO] train: ====================== Start Training =========================
ERROR:root:message
Traceback (most recent call last):
File "train.py", line 640, in main
train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, test_data_loader, opt)
File "train.py", line 321, in train_model
for batch_i, batch in enumerate(train_data_loader):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 223, in next
return self._process_next_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 243, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 77, in _pin_memory_loop
batch = pin_memory_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 139, in pin_memory_batch
return batch.pin_memory()
RuntimeError: pinned memory requires CUDA

07/25/2018 11:25:44 [ERROR] train: message
Traceback (most recent call last):
File "train.py", line 640, in main
train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, test_data_loader, opt)
File "train.py", line 321, in train_model
for batch_i, batch in enumerate(train_data_loader):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 223, in next
return self._process_next_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 243, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 77, in _pin_memory_loop
batch = pin_memory_batch(batch)
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in pin_memory_batch
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 145, in
return [pin_memory_batch(sample) for sample in batch]
File "/home/user/seq2seq-keyphrase-pytorch/pykp/dataloader.py", line 139, in pin_memory_batch
return batch.pin_memory()
RuntimeError: pinned memory requires CUDA
```


Could you please suggest steps to resolve this error?

I'm not super familiar with the data loader part, so @memray please correct me if I'm wrong, but to me

-max_src_seq_length 10000 -max_trg_seq_length 4000

these two arguments look odd, given that the default values are 300 and 8, respectively. Is it possible that CUDA fails to allocate that much memory?
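
To make the scale concrete, here is a rough back-of-envelope sketch; the hidden size and batch size below are assumed values for illustration, not the project's actual defaults:

```python
# RNN activation memory grows linearly with sequence length, so raising
# max_src_seq_length from the default 300 to 10000 is a ~33x jump.
# hidden_size=512 and batch_size=32 are assumed values for illustration only.
def rnn_hidden_state_bytes(seq_len, batch_size, hidden_size, num_layers, bytes_per_float=4):
    # one hidden state per timestep, per layer, per example in the batch
    return seq_len * batch_size * hidden_size * num_layers * bytes_per_float

for seq_len in (300, 10000):
    mb = rnn_hidden_state_bytes(seq_len, batch_size=32, hidden_size=512, num_layers=3) / 2.0 ** 20
    print("seq_len=%5d: ~%.0f MB of hidden states" % (seq_len, mb))
```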

I think @xingdi-eric-yuan might be right. Is your text very long? What is your batch size? Also it looks weird to me that you are running on CPU but it reports a CUDA error.
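
For what it's worth, the CUDA mention in the second error comes from the pinned-memory request itself rather than from any GPU tensor; a minimal sketch that reproduces it on a CPU-only build (at least with the 0.4 wheel):

```python
# Pinning (page-locking) host memory is itself a CUDA feature, so on a
# CPU-only build the call below fails even though no GPU tensor is involved.
import torch

t = torch.zeros(3)
t.pin_memory()  # RuntimeError: pinned memory requires CUDA
```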

Yes, the text I am training on is long. I am not setting any batch size.
"Also it looks weird to me that you are running on CPU but it reports a CUDA error."

  • I found this weird too. For the time being, I have added a line in pykp/dataloader.py.

```python
class KeyphraseDataLoader(object):
    def __init__(self, dataset, max_batch_example=5, max_batch_pair=1, shuffle=False,
                 sampler=None, batch_sampler=None, num_workers=0,
                 collate_fn=default_collate, pin_memory=False, drop_last=False):
        self.dataset = dataset
        # used for generating one2many batches
        self.num_trgs = [len(e['trg']) for e in dataset.examples]
        self.batch_size = max_batch_pair
        self.max_example_number = max_batch_example
        self.num_workers = num_workers
        self.collate_fn = collate_fn

        pin_memory = False  # the line I added: never request pinned memory

        self.pin_memory = pin_memory
        self.drop_last = drop_last
```

I added the line below:
pin_memory = False
Then it works, somehow. I don't know yet how to proceed with testing and validation.
Thanks for the quick suggestions.

Cool. Thanks for letting us know. It might be due to recent updates to the API? I'll put it into my code as well.
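
A sketch of what such a change could look like (an assumption, not the actual fix in the repository) would be to gate pinning on CUDA availability instead of hardcoding it:

```python
import torch

def resolve_pin_memory(requested):
    # Hypothetical helper: honor pin_memory only when a CUDA device is usable,
    # since pinned (page-locked) host memory is a CUDA feature and fails on
    # CPU-only builds.
    return requested and torch.cuda.is_available()

# e.g. inside KeyphraseDataLoader.__init__:
#     self.pin_memory = resolve_pin_memory(pin_memory)
```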