ShomyLiu/Neu-Review-Rec

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Albertwindows opened this issue · 2 comments

Command: python3 main.py train --model=NARRE --num_fea=2 --output=lfm --dataset=men
Error message:
load npy from dist...


user config:
vocab_size => 50002
word_dim => 300
r_max_len => 202
u_max_r => 13
i_max_r => 24
train_data_size => 163266
test_data_size => 11613
val_data_size => 11613
user_num => 34136
item_num => 76418
batch_size => 1
print_step => 100


loading train data
loading val data
train data: 163266; test data: 11613
start training....
2020-11-26 22:25:11 Epoch 0...
Traceback (most recent call last):
File "main.py", line 212, in <module>
fire.Fire()
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "main.py", line 85, in train
output = model(train_datas)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wzhuo/web/src/Neu-Review-Rec/framework/models.py", line 38, in forward
user_feature, item_feature = self.net(datas)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wzhuo/web/src/Neu-Review-Rec/models/narre.py", line 23, in forward
u_fea = self.user_net(user_reviews, uids, user_item2id)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wzhuo/web/src/Neu-Review-Rec/models/narre.py", line 65, in forward
fea = F.relu(self.cnn(reviews.unsqueeze(1))).squeeze(3) # .permute(0, 2, 1)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 419, in forward
return self._conv_forward(input, self.weight)
File "/home/wzhuo/web/env3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 416, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [4,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [5,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [6,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [7,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [8,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [9,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [10,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [11,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [12,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [13,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [14,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [15,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [16,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [17,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [18,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [19,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [20,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [21,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [22,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [23,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [24,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [25,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [26,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [27,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [28,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [29,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:218: indexSelectSmallIndex: block: [0,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.

How can I fix this?

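Hi, if you are using your own dataset, please add a matching entry in config/config.py and update its fields accordingly.
Also, judging from the output log, batch_size=1 will probably be very slow.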
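
For context on why the config matters: the `indexSelectSmallIndex ... Assertion srcIndex < srcSelectDimSize failed` lines in the log are what PyTorch prints when an index passed to an embedding lookup is outside the table, and the cuDNN error reported in the convolution is most likely a downstream symptom of that failed assertion. A minimal sketch of a sanity check is below, assuming the sizes printed under "user config" are the upper bounds for the ids. The helper `find_out_of_range` and the sample ids are hypothetical and not part of the repository, and the exact table sizes depend on how the model builds its embeddings.

```python
from typing import Dict, Iterable, List


def find_out_of_range(limits: Dict[str, int], ids: Dict[str, Iterable[int]]) -> List[str]:
    """Return one message per field whose ids fall outside [0, limit)."""
    problems = []
    for field, limit in limits.items():
        values = list(ids.get(field, []))
        if not values:
            continue  # nothing to check for this field
        lo, hi = min(values), max(values)
        if lo < 0 or hi >= limit:
            problems.append(
                f"{field}: ids span [{lo}, {hi}] but config allows [0, {limit - 1}]"
            )
    return problems


# Limits copied from the "user config" section of the log above.
limits = {"vocab_size": 50002, "user_num": 34136, "item_num": 76418}

# Hypothetical ids for illustration; in practice they would be collected
# from the preprocessed dataset files.
sample_ids = {
    "vocab_size": [0, 17, 50001],
    "user_num": [3, 34136],  # 34136 is one past the last valid user id
    "item_num": [0, 76417],
}

for line in find_out_of_range(limits, sample_ids):
    print(line)
```

If a check like this reports a field, the corresponding value in config/config.py is too small for the dataset. Running once with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or on CPU, usually makes the traceback point at the lookup that actually failed instead of a later operation.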

OK, thanks, it's resolved now.