declare-lab/Multimodal-Infomax

Why does this issue occur when the GPU is available and the environment is configured properly?

PhilrainV opened this issue · 1 comment

```
torch==1.7.1
cuda==11.0
cuda.is_available==True
```

```
Start loading the data....
train
Training data loaded!
valid
Validation data loaded!
test
Test data loaded!
Finish loading the data....
```
```
[W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in LogdetBackward. Traceback of forward call that caused the error:
  File "main.py", line 62, in <module>
    solver.train_and_eval()
  File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 282, in train_and_eval
    train_loss = train(model, optimizer_main, criterion, 1)
  File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 158, in train
    bert_sent, bert_sent_type, bert_sent_mask, y, mem)
  File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Project\CSCL\paper_test\Main_structure\src\model.py", line 107, in forward
    lld_ta, ta_pn, H_ta = self.mi_ta(x=text, y=acoustic, labels=y, mem=mem['ta'])
  File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Project\CSCL\paper_test\Main_structure\src\modules\encoders.py", line 193, in forward
    H = 0.25 * (torch.logdet(sigma_pos) + torch.logdet(sigma_neg))
 (function print_stack)
Traceback (most recent call last):
  File "main.py", line 62, in <module>
    solver.train_and_eval()
  File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 282, in train_and_eval
    train_loss = train(model, optimizer_main, criterion, 1)
  File "D:\Project\CSCL\paper_test\Main_structure\src\solver.py", line 189, in train
    loss.backward()
  File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "D:\CODE_env\Anaconda\anaconda3\envs\CSCL\lib\site-packages\torch\autograd\__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cusolver error: 7, when calling cusolverDnCreate(handle)
```

I have never encountered this before; it looks like a PyTorch internal issue. You could ask on the PyTorch forums for a possible solution.
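Since the crash happens when `LogdetBackward` tries to create a cuSOLVER handle, one possible workaround (an assumption, not a fix confirmed in this thread) is to evaluate the `logdet` terms on CPU, which bypasses cuSOLVER entirely while keeping autograd intact. The helper name `logdet_cpu` and the toy matrices below are hypothetical stand-ins for the `sigma_pos` / `sigma_neg` tensors in `encoders.py`:

```python
import torch

def logdet_cpu(mat: torch.Tensor) -> torch.Tensor:
    # Hypothetical workaround: compute the log-determinant on CPU so the
    # cuSOLVER handle is never created, then move the scalar back to the
    # tensor's original device. The CPU round-trip is differentiable.
    return torch.logdet(mat.cpu()).to(mat.device)

# Toy SPD matrices standing in for sigma_pos / sigma_neg.
sigma_pos = (torch.eye(4) * 2.0).requires_grad_()
sigma_neg = torch.eye(4) * 3.0

H = 0.25 * (logdet_cpu(sigma_pos) + logdet_cpu(sigma_neg))
H.backward()  # gradients flow back through the CPU computation
```

This trades GPU speed for stability on the `logdet` call only; the rest of the model stays on the GPU. Note that `cusolver error: 7` can also indicate a driver/toolkit mismatch, so checking that the installed NVIDIA driver supports CUDA 11.0 is worth doing first.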