Issue with index Embedding layer
lorenzoFabbri opened this issue · 1 comments
I'm trying to use OpenChem for a classification task. My dataset is basically Tox21 with one label. I'm using a machine with a single GPU.
I simply adapted the provided script for Tox21 but I keep getting the following error:
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [67,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
File "/.../OpenChem/openchem/modules/encoders/rnn_encoder.py", line 90, in init_hidden
    requires_grad=True).cuda()
RuntimeError: CUDA error: device-side assert triggered
Some SMILES were longer than 1024, so I removed them and now the longest one is less than 300 characters. Still, I keep getting the very same error.
I read online that these bugs are easier to find when running on the CPU, so I set `use_cuda=False` in the configuration file. It still tries to copy everything to the GPU, though, since the error points to the same line (line 90). I then tried passing `--use_cuda="False"` from the command line, but that fails with `ValueError: use_cuda has to be of type <class 'bool'>`.
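For reference, this kind of `ValueError` typically appears when a command-line value arrives as the string `"False"` and is never converted to a real `bool` (note that `bool("False")` is `True` in Python). The block below is a generic `argparse` sketch of such a conversion. It is not OpenChem's actual argument parser, and `str2bool` is a hypothetical helper introduced only for illustration.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert common textual spellings of a boolean into a real bool.

    Hypothetical helper for illustration; not part of OpenChem.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
# type=str2bool makes argparse hand the program a bool instead of a str.
parser.add_argument("--use_cuda", type=str2bool, default=True)

# Equivalent to running: script.py --use_cuda False
args = parser.parse_args(["--use_cuda", "False"])
print(type(args.use_cuda), args.use_cuda)
```

With a converter like this in place, a downstream check that requires `use_cuda` to be of type `bool` would be satisfied regardless of how the flag was spelled on the command line.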
I then set `use_cuda` to False directly in `openchem_encoder.py`, but now I get `RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED`, so I guess CUDA is still being used somewhere else.
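One way to rule out stray GPU usage while debugging is to hide the GPU from the process entirely, so that any hard-coded `.cuda()` call fails loudly instead of running silently on the device. The sketch below assumes a standard PyTorch install and relies only on the `CUDA_VISIBLE_DEVICES` environment variable; whether OpenChem then runs cleanly on the CPU is not something this snippet can guarantee.

```python
import os

# Must be set before CUDA is initialised, so do it before importing torch.
# An empty value hides every GPU from this process.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

# With no visible GPU, this falls back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Alternative when staying on the GPU: launching the script with the
# environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches
# synchronous, so the Python traceback points at the operation that
# actually failed rather than at a later, unrelated CUDA call.
```

Device-side asserts are reported asynchronously, which may be why the traceback points at `init_hidden` even though the failing operation is an index lookup.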
Am I missing something? The Tox21 scripts seem to be working correctly (except for lots of warnings). Thanks.
It was suggested that I increase `num_embeddings` from `train_dataset.num_tokens` to `train_dataset.num_tokens+2`, since the maximum value in the input tensor was larger than the embedding size. I still do not know why that happened.
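The `srcIndex < srcSelectDimSize` assertion is what an out-of-range embedding lookup looks like on the GPU. A minimal sketch of a check that could be run on a batch before it reaches the embedding layer is shown below. The vocabulary size and tensors are made up for illustration, and `check_token_range` is a hypothetical helper, not an OpenChem function.

```python
import torch
import torch.nn as nn


def check_token_range(tokens: torch.Tensor, num_embeddings: int) -> None:
    """Raise a readable error if any index cannot be looked up.

    Hypothetical helper for illustration; valid indices are
    0 .. num_embeddings - 1.
    """
    lo, hi = int(tokens.min()), int(tokens.max())
    if lo < 0 or hi >= num_embeddings:
        raise ValueError(
            f"token indices span [{lo}, {hi}] but the embedding table "
            f"only has rows 0..{num_embeddings - 1}"
        )


num_tokens = 10  # stand-in for train_dataset.num_tokens
embedding = nn.Embedding(num_embeddings=num_tokens, embedding_dim=4)

good = torch.tensor([[0, 3, 9]])
bad = torch.tensor([[0, 3, 10]])  # 10 == num_tokens, one past the last row

check_token_range(good, embedding.num_embeddings)
out = embedding(good)  # shape (1, 3, 4)

try:
    check_token_range(bad, embedding.num_embeddings)
    caught = None
except ValueError as err:
    caught = str(err)
print(caught)

# On the CPU the raw lookup also fails immediately with a Python exception,
# instead of a device-side assert reported at some later call.
try:
    embedding(bad)
    cpu_failed = False
except (IndexError, RuntimeError):
    cpu_failed = True
```

If a check like this fires, comparing the largest index in the data against `train_dataset.num_tokens` should show how many extra rows the embedding needs. The `+2` above is consistent with two indices falling outside the counted vocabulary, but the source does not say which ones.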
I then ran into some more issues, and I was wondering whether you have tested the library on a simple classification task. I first had to reshape my labels with `reshape(-1, 1)`, and then I had to modify `cast_inputs` in `Smiles2Label`, replacing `batch_labels = batch_labels.long()` with `batch_labels = torch.flatten(batch_labels.long())`.
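The need for `torch.flatten` is consistent with how PyTorch's class-index losses treat targets. The sketch below assumes the model is trained with `nn.CrossEntropyLoss` (the thread does not state which loss OpenChem uses here) and uses made-up logits and labels.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 2)  # batch of 4 samples, 2 classes
labels_2d = torch.tensor([[0], [1], [1], [0]])  # shape (4, 1), as after reshape(-1, 1)

# Class-index targets are expected to have shape (N,), so (N, 1) is rejected.
try:
    criterion(logits, labels_2d.long())
    failed_2d = False
except (RuntimeError, ValueError):
    failed_2d = True

# Flattening restores the expected shape (4,) and the loss can be computed.
labels_1d = torch.flatten(labels_2d.long())
loss = criterion(logits, labels_1d)
print(failed_2d, labels_1d.shape, loss.dim())
```

Under that assumption, keeping the labels two-dimensional for the data loader and flattening them just before the loss, as in the modified `cast_inputs`, is a reasonable workaround.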