Error when training the librispeech model in multi-GPU mode
dh7ahn opened this issue · 1 comment
I got the following error message at stage 3 (LM training) while training the librispeech model in the example directory.
AttributeError: 'numpy.ndarray' object has no attribute 'items'
Checking the corresponding code in data_parallel.py, the variable 'inputs' turns out to be a 'numpy.ndarray', but the code calls 'inputs.items()' as if it were a dict. Is something wrong here, or how can I fix it?
This error happens only when I train with multiple GPUs; single-GPU training seems to work without errors.
============================================================================
LM Training stage (stage:3)
0%| | 0/11000320 [00:00<?, ?it/s]
Original utterance num: 281241
Removed 0 utterances (threshold)
Original utterance num: 2703
Removed 0 utterances (threshold)
Original utterance num: 2864
Removed 0 utterances (threshold)
Original utterance num: 2620
Removed 0 utterances (threshold)
Original utterance num: 2939
Removed 0 utterances (threshold)
Traceback (most recent call last):
File "/home/ahn/pkgs/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 340, in
save_path = pr.runcall(main)
File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/cProfile.py", line 121, in runcall
return func(*args, **kw)
File "/home/ahn/pkgs/neural_sp/examples/librispeech/s5/../../../neural_sp/bin/lm/train.py", line 214, in main
loss, hidden, observation = model(ys_train, state=hidden)
File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ahn/pkgs/neural_sp/tools/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
File "/scl_group_shared/ahn/pkgs/neural_sp/neural_sp/models/data_parallel.py", line 49, in scatter
res = [{k: scatter_map(v, i) for k, v in inputs.items()} for i in range(len(self.device_ids))]
File "/scl_group_shared/ahn/pkgs/neural_sp/neural_sp/models/data_parallel.py", line 49, in
res = [{k: scatter_map(v, i) for k, v in inputs.items()} for i in range(len(self.device_ids))]
AttributeError: 'numpy.ndarray' object has no attribute 'items'
0%| | 0/11000320 [00:00<?, ?it/s]
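If it helps, here is a minimal self-contained sketch of the failure mode as I understand it. The helper and the splitting strategy are illustrative only, not the actual neural_sp code; the point is that the custom scatter iterates over 'inputs.items()' assuming a dict, so passing a bare numpy.ndarray (as the LM trainer does with ys_train) raises the same AttributeError:

```python
import numpy as np

def scatter_map(v, device_idx, n_devices=2):
    # illustrative per-device split along the batch axis
    return np.array_split(v, n_devices)[device_idx]

def scatter(inputs, device_ids=(0, 1)):
    # mirrors the failing line in neural_sp/models/data_parallel.py:
    # it assumes `inputs` is a dict and calls .items() on it
    return [{k: scatter_map(v, i) for k, v in inputs.items()}
            for i in range(len(device_ids))]

ys_train = np.arange(8)  # the LM trainer passes a plain ndarray, not a dict
scatter(ys_train)        # AttributeError: 'numpy.ndarray' object has no attribute 'items'
```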
For now, I recommend using a single GPU for LM training; I will fix the bug.
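Until the fix lands, one possible direction is a type guard in scatter, so that dict inputs keep the per-key behavior while a bare array is split along the batch dimension. A minimal sketch, with a hypothetical scatter_map helper standing in for the real per-device split (this is not the actual neural_sp implementation):

```python
import numpy as np

def scatter_map(v, device_idx, n_devices):
    # hypothetical helper: split along the batch axis and pick one shard
    return np.array_split(v, n_devices)[device_idx]

def scatter(inputs, device_ids):
    n = len(device_ids)
    if isinstance(inputs, dict):
        # original behavior: scatter each value under its key
        return [{k: scatter_map(v, i, n) for k, v in inputs.items()}
                for i in range(n)]
    # fallback for bare arrays such as ys_train in the LM trainer
    return [scatter_map(inputs, i, n) for i in range(n)]

# each device would receive one shard of the batch
shards = scatter(np.arange(8), device_ids=(0, 1))
```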