IndexError: index 76 is out of bounds for axis 1 with size 3
Hello,
I am currently trying to automate parts of this project and I am running into difficulties during the training phase in CPU mode, which throws an IndexError and appears to hang the entire training. I am using a very small subset of the mass_buildings dataset: 8 training images and 2 validation images. The goal at this point is only to test the pipeline, not to obtain accurate results. Below is the state of the installation and the steps I am using:
System:
uname -a
Linux user-VirtualBox 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Python (w/o Anaconda):
$ python -V
Python 3.5.2
Python modules:
user@user-VirtualBox:~/Source/ssai-cnn$ pip3 freeze
...
chainer==1.5.0.2
...
Cython==0.23.4
...
h5py==2.7.1
...
lmdb==0.87
...
matplotlib==2.1.1
...
numpy==1.10.1
onboard==1.2.0
opencv-python==3.1.0.3
...
six==1.10.0
tqdm==4.19.5
...
Additionally, Boost 1.59.0 and OpenCV 3.0.0 have been built and installed from source, and both installs appear successful. The utils library is also built successfully.
I have downloaded only a small subset of the mass_buildings dataset:
# ls -R ./data/mass_buildings/train/
./data/mass_buildings/train/:
map sat
./data/mass_buildings/train/map:
22678915_15.tif 22678930_15.tif 22678945_15.tif 22678960_15.tif
./data/mass_buildings/train/sat:
22678915_15.tiff 22678930_15.tiff 22678945_15.tiff 22678960_15.tiff
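As a sanity check (a minimal sketch, assuming only the directory layout above and the opencv-python package listed earlier; this is not part of the project's code), the label values contained in the map tiles can be inspected like this:
# Print the value range of each building map tile. The maps are
# presumably near-binary (e.g. 0/255); any other values would become
# out-of-range class indices once the label patches are created.
import glob
import cv2
import numpy as np

for path in sorted(glob.glob('./data/mass_buildings/train/map/*.tif')):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    print(path, img.dtype, img.shape, np.unique(img))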
Below is the output obtained by running the shells/create_datasets.sh script, modified only to build the mass_buildings data:
patch size: 92 24 16
n_all_files: 1
divide:0.6727173328399658
0 / 1 n_patches: 7744
patches: 7744
patch size: 92 24 16
n_all_files: 1
divide:0.6314394474029541
0 / 1 n_patches: 7744
patches: 7744
patch size: 92 24 16
n_all_files: 4
divide:0.6260504722595215
0 / 4 n_patches: 7744
divide:0.667414665222168
1 / 4 n_patches: 15488
divide:0.628319263458252
2 / 4 n_patches: 23232
divide:0.6634025573730469
3 / 4 n_patches: 30976
patches: 30976
0.03437542915344238 sec (128, 3, 64, 64) (128, 16, 16)
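To confirm that the LMDBs were actually written, a rough check (a minimal sketch, assuming the LMDB paths that the training command below points at; the entry counts should match the patch counts printed above) is:
# Count the entries in each generated LMDB.
import lmdb

for name in ('train_sat', 'train_map', 'valid_sat', 'valid_map'):
    env = lmdb.open('data/mass_buildings/lmdb/' + name, readonly=True)
    with env.begin() as txn:
        print(name, txn.stat()['entries'])
    env.close()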
Then the training script is initiated using the following command:
user@user-VirtualBox:~/Source/ssai-cnn$ CHAINER_TYPE_CHECK=0 CHAINER_SEED=$1 \
> nohup python ./scripts/train.py \
> --seed 0 \
> --gpu -1 \
> --model ./models/MnihCNN_multi.py \
> --train_ortho_db data/mass_buildings/lmdb/train_sat \
> --train_label_db data/mass_buildings/lmdb/train_map \
> --valid_ortho_db data/mass_buildings/lmdb/valid_sat \
> --valid_label_db data/mass_buildings/lmdb/valid_map \
> --dataset_size 1.0 \
> --epoch 1
As you can see above, I've been using only 8 images and a single epoch. I let the entire process run overnight and it never completed, which is why I believe the process simply hangs. Running it under nohup does not complete either. When it is forcefully stopped with Ctrl-C, I'm getting the following message:
# cat nohup.out
Traceback (most recent call last):
File "./scripts/train.py", line 313, in <module>
model, optimizer = one_epoch(args, model, optimizer, epoch, True)
File "./scripts/train.py", line 265, in one_epoch
optimizer.update(model, x, t)
File "/usr/local/lib/python3.5/dist-packages/chainer/optimizer.py", line 377, in update
loss = lossfun(*args, **kwds)
File "./models/MnihCNN_multi.py", line 31, in __call__
self.loss = F.softmax_cross_entropy(h, t, normalize=False)
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 152, in softmax_cross_entropy
return SoftmaxCrossEntropy(use_cudnn, normalize)(x, t)
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 105, in __call__
outputs = self.forward(in_data)
File "/usr/local/lib/python3.5/dist-packages/chainer/function.py", line 183, in forward
return self.forward_cpu(inputs)
File "/usr/local/lib/python3.5/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 39, in forward_cpu
p = yd[six.moves.range(t.size), numpy.maximum(t.flat, 0)]
IndexError: index 76 is out of bounds for axis 1 with size 3
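For what it's worth, the error looks like a label value that exceeds the number of output classes: the softmax output has only 3 channels along axis 1, yet a label value of 76 is used as a class index. A minimal sketch in plain NumPy (not the project's code, just an illustration of the failing indexing in softmax_cross_entropy.forward_cpu):
import numpy as np

yd = np.random.rand(5, 3)        # softmax output: 5 patches, 3 classes on axis 1
t = np.array([0, 1, 2, 76, 1])   # labels; 76 is not a valid class index
# Same fancy indexing as in forward_cpu above:
p = yd[np.arange(t.size), np.maximum(t.flat, 0)]
# -> IndexError: index 76 is out of bounds for axis 1 with size 3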
This is the only component that fails at the moment. I've tested the prediction and evaluation phases using the pre-trained model, and both seem to complete successfully. Any assistance on how I could use the training script with custom datasets would be appreciated.
Thank you
@InfectedPacket Thank you for trying my code. If you don't change anything in the code, does the training run successfully?