There is something wrong when I run "bash train.sh"

Question


Closed this issue 6 years ago · 4 comments

Hi, I'm very interested in your work on segmentation, and I am also new to deep learning.
My system is Ubuntu 16.04 with 16 GB RAM and an Nvidia GeForce GTX 1080 Ti.
My environment is CUDA 8.0 and Anaconda 4.20, with Python 3.5 and Python 2.7 (in a virtual environment).
I followed your steps: I ran "setup_env.sh" to set up the environment and then ran train.sh,
but it does not work. It's very difficult for me to solve this. Could you give me a detailed Python environment and configuration settings, or other guidance?

~/Segmentation/kaggle_carvana_segmentation/asanakoy ~/Segmentation/kaggle_carvana_segmentation
TRAIN SCRATCH
---
==========
FOLD 0
BATCH 1
gacc 4
epochs 250
==========

train_scratch.sh: line 57:  5912 Segmentation fault      (core dumped) python run_train.py -b=$BATCH -gacc=$gacc -f=$FOLD -nf=7 -fv=1 --lr=0.005 -opt=sgd --decay_step=100 --decay_gamma=0.5 -aug=2 --weight_decay=0.0005 -o="${o_dir}" --epochs=$epochs --no_cudnn

Then I ran "ternaus/train.sh", which also fails:

Traceback (most recent call last):
  File "src/train.py", line 196, in <module>
    main()
  File "src/train.py", line 191, in main
    fold=args.fold
  File "/home/ubuntu/Segmentation/kaggle_carvana_segmentation/ternaus/src/utils.py", line 112, in train
    for i, (inputs, targets) in enumerate(tl):
  File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 301, in __iter__
    return DataLoaderIter(self)
  File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 171, in __init__
    self._put_indices()
  File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 210, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 115, in __iter__
    for idx in self.sampler:
  File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2033
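The traceback ends in `torch.randperm(len(self.data_source))`, and in this PyTorch version "must be strictly positive" means the dataset length was 0, i.e. the training file list was empty (for example because the data path is wrong or the images were not downloaded). A minimal sketch of an early sanity check, assuming the images sit directly in one directory with .jpg/.png extensions; `list_training_images` is a hypothetical helper, not part of the ternaus code:

```python
import os


def list_training_images(data_dir, extensions=(".jpg", ".png")):
    """Hypothetical helper: collect image paths and fail loudly if none exist.

    An empty list here is what later makes torch.randperm(len(dataset))
    fail with 'must be strictly positive' in old PyTorch versions.
    """
    if not os.path.isdir(data_dir):
        raise FileNotFoundError("data directory does not exist: {}".format(data_dir))
    # Case-insensitive extension match; str.endswith accepts a tuple.
    paths = sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if name.lower().endswith(extensions)
    )
    if not paths:
        raise RuntimeError(
            "no training images with extensions {} found in {}".format(extensions, data_dir)
        )
    return paths
```

Calling a check like this before building the Dataset/DataLoader turns the cryptic sampler error into a message that names the directory being read. It uses `str.format` rather than f-strings so it also runs on the Python 3.5 environment described above.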

I also ran "albu/train.sh":

~/Segmentation/kaggle_carvana_segmentation/albu/src ~/Segmentation/kaggle_carvana_segmentation/albu
train.sh: line 6:  6834 Segmentation fault      (core dumped) PYTHONPATH=$(pwd):$PYTHONPATH python train.py

Answer 1 · 2018-07-03T15:07:16.000Z

How do you run the scripts?
You should use bash: bash setup_env.sh and bash train.sh.
Maybe you are using sh instead of bash.

Answer 2 · 2018-07-04T02:37:23.000Z

Thank you very much.
I found the reason for the segmentation fault:
my PyTorch does not work well with tensorboardX when PyTorch is imported first, so I import tensorboardX first.
asanakoy's solution now runs well,
but albu's solution reports that there is not enough memory. Should I reduce the batch size or the image size?
ternaus's solution still gives the same error; it seems to fail when calling enumerate(tl).

Answer 3 · 2018-07-04T13:15:07.000Z

Try to decrease the batch size.
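A minimal sketch of that advice, assuming the training step surfaces CUDA out-of-memory as a RuntimeError whose message contains "out of memory" (how PyTorch reports it; newer versions raise torch.cuda.OutOfMemoryError, a RuntimeError subclass). `find_workable_batch_size` and `run_one_step` are hypothetical names, not part of the repository. The sketch halves the batch size until one trial step fits, then reports how many gradient-accumulation steps keep the effective batch size (batch × accumulation steps) at least as large as before; reading the `gacc` setting in the asanakoy log as gradient accumulation is also an assumption.

```python
def find_workable_batch_size(run_one_step, start_batch, target_effective_batch=None):
    """Hypothetical helper: halve the batch size until run_one_step(batch) fits.

    Returns (batch_size, accumulation_steps) such that
    batch_size * accumulation_steps >= target_effective_batch
    (the target defaults to start_batch).
    """
    if target_effective_batch is None:
        target_effective_batch = start_batch
    batch = start_batch
    while batch >= 1:
        try:
            run_one_step(batch)
        except RuntimeError as err:
            # Assumption: OOM errors mention "out of memory", as PyTorch's do.
            if "out of memory" not in str(err):
                raise  # unrelated error: do not hide it
            batch //= 2
            continue
        # Ceiling division so the effective batch never drops below the target.
        accumulation_steps = max(1, -(-target_effective_batch // batch))
        return batch, accumulation_steps
    raise RuntimeError("even batch size 1 does not fit in memory; try smaller images")
```

In a real PyTorch loop you would also drop references to the failed step's tensors (and possibly call `torch.cuda.empty_cache()`) before retrying. Reducing the image size helps as well, since activation memory grows roughly with the number of pixels, but unlike a smaller batch it changes the input the model sees.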

Answer 4 · 2018-07-04T13:41:37.000Z

Thanks for your help :)