There is something wrong when I run "bash train.sh"
Closed this issue · 4 comments
Hi, I'm very interested in your work on segmentation,
and I am new to deep learning.
My system is Ubuntu 16.04 with 16 GB RAM and an Nvidia GeForce GTX 1080 Ti.
My environment is CUDA 8.0 and Anaconda 4.20 with Python 3.5 and Python 2.7 (virtual).
I followed your steps: I ran "setup_env.sh" to set up the environment and then ran train.sh,
but it did not work. It is very difficult for me to solve this. Could you give me a detailed Python environment and config settings, or other guidance?
~/Segmentation/kaggle_carvana_segmentation/asanakoy ~/Segmentation/kaggle_carvana_segmentation
TRAIN SCRATCH
---
==========
FOLD 0
BATCH 1
gacc 4
epochs 250
==========
train_scratch.sh: line 57: 5912 Segmentation fault (core dumped) python run_train.py -b=$BATCH -gacc=$gacc -f=$FOLD -nf=7 -fv=1 --lr=0.005 -opt=sgd --decay_step=100 --decay_gamma=0.5 -aug=2 --weight_decay=0.0005 -o="${o_dir}" --epochs=$epochs --no_cudnn
Then I ran "ternaus/train.sh", which also did not work:
Traceback (most recent call last):
File "src/train.py", line 196, in <module>
main()
File "src/train.py", line 191, in main
fold=args.fold
File "/home/ubuntu/Segmentation/kaggle_carvana_segmentation/ternaus/src/utils.py", line 112, in train
for i, (inputs, targets) in enumerate(tl):
File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 301, in __iter__
return DataLoaderIter(self)
File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 171, in __init__
self._put_indices()
File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 210, in _put_indices
indices = next(self.sample_iter, None)
File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 115, in __iter__
for idx in self.sampler:
File "/home/ubuntu/anaconda3/envs/py35_ternaus/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2033
I also ran "albu/train.sh":
~/Segmentation/kaggle_carvana_segmentation/albu/src ~/Segmentation/kaggle_carvana_segmentation/albu
train.sh: line 6: 6834 Segmentation fault (core dumped) PYTHONPATH=$(pwd):$PYTHONPATH python train.py
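In the ternaus traceback above, the failing call is torch.randperm(len(self.data_source)), and the message says its argument must be strictly positive. That suggests the dataset passed to the DataLoader has length 0, for example because no training images were found at the configured path. The following is a minimal sketch of a guard that fails earlier with a clearer message. The helper name require_non_empty is hypothetical and not part of the repository.

```python
def require_non_empty(dataset, name="dataset"):
    """Raise a readable error if a dataset-like object has no samples.

    Hypothetical helper, not part of the repository: call it right before
    building the DataLoader so that an empty file list is reported directly
    instead of surfacing as a randperm error inside the sampler.
    """
    n = len(dataset)
    if n == 0:
        raise ValueError(
            "{} is empty: check that the data directory exists and "
            "contains the expected image files".format(name)
        )
    return n
```

Calling require_non_empty(train_dataset, "train set") before the loader is created would point at the data path rather than at the sampler; train_dataset here stands for whatever object the training script passes to the DataLoader.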
How do you run the scripts?
You should use bash: "bash setup_env.sh" and "bash train.sh".
Maybe you are using sh instead of bash.
Thank you very much.
I found the reason for the segmentation fault:
my PyTorch does not work well with tensorboardX when PyTorch is imported first, so I now import tensorboardX first.
asanakoy's solution now runs well,
but albu's solution reports that there is not enough memory. Should I reduce the batch size or the image size?
ternaus's solution also gives the same error; it seems to go wrong when enumerate(tl) is called.
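The import-order workaround described above can be sketched as follows. This is only an illustration of the reporter's fix on their machine, not a general rule. The helper import_in_order is a hypothetical name; it uses the standard importlib module to load packages in a fixed order and records which ones are missing.

```python
import importlib


def import_in_order(module_names):
    """Import modules one by one, in exactly the order given.

    Returns a dict mapping each name to the imported module, or to None
    when that module is not installed. Hypothetical helper for illustration.
    """
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # Record the missing package instead of stopping.
            loaded[name] = None
    return loaded
```

At the top of a training entry point this would be called as import_in_order(["tensorboardX", "torch"]). In practice the same effect comes from writing "import tensorboardX" on the line before "import torch" in the script.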
Try to decrease the batch size.
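If a smaller batch makes training noisier, gradient accumulation is one common way to keep the effective batch size while lowering memory use. The asanakoy log above prints BATCH 1 and gacc 4; assuming gacc means the number of gradient accumulation steps (the thread does not define it), the arithmetic can be sketched as below. The function name is hypothetical, and whether albu's or ternaus's scripts expose such an option is not stated in the thread.

```python
import math


def accumulation_steps(target_effective_batch, per_step_batch):
    """Number of forward/backward passes to accumulate before one
    optimizer step so that per_step_batch * steps >= target."""
    if target_effective_batch < 1 or per_step_batch < 1:
        raise ValueError("batch sizes must be positive integers")
    return math.ceil(target_effective_batch / per_step_batch)


# Example: lowering the per-step batch from 8 to 4 means two accumulated
# steps are needed to keep an effective batch of 8.
steps = accumulation_steps(8, 4)
```

With the values from the log, a per-step batch of 1 and 4 accumulation steps would correspond to an effective batch of 4 under this assumption.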
Thanks for your help :)