Error when training with TuSimple
Closed this issue · 3 comments
Hello, thanks for your work. I got an error when training with the TuSimple dataset. I generated the dataset labels with:
python tools/generate_seg_tusimple.py --root $TUSIMPLEROOT
but validation after the training epoch fails with the error below. Can you help me? Thanks.
2022-10-12 09:21:29,378 - clrnet.utils.recorder - INFO - epoch: 2 step: 342 lr: 0.000995 loss: 2.4460 cls_loss: 0.4769 reg_xytl_loss: 1.0033 seg_loss: 0.1883 iou_loss: 0.7775 stage_0_acc: 99.0353 stage_1_acc: 99.0403 stage_2_acc: 99.0318 data: 0.0050 batch: 0.7463 eta: 1:53:03
2022-10-12 09:21:29,417 - clrnet.datasets.base_dataset - INFO - Loading TuSimple annotations...
Validate: 0%| | 0/87 [00:00<?, ?it/s][ WARN:0@308.142] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626760788443246_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.143] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626611879628614_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626875719975670_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626773780024386_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627128564091098_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626441983295158_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.152] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627024628422609_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.152] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628137086603577_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627830272283289_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628291992700973_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627834271872693_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628725131496677_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628986972995265_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628479284320929_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492630633143870969_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627828273402796_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492630371302032554_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492630632144442295_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492629750677018212_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628172065629178_0/20.jpg'): can't open/read file: check file path/integrity
Validate: 0%| | 0/87 [00:00<?, ?it/s]
[ WARN:0@308.155] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492629872603376884_0/20.jpg'): can't open/read file: check file path/integrity
Traceback (most recent call last):
File "main.py", line 75, in <module>
main()
File "main.py", line 39, in main
runner.train()
File "/home/lab509/xbc/dp_hough/CLRNet/clrnet/engine/runner.py", line 98, in train
self.validate()
File "/home/lab509/xbc/dp_hough/CLRNet/clrnet/engine/runner.py", line 133, in validate
for i, data in enumerate(tqdm(self.val_loader, desc=f'Validate')):
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/tqdm-4.64.1-py3.8.egg/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/lab509/xbc/dp_hough/CLRNet/clrnet/datasets/base_dataset.py", line 40, in __getitem__
img = img[self.cfg.cut_height:, :, :]
TypeError: 'NoneType' object is not subscriptable
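The warnings above show that OpenCV could not open the validation images, so `cv2.imread` returned `None`, and slicing that `None` in `base_dataset.py` raised the `TypeError`. A minimal sketch for checking which image files are missing, assuming the TuSimple label files are JSON-lines files whose records carry a `raw_file` path relative to the dataset root. The function name `find_missing_images`, the default root, and the choice of `test_label.json` are assumptions for illustration, not part of CLRNet:

```python
import json
import os


def find_missing_images(root, label_files):
    """Return the paths named in the label files that do not exist under root."""
    missing = []
    for label_file in label_files:
        label_path = os.path.join(root, label_file)
        if not os.path.isfile(label_path):
            # The label file itself is absent; report it and move on.
            missing.append(label_file)
            continue
        with open(label_path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Each non-empty line is assumed to be one JSON record.
                raw_file = json.loads(line)["raw_file"]
                if not os.path.isfile(os.path.join(root, raw_file)):
                    missing.append(raw_file)
    return missing


if __name__ == "__main__":
    # Assumed layout: $TUSIMPLEROOT, or ./data/tusimple as in the log above.
    root = os.environ.get("TUSIMPLEROOT", "./data/tusimple")
    missing = find_missing_images(root, ["test_label.json"])
    print(f"{len(missing)} missing files")
    for path in missing[:10]:
        print(path)
```

If this lists the `clips/0530/...` paths from the warnings, the validation images were probably never extracted under the dataset root. In the TuSimple release those clips appear to ship with the test set archive, so that is worth checking first.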
Fixed it, thanks.
How did you fix it? Training works fine for me, but validation fails:
2023-12-12 14:51:47,799 - clrnet.datasets.base_dataset - INFO - Loading TuSimple annotations...
Validate: 0%| | 0/70 [00:00<?, ?it/s]/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Validate: 0%| | 0/70 [00:06<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 75, in <module>
main()
File "main.py", line 35, in main
runner.validate()
File "/home/cicero/hbx/clrnet/clrnet/engine/runner.py", line 136, in validate
output = self.net(data)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/mmcv-1.2.5-py3.7.egg/mmcv/parallel/data_parallel.py", line 42, in forward
return super().forward(*inputs, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/hbx/clrnet/clrnet/models/nets/detector.py", line 34, in forward
output = self.heads(fea)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/hbx/clrnet/clrnet/models/heads/clr_head.py", line 215, in forward
batch_features[stage], stage)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/hbx/clrnet/clrnet/models/utils/roi_gather.py", line 114, in forward
roi = self.roi_fea(roi_features, layer_index)
File "/home/cicero/hbx/clrnet/clrnet/models/utils/roi_gather.py", line 102, in roi_fea
cat_feat = self.catconv[layer_index](cat_feat)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/mmcv-1.2.5-py3.7.egg/mmcv/cnn/bricks/conv_module.py", line 193, in forward
x = self.conv(x)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
Fixed it, thanks.
How did you fix it, bro?