Turoad/CLRNet

train with tuSimple error

Closed this issue · 3 comments

Hello, thanks for your work. I got an error when training with the TuSimple dataset. I have generated the dataset labels with

python tools/generate_seg_tusimple.py --root $TUSIMPLEROOT

but I got this error. Can you help me? Thanks.

2022-10-12 09:21:29,378 - clrnet.utils.recorder - INFO - epoch: 2  step: 342  lr: 0.000995  loss: 2.4460  cls_loss: 0.4769  reg_xytl_loss: 1.0033  seg_loss: 0.1883  iou_loss: 0.7775  stage_0_acc: 99.0353  stage_1_acc: 99.0403  stage_2_acc: 99.0318  data: 0.0050  batch: 0.7463  eta: 1:53:03
2022-10-12 09:21:29,417 - clrnet.datasets.base_dataset - INFO - Loading TuSimple annotations...

Validate:   0%|          | 0/87 [00:00<?, ?it/s][ WARN:0@308.142] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626760788443246_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.143] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626611879628614_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626875719975670_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626773780024386_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627128564091098_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.151] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492626441983295158_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.152] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627024628422609_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.152] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628137086603577_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627830272283289_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628291992700973_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627834271872693_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628725131496677_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628986972995265_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628479284320929_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492630633143870969_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.153] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492627828273402796_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492630371302032554_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492630632144442295_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492629750677018212_0/20.jpg'): can't open/read file: check file path/integrity
[ WARN:0@308.154] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492628172065629178_0/20.jpg'): can't open/read file: check file path/integrity

Validate:   0%|          | 0/87 [00:00<?, ?it/s]
[ WARN:0@308.155] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./data/tusimple/clips/0530/1492629872603376884_0/20.jpg'): can't open/read file: check file path/integrity
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    main()
  File "main.py", line 39, in main
    runner.train()
  File "/home/lab509/xbc/dp_hough/CLRNet/clrnet/engine/runner.py", line 98, in train
    self.validate()
  File "/home/lab509/xbc/dp_hough/CLRNet/clrnet/engine/runner.py", line 133, in validate
    for i, data in enumerate(tqdm(self.val_loader, desc=f'Validate')):
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/tqdm-4.64.1-py3.8.egg/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/lab509/anaconda3/envs/clrnet/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/lab509/xbc/dp_hough/CLRNet/clrnet/datasets/base_dataset.py", line 40, in __getitem__
    img = img[self.cfg.cut_height:, :, :]
TypeError: 'NoneType' object is not subscriptable
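
The warnings and traceback above are consistent with `cv2.imread` returning `None` for images it could not open, which would make `img[self.cfg.cut_height:, :, :]` fail. One way to narrow this down is to check that every image referenced by a label file exists under the data root. This is a minimal sketch, assuming the standard TuSimple label format (one JSON object per line with a `raw_file` key). `find_missing_images` is a hypothetical helper, not part of CLRNet, and the label file name in the usage comment is an assumption about the local layout.

```python
import json
import os


def find_missing_images(data_root, label_file):
    """Return the image paths listed in a TuSimple-style label file
    that do not exist under data_root.

    Hypothetical helper for debugging; not part of CLRNet.
    Assumes each non-empty line is a JSON object with a 'raw_file' key.
    """
    missing = []
    with open(label_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            raw_file = json.loads(line)["raw_file"]
            path = os.path.join(data_root, raw_file)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Example usage (paths are assumptions, adjust to your setup):
# missing = find_missing_images("./data/tusimple",
#                               "./data/tusimple/test_label.json")
# print(len(missing), missing[:5])
```

If the returned list is not empty, the data root (or a symlink to it) probably does not contain the validation clips at the paths the label file expects.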

Fixed it, thanks.

How did you fix it? Training works fine, but validation fails with this error:
2023-12-12 14:51:47,799 - clrnet.datasets.base_dataset - INFO - Loading TuSimple annotations...
Validate: 0%| | 0/70 [00:00<?, ?it/s]/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Validate: 0%| | 0/70 [00:06<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 75, in <module>
main()
File "main.py", line 35, in main
runner.validate()
File "/home/cicero/hbx/clrnet/clrnet/engine/runner.py", line 136, in validate
output = self.net(data)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/mmcv-1.2.5-py3.7.egg/mmcv/parallel/data_parallel.py", line 42, in forward
return super().forward(*inputs, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/hbx/clrnet/clrnet/models/nets/detector.py", line 34, in forward
output = self.heads(fea)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/hbx/clrnet/clrnet/models/heads/clr_head.py", line 215, in forward
batch_features[stage], stage)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/hbx/clrnet/clrnet/models/utils/roi_gather.py", line 114, in forward
roi = self.roi_fea(roi_features, layer_index)
File "/home/cicero/hbx/clrnet/clrnet/models/utils/roi_gather.py", line 102, in roi_fea
cat_feat = self.catconv[layer_index](cat_feat)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/mmcv-1.2.5-py3.7.egg/mmcv/cnn/bricks/conv_module.py", line 193, in forward
x = self.conv(x)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/cicero/miniconda3/envs/hbxtorch/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

Fixed it, thanks.

How did you fix it, bro?