open-mmlab/mmyolo

IndexError: index 2 is out of bounds for axis 2 with size 1

helloansuman opened this issue · 0 comments

Prerequisite

🐞 Describe the bug

I started training with the command below:
!python tools/train.py configs/rtmdet/custom.py
Training ran for 10 epochs, then failed with the error below during the first evaluation, right after the epoch-10 checkpoint was saved.
I would appreciate your help.

Error log:
error.txt

Runtime environment:
cudnn_benchmark: True
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1457807616
Distributed launcher: none
Distributed training: False
GPU number: 1
12/13 05:14:52 - mmengine - INFO - Checkpoints will be saved to /workspace/rtmdet/mmyolo/work_dirs/custom.
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
12/13 05:15:28 - mmengine - INFO - Epoch(train) [1][50/60] base_lr: 1.9623e-04 lr: 1.9623e-04 eta: 1:11:27 time: 0.7205 data_time: 0.0135 memory: 31517 loss: 1.8839 loss_cls: 0.5095 loss_bbox: 1.3744
12/13 05:15:45 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:16:05 - mmengine - INFO - Epoch(train) [2][50/60] base_lr: 4.3647e-04 lr: 4.3647e-04 eta: 1:05:23 time: 0.4043 data_time: 0.0140 memory: 23489 loss: 1.6147 loss_cls: 0.4670 loss_bbox: 1.1477
12/13 05:16:09 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:16:29 - mmengine - INFO - Epoch(train) [3][50/60] base_lr: 6.7671e-04 lr: 6.7671e-04 eta: 0:55:28 time: 0.4026 data_time: 0.0117 memory: 24541 loss: 1.5585 loss_cls: 0.4597 loss_bbox: 1.0987
12/13 05:16:33 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:16:53 - mmengine - INFO - Epoch(train) [4][50/60] base_lr: 9.1695e-04 lr: 9.1695e-04 eta: 0:50:27 time: 0.3982 data_time: 0.0124 memory: 23215 loss: 1.5515 loss_cls: 0.4901 loss_bbox: 1.0614
12/13 05:16:56 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:17:17 - mmengine - INFO - Epoch(train) [5][50/60] base_lr: 1.1572e-03 lr: 1.1572e-03 eta: 0:47:28 time: 0.4060 data_time: 0.0118 memory: 23923 loss: 1.5325 loss_cls: 0.5029 loss_bbox: 1.0296
12/13 05:17:20 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:17:41 - mmengine - INFO - Epoch(train) [6][50/60] base_lr: 1.3974e-03 lr: 1.3974e-03 eta: 0:45:23 time: 0.4076 data_time: 0.0129 memory: 23580 loss: 1.5189 loss_cls: 0.5250 loss_bbox: 0.9939
12/13 05:17:44 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:18:05 - mmengine - INFO - Epoch(train) [7][50/60] base_lr: 1.6377e-03 lr: 1.6377e-03 eta: 0:43:45 time: 0.4035 data_time: 0.0129 memory: 23112 loss: 1.5360 loss_cls: 0.5540 loss_bbox: 0.9820
12/13 05:18:08 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:18:29 - mmengine - INFO - Epoch(train) [8][50/60] base_lr: 1.8779e-03 lr: 1.8779e-03 eta: 0:42:26 time: 0.4034 data_time: 0.0123 memory: 25155 loss: 1.5134 loss_cls: 0.5581 loss_bbox: 0.9553
12/13 05:18:32 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:18:53 - mmengine - INFO - Epoch(train) [9][50/60] base_lr: 2.1181e-03 lr: 2.1181e-03 eta: 0:41:24 time: 0.4104 data_time: 0.0127 memory: 23272 loss: 1.5070 loss_cls: 0.5551 loss_bbox: 0.9519
12/13 05:18:57 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:19:17 - mmengine - INFO - Epoch(train) [10][50/60] base_lr: 2.3584e-03 lr: 2.3584e-03 eta: 0:40:26 time: 0.4041 data_time: 0.0111 memory: 24584 loss: 1.5012 loss_cls: 0.5634 loss_bbox: 0.9377
12/13 05:19:20 - mmengine - INFO - Exp name: custom_20231213_051445
12/13 05:19:20 - mmengine - INFO - Saving checkpoint at 10 epochs
12/13 05:19:35 - mmengine - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.04s)
Creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=3.42s).
Accumulating evaluation results...
DONE (t=0.09s).
Traceback (most recent call last):
File "tools/train.py", line 123, in <module>
main()
File "tools/train.py", line 119, in main
runner.train()
File "/usr/local/lib/python3.8/dist-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/usr/local/lib/python3.8/dist-packages/mmengine/runner/loops.py", line 102, in run
self.runner.val_loop.run()
File "/usr/local/lib/python3.8/dist-packages/mmengine/runner/loops.py", line 366, in run
metrics = self.evaluator.evaluate(len(self.dataloader.dataset))
File "/usr/local/lib/python3.8/dist-packages/mmengine/evaluator/evaluator.py", line 79, in evaluate
_results = metric.evaluate(size)
File "/usr/local/lib/python3.8/dist-packages/mmengine/evaluator/metric.py", line 133, in evaluate
_metrics = self.compute_metrics(results) # type: ignore
File "/usr/local/lib/python3.8/dist-packages/mmdet/evaluation/metrics/coco_metric.py", line 512, in compute_metrics
coco_eval.summarize()
File "/usr/local/lib/python3.8/dist-packages/pycocotools/cocoeval.py", line 518, in summarize
self.stats = summarize()
File "/usr/local/lib/python3.8/dist-packages/pycocotools/cocoeval.py", line 485, in _summarizeDets
stats[0] = _summarize(1)
File "/usr/local/lib/python3.8/dist-packages/pycocotools/cocoeval.py", line 469, in _summarize
s = s[:,aind,mind,:,:]
IndexError: index 2 is out of bounds for axis 2 with size 1
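
For reference, the final exception can be reproduced in isolation with NumPy. This is only a minimal sketch: the array shape and the contents of `aind` and `mind` are hypothetical, chosen so that the third axis has size 1 as the error message reports. Only the indexing expression itself is copied from the traceback.

```python
import numpy as np


def select(s, aind, mind):
    # Same indexing expression as the failing line in the traceback.
    return s[:, aind, mind, :, :]


# Hypothetical shape: the only detail taken from the error message is
# that axis 2 has size 1. All other dimensions are placeholders.
s = np.zeros((10, 4, 1, 4, 3))
aind = [0]  # assumed: a single selected index on axis 1
mind = [2]  # assumed: index 2 requested on axis 2, which has size 1

raised = False
message = None
try:
    select(s, aind, mind)
except IndexError as exc:
    raised = True
    message = str(exc)
    print(message)
```

This suggests the array being indexed has only one entry along the axis that `mind` selects from, while `mind` holds the index 2. Whether that mismatch comes from the evaluator settings in the config or from the installed pycocotools version is not established by the log alone.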

The custom config file is attached in case the error comes from the config.
custom.txt

Environment

12/13 05:14:45 - mmengine - WARNING - Failed to search registry with scope "mmyolo" in the "log_processor" registry tree. As a workaround, the current "log_processor" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmyolo" is a correct scope, or whether the registry is initialized.
12/13 05:14:46 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
CUDA available: True
numpy_random_seed: 1457807616
GPU 0: NVIDIA A100-SXM4-40GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.0, V12.0.140
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.0+cu117
PyTorch compiling details: PyTorch built with:

- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.7
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.5
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.15.1+cu117
OpenCV: 4.6.0
MMEngine: 0.10.1

Additional information

No response