The fps of yolov6_v3_n is very low

Question

wsy-yjys opened this issue 3 months ago · 5 comments

Prerequisite

I have searched the existing and past issues but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version.

🐞 Describe the bug

Hello, I first trained yolov6_v3_n on COCO using the 300-epoch config you provide. The results, shown below, are normal. Here's the model:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.522
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.399
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.172
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.532
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.578
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.336
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.649
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.774
03/17 20:36:18 - mmengine - INFO - bbox_mAP_copypaste: 0.368 0.522 0.399 0.172 0.409 0.532
03/17 20:36:22 - mmengine - INFO - Epoch(val) [300][2500/2500]    coco/bbox_mAP: 0.3680  coco/bbox_mAP_50: 0.5220  coco/bbox_mAP_75: 0.3990  coco/bbox_mAP_s: 0.1720  coco/bbox_mAP_m: 0.4090  coco/bbox_mAP_l: 0.5320  data_time: 0.0005  time: 0.0583

However, when I measured the inference speed with the following command, it was very slow (about 18 img/s on average):

python tools/analysis_tools/benchmark.py configs/yolov6/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD.py  work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth --fuse-conv-bn --max-iter 20 --repeat-num 3
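For reference when reading the numbers below, here is a minimal, framework-agnostic sketch of how an fps figure is usually measured. The helper `measure_fps` is hypothetical (it is not part of MMYOLO or MMDetection) and only illustrates two points that matter for GPU timing: warm-up calls are run but not timed, and asynchronous work must be synchronized before the clock is read, because CUDA kernel launches return before the kernels finish.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_fps(
    infer: Callable[[object], object],
    inputs: Sequence[object],
    num_warmup: int = 5,
    sync: Callable[[], None] = lambda: None,
) -> Tuple[float, float]:
    """Return (images per second, ms per image), timing only post-warm-up calls.

    Hypothetical helper for illustration; `infer` processes one image per call.
    """
    if len(inputs) <= num_warmup:
        raise ValueError("need more inputs than warm-up iterations")

    # Warm-up: run but do not time, so one-off costs (lazy init, kernel
    # selection, caches) are excluded from the measurement.
    for x in inputs[:num_warmup]:
        infer(x)
    sync()  # make sure warm-up work has actually finished

    timed = inputs[num_warmup:]
    start = time.perf_counter()
    for x in timed:
        infer(x)
    sync()  # asynchronous (e.g. CUDA) work must finish before reading the clock
    elapsed = time.perf_counter() - start

    fps = len(timed) / elapsed
    return fps, 1000.0 / fps
```

When `infer` runs a PyTorch model on the GPU, one would pass `sync=torch.cuda.synchronize`; without it, the loop can measure kernel launch time rather than execution time.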

loading annotations into memory...
Done (t=0.66s)
creating index...
index created!
Loads checkpoint by local backend from path: work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth
switch to deploy done!
/home/wsy/anaconda3/envs/py39torch2/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
03/19 09:44:05 - mmengine - INFO - Overall fps: 11.6 img / s, times per image: 86.2 ms / img
loading annotations into memory...
Done (t=0.48s)
creating index...
index created!
Loads checkpoint by local backend from path: work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth
switch to deploy done!
03/19 09:44:08 - mmengine - INFO - Overall fps: 23.2 img / s, times per image: 43.1 ms / img
loading annotations into memory...
Done (t=0.65s)
creating index...
index created!
Loads checkpoint by local backend from path: work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth
switch to deploy done!
03/19 09:44:11 - mmengine - INFO - Overall fps: 19.9 img / s, times per image: 50.3 ms / img
03/19 09:44:11 - mmengine - INFO - Overall fps: [11.6, 23.2, 19.9][18.2] img / s, times per image: [86.2, 43.1, 50.3][59.9] ms / img
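The bracketed summary on the last line appears to be a plain arithmetic mean over the three repeats, and each ms/img value matches 1000 / fps for the same repeat. A short check on the numbers copied from the log above (no MMYOLO code involved) also shows that the mean latency is not the reciprocal of the mean fps, so the two summary figures should not be converted into each other.

```python
from statistics import mean

# Per-repeat numbers copied from the benchmark.py log above.
fps_runs = [11.6, 23.2, 19.9]
ms_runs = [86.2, 43.1, 50.3]

# The bracketed summary matches the arithmetic mean over repeats.
print(round(mean(fps_runs), 1))  # 18.2
print(round(mean(ms_runs), 1))   # 59.9

# Each ms/img value is consistent with 1000 / fps for the same repeat.
for fps, ms in zip(fps_runs, ms_runs):
    assert abs(1000 / fps - ms) < 0.1

# Averaging fps and averaging latency are not reciprocal operations.
print(round(1000 / mean(fps_runs), 1))  # 54.8, not 59.9
```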

Environment

sys.platform: linux
Python: 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.2, V11.2.67
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 2.0.0+cu117
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.15.1+cu117
OpenCV: 4.9.0
MMEngine: 0.9.1
MMCV: 2.0.1
MMDetection: 3.2.0
MMYOLO: 0.6.0+8c4d9dc

Additional information

No response

Answer 1 · 2024-03-22T07:42:21.000Z

Try adding --task inference.

Answer 2 · 2024-03-22T07:51:06.000Z

(py39torch2) wsy@ubuntu:~/paper2/mmyolo$ python tools/analysis_tools/benchmark_old.py configs/yolov6/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD.py  work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth --fuse-conv-bn --max-iter 20 --repeat-num 3 --task inference
usage: benchmark_old.py [-h] [--repeat-num REPEAT_NUM] [--max-iter MAX_ITER] [--log-interval LOG_INTERVAL] [--work-dir WORK_DIR] [--fuse-conv-bn]
                        [--cfg-options CFG_OPTIONS [CFG_OPTIONS ...]] [--launcher {none,pytorch,slurm,mpi}] [--local_rank LOCAL_RANK]
                        config checkpoint
benchmark_old.py: error: unrecognized arguments: --task inference

Answer 3 · 2024-03-22T07:54:17.000Z

@slantingsun Hello, the mmyolo benchmark script does not have a --task argument. In addition, I tested other models, including YOLOX and YOLOv7, and their fps is normal; only yolov6_v3 is abnormal.

Answer 4 · 2024-03-26T13:06:52.000Z

I use the following command:

mim run mmdet benchmark config_myself/yolov5_s-v61_syncbn_8xb16-300e_coco.py --checkpoint work_dirs/yolov5_s-v61_syncbn_8xb16-300e_coco/best_coco_bbox_mAP_epoch_300.pth --task inference

Answer 5 · 2024-03-26T14:57:00.000Z

@slantingsun, your test uses YOLOv5. With the following command, the yolov6_v3_n_syncbn_fast results are still abnormal. Have you tested YOLOv6?

(py39torch2) wsy@ubuntu:~/paper2/mmyolo$ mim run mmdet benchmark configs/yolov6/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD.py  --checkpoint work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth --fuse-conv-bn --max-iter 20 --repeat-num 3 --task inference
Use the script /home/wsy/anaconda3/envs/py39torch2/lib/python3.9/site-packages/mmdet/.mim/tools/analysis_tools/benchmark.py for command benchmark.
The command to call is /home/wsy/anaconda3/envs/py39torch2/bin/python /home/wsy/anaconda3/envs/py39torch2/lib/python3.9/site-packages/mmdet/.mim/tools/analysis_tools/benchmark.py configs/yolov6/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD.py --checkpoint work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth --fuse-conv-bn --max-iter 20 --repeat-num 3 --task inference.
03/26 22:53:42 - mmengine - INFO - before build:
03/26 22:53:42 - mmengine - INFO - (GB) mem_used: 7.34 | uss: 0.40 | pss: 0.41 | total_proc: 1
Loads checkpoint by local backend from path: work_dirs/yolov6_v3_n_syncbn_fast_2xb32-300e_coco_SGD/best_coco_bbox_mAP_epoch_300.pth
loading annotations into memory...
Done (t=0.63s)
creating index...
index created!
03/26 22:53:46 - mmengine - INFO - after build:
03/26 22:53:46 - mmengine - INFO - (GB) mem_used: 8.77 | uss: 2.10 | pss: 2.11 | total_proc: 1
/home/wsy/anaconda3/envs/py39torch2/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
03/26 22:53:52 - mmengine - INFO - ============== Done ==================
03/26 22:53:52 - mmengine - INFO - Overall fps: [9.2, 19.7, 21.0][16.6] img/s, times per image: [109.2, 50.9, 47.6][69.2] ms/img
03/26 22:53:52 - mmengine - INFO - cuda memory: 574 MB
03/26 22:53:52 - mmengine - INFO - (GB) mem_used: 10.17 | uss: 3.83 | pss: 3.84 | total_proc: 1
The script finished successfully.
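Both logs in this thread show the same pattern: the first of the three repeats is the slowest. The sketch below works only on the per-repeat fps values copied from the two logs (it calls neither tool) and quantifies how much of the low average that first repeat accounts for.

```python
from statistics import mean

# Per-repeat fps values copied from the two logs in this thread.
runs = {
    "mmyolo benchmark.py": [11.6, 23.2, 19.9],
    "mim run mmdet benchmark": [9.2, 19.7, 21.0],
}

summary = {}
for name, fps in runs.items():
    summary[name] = {
        "all_repeats": mean(fps),
        "without_first": mean(fps[1:]),  # drop the first repeat
        "first_is_slowest": fps[0] == min(fps),
    }
    s = summary[name]
    print(f"{name}: all={s['all_repeats']:.2f}, "
          f"without first={s['without_first']:.2f} img/s")
```

Dropping the first repeat raises the means only to roughly 21.5 and 20.4 img/s, so on these numbers a one-off first-repeat cost explains part of the gap but not the low throughput as a whole.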