MM Grounding Dino inference result of DetInferencer() is much worse than tools/test.py
shojint opened this issue · 0 comments
Describe the issue
I have trained a checkpoint of mm_grounding_dino_swin-b. When I do inference using the testing dataset using tools/test.py, the accuracy is satisfactory. However, doing inference using the same image with DetInferencer()
resulted in a much worse result.
Reproduction
In an environment with mmdetection installed, run the following code.
I'm sorry I can't share my checkpoint. You would need to use your own checkpoint.
from mmdet.apis import DetInferencer
inferencer = DetInferencer(model='/home/vgpu/mmdetection/configs/mm_grounding_dino/grounding_dino_swin-b_finetune_taco_map.py',
weights='/home/vgpu/mmdetection/taco_work_dir/best_coco_bbox_mAP_epoch_14_20241024_151401.pth')
local_image_path = "/home/vgpu/TACO/data/batch_1/000045.jpg"
TEXT_PROMPT = 'can'
inferencer(local_image_path, show=True, texts=TEXT_PROMPT, custom_entities=True)
Config file
_base_ = 'grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py'
data_root = '/home/vgpu/TACO/data/'
class_name = ('Aluminium foil', 'Battery', 'Aluminium blister pack', 'Glass bottle',
'Plastic bottle cap', 'Metal bottle cap', 'Broken glass', 'Food waste',
'Plastic lid', 'Metal lid', 'Other plastic', 'Plastic film', 'Plastic utensils',
'Pop tab', 'Rope & strings', 'Squeezable tube', 'Styrofoam piece',
'Unlabeled litter', 'Cigarette', 'Can', 'Plastic bottle', 'Carton', 'Cup', 'Paper',
'Wrapper', 'Plastic container', 'Straw')
num_classes = len(class_name)
metainfo = dict(
classes=class_name,
# palette=[(235, 211, 70),
# (106, 90, 205),
# (160, 32, 240),
# (176, 23, 31),
# (142, 0, 0),
# (230, 0, 0),
# (106, 0, 228),
# (60, 100, 0)]
)
model = dict(bbox_head=dict(num_classes=num_classes))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RandomFlip', prob=0.5),
dict(
type='RandomChoice',
transforms=[
[
dict(
type='RandomChoiceResize',
scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
keep_ratio=True)
],
[
dict(
type='RandomChoiceResize',
# The radio of all image in train dataset < 7
# follow the original implement
scales=[(400, 4200), (500, 4200), (600, 4200)],
keep_ratio=True),
dict(
type='RandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(
type='RandomChoiceResize',
scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
keep_ratio=True)
]
]),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor', 'flip', 'flip_direction', 'text',
'custom_entities'))
]
train_dataloader = dict(
dataset=dict(
_delete_=True,
type='CocoDataset',
data_root=data_root,
metainfo=metainfo,
return_classes=True,
pipeline=train_pipeline,
filter_cfg=dict(filter_empty_gt=False, min_size=32),
ann_file='mapped_annotations/annotations_0_train.json',
data_prefix=dict(img='.')))
val_dataloader = dict(
dataset=dict(
metainfo=metainfo,
data_root=data_root,
ann_file='mapped_annotations/annotations_0_val.json',
data_prefix=dict(img='.')))
test_dataloader = val_dataloader
val_evaluator = dict(
ann_file=data_root + 'mapped_annotations/annotations_0_val.json', classwise=True)
test_evaluator = val_evaluator
max_epochs = 100
default_hooks = dict(
checkpoint=dict(interval=1, max_keep_ckpts=1, save_best='coco/bbox_mAP'),
logger=dict(type='LoggerHook', interval=5))
train_cfg = dict(max_epochs=max_epochs, val_interval=1)
param_scheduler = [
dict(
type='MultiStepLR',
begin=0,
end=max_epochs,
by_epoch=True,
milestones=[15],
gamma=0.1)
]
optim_wrapper = dict(
optimizer=dict(lr=0.0001),
paramwise_cfg=dict(
custom_keys={
'absolute_pos_embed': dict(decay_mult=0.),
'backbone': dict(lr_mult=0.1), #Should I change this?
'language_model': dict(lr_mult=0.0)
}))
load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_all/grounding_dino_swin-b_pretrain_all-f9818a7c.pth' # noqa
Environment
sys.platform: linux
Python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA GeForce RTX 3090 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.99
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.4.1
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1 (built against CUDA 12.4)
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.19.1
OpenCV: 4.10.0
MMEngine: 0.10.5
MMDetection: 3.3.0+cfd5d3a
Output
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/mmengine/optim/optimizer/zero_optimizer.py:11: DeprecationWarning: `TorchScript` support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the `torch.compile` optimizer instead.
from torch.distributed.optim import \
Loads checkpoint by local backend from path: /home/vgpu/mmdetection/taco_work_dir/best_coco_bbox_mAP_epoch_14_20241024_151401.pth
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/mmengine/runner/checkpoint.py:347: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(filename, map_location=map_location)
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
12/23 14:36:18 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the `save_dir` argument.
warnings.warn(f'Failed to add {vis_backend.__class__}, '
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/torch/functional.py:513: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at
/opt/conda/conda-bld/pytorch_1724789172399/work/aten/src/ATen/native/TensorShape.cpp:3609.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/vgpu/mmcv/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
warnings.warn(f'position encoding of key is'
Inference ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━