Why can't I achieve the experimental effect in the paper?
ssunguotu opened this issue · 8 comments
Thanks for your work!
In the Day Clear scene, I achieved an mAP of only 46.4%, which differs from the reported mAP of 51.3% in the research paper. I used two NVIDIA RTX 3090 GPUs for training and made some modifications to the train.py code to enable multiple GPU training.
if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
Additionally, in the run_step function, I made the following changes:
opt_phase = False
if len(self.off_opt_interval) and self.iter >= self.off_opt_interval[0] and self.iter < self.off_opt_interval[0]+self.off_opt_iters:
    if self.iter == self.off_opt_interval[0]:
        self.model.module.offsets.data = torch.zeros(self.model.module.offsets.shape).cuda()
    loss_dict_s = self.model.module.opt_offsets(data_s)
    opt_phase = True
    if self.iter+1 == self.off_opt_interval[0]+self.off_opt_iters:
        self.off_opt_interval.pop(0)
Looking forward to your reply, thank you!
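The run_step change above reaches the network through self.model.module, an attribute that only exists when the model is wrapped for distributed training. Below is a minimal sketch of a helper that works in both single-GPU and multi-GPU runs, assuming the wrapper exposes the network as .module (as PyTorch's DistributedDataParallel does). unwrap_model and the two stand-in classes are hypothetical names for illustration, not part of the repository.

```python
def unwrap_model(model):
    """Return the wrapped network if `model` is a DDP-style wrapper, else `model` itself."""
    # DistributedDataParallel exposes the original network as `.module`;
    # a plain model normally has no attribute with that name.
    return getattr(model, "module", model)


class Net:
    """Stand-in for the real detector (hypothetical)."""

    def opt_offsets(self, batch):
        return {"loss": len(batch)}


class Wrapper:
    """Stand-in for DistributedDataParallel (hypothetical)."""

    def __init__(self, module):
        self.module = module


net = Net()
# The same call works whether or not the model is wrapped.
single_gpu_loss = unwrap_model(net).opt_offsets([1, 2, 3])
multi_gpu_loss = unwrap_model(Wrapper(net)).opt_offsets([1, 2, 3])
```

With such a helper, the same run_step code could be used with one GPU or several. This only removes the wrapper dependency and is not by itself an explanation for the mAP gap.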
Oh, I found that I get the same result when I use the model you posted. Am I testing the model in the wrong way?
I modified configs/diverse_weather.yaml like this:
MODEL:
  BACKBONE:
    NAME: ClipRN101
  WEIGHTS: "/code/domaingen/diverse-weights.pth"
and run
python train.py --eval-only --config-file configs/diverse_weather.yaml
Hello @ssunguotu, thank you for your interest.
The evaluation command you mentioned is correct. Could you please verify that the checkpoint returns the reported mAP@50 without your code modifications?
Thank you for your reply!
I have verified that the checkpoint returns the same mAP@50 without my code modifications.
Here is my log from testing the model. I found it strange that only 8289 images seem to be loaded during testing, although the dataset actually has 8313 images. I have checked /daytime_clear/ImageSets/Main/test.txt, /daytime_clear/Annotations, and /daytime_clear/JPEGImages, and the numbers are all correct.
I don't know whether this explains the difference in results.
[08/29 10:40:34 detectron2]: Full config saved to all_outs/diverse_weather/origin_v2/config.yaml
[08/29 10:40:34 d2.utils.env]: Using a generated random seed 34803406
['bus', 'bike', 'car', 'motor', 'person', 'rider', 'truck']
[08/29 10:41:03 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /code/domaingen/diverse-weights.pth ...
[08/29 10:41:03 fvcore.common.checkpoint]: [Checkpointer] Loading from /code/domaingen/diverse-weights.pth ...
[08/29 10:41:09 d2.data.build]: Distribution of instances among all 7 categories:
| category | #instances | category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
| bus | 1738 | bike | 1046 | car | 95339 |
| motor | 537 | person | 12309 | rider | 787 |
| truck | 5029 | | | | |
| total | 116785 | | | | |
[08/29 10:41:09 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(600, 600), max_size=1333, sample_style='choice')]
[08/29 10:41:09 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[08/29 10:41:09 d2.data.common]: Serializing 8289 elements to byte tensors and concatenating them all ...
[08/29 10:41:09 d2.data.common]: Serialized dataset takes 7.92 MiB
[08/29 10:41:09 d2.evaluation.evaluator]: Start inference on 8289 batches
/miniconda3/envs/frcnn/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[08/29 10:41:16 d2.evaluation.evaluator]: Inference done 11/8289. Dataloading: 0.0016 s/iter. Inference: 0.2062 s/iter. Eval: 0.0007 s/iter. Total: 0.2085 s/iter. ETA=0:28:45
[08/29 10:41:21 d2.evaluation.evaluator]: Inference done 36/8289. Dataloading: 0.0025 s/iter. Inference: 0.2000 s/iter. Eval: 0.0007 s/iter. Total: 0.2033 s/iter. ETA=0:27:57
[08/29 10:41:26 d2.evaluation.evaluator]: Inference done 62/8289. Dataloading: 0.0022 s/iter. Inference: 0.1971 s/iter. Eval: 0.0007 s/iter. Total: 0.2001 s/iter. ETA=0:27:25
..........
[08/29 11:08:47 d2.evaluation.evaluator]: Inference done 8284/8289. Dataloading: 0.0014 s/iter. Inference: 0.1975 s/iter. Eval: 0.0006 s/iter. Total: 0.1995 s/iter. ETA=0:00:00
[08/29 11:08:48 d2.evaluation.evaluator]: Total inference time: 0:27:32.976390 (0.199538 s / iter per device, on 1 devices)
[08/29 11:08:48 d2.evaluation.evaluator]: Total inference pure compute time: 0:27:16 (0.197507 s / iter per device, on 1 devices)
[08/29 11:08:48 d2.evaluation.pascal_voc_evaluation]: Evaluating daytime_clear_test using 2007 metric. Note that results do not use the official Matlab API.
[08/29 11:11:11 d2.evaluation.pascal_voc_evaluation]: classwise ap 53.27,42.05,57.16,39.37,39.86,40.24,52.46
[08/29 11:11:11 detectron2]: Evaluation results for daytime_clear_test in csv format:
[08/29 11:11:11 d2.evaluation.testing]: copypaste: Task: bbox
[08/29 11:11:11 d2.evaluation.testing]: copypaste: AP,AP50,AP75
[08/29 11:11:11 d2.evaluation.testing]: copypaste: 22.8409,46.3442,18.9491
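The count mismatch mentioned above (8313 images listed, 8289 loaded) can be narrowed down by auditing the split file against the annotation and image folders. The following is a rough sketch, assuming a Pascal VOC style layout with .jpg images and one image ID per line in test.txt. audit_voc_split is a hypothetical helper, and whether the repository's loader actually drops images without known-class objects is an assumption to verify, not something the thread confirms.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Class list as printed in the evaluation log.
CLASSES = ("bus", "bike", "car", "motor", "person", "rider", "truck")


def audit_voc_split(root, split="test", classes=CLASSES):
    """Report listed IDs that lack files or contain no object of a known class."""
    root = Path(root)
    ids = (root / "ImageSets" / "Main" / f"{split}.txt").read_text().split()
    report = {"listed": len(ids), "missing_xml": [], "missing_jpg": [], "no_known_objects": []}
    for image_id in ids:
        xml_path = root / "Annotations" / f"{image_id}.xml"
        jpg_path = root / "JPEGImages" / f"{image_id}.jpg"
        if not jpg_path.is_file():
            report["missing_jpg"].append(image_id)
        if not xml_path.is_file():
            report["missing_xml"].append(image_id)
            continue
        # Collect the class name of every annotated object in this image.
        names = [obj.findtext("name") for obj in ET.parse(xml_path).getroot().iter("object")]
        if not any(name in classes for name in names):
            report["no_known_objects"].append(image_id)
    return report
```

Calling audit_voc_split("/daytime_clear") and comparing the lengths of the three lists with the 24-image gap would show whether missing files or empty annotations account for it.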
I re-ran the evaluation code with the provided checkpoint and was able to get the 51.3 mAP.
[08/30 13:32:03 detectron2]: Full config saved to all_outs/diverse_weather/config.yaml
[08/30 13:32:03 d2.utils.env]: Using a generated random seed 3127000
['bus', 'bike', 'car', 'motor', 'person', 'rider', 'truck']
[08/30 13:32:18 fvcore.common.checkpoint]: [Checkpointer] Loading from diverse-weights.pth ...
[08/30 13:32:22 d2.data.build]: Distribution of instances among all 7 categories:
| category | #instances | category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
| bus | 1738 | bike | 1046 | car | 95339 |
| motor | 537 | person | 12309 | rider | 787 |
| truck | 5029 | | | | |
| total | 116785 | | | | |
[08/30 13:32:22 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(600, 600), max_size=1333, sample_style='choice')]
[08/30 13:32:22 d2.data.common]: Serializing 8289 elements to byte tensors and concatenating them all ...
[08/30 13:32:22 d2.data.common]: Serialized dataset takes 8.26 MiB
[08/30 13:32:22 d2.evaluation.evaluator]: Start inference on 8289 batches
[08/30 13:32:24 d2.evaluation.evaluator]: Inference done 11/8289. Dataloading: 0.0012 s/iter. Inference: 0.0842 s/iter. Eval: 0.0004 s/iter. Total: 0.0858 s/iter. ETA=0:11:50
[08/30 13:45:06 d2.evaluation.evaluator]: Inference done 8253/8289. Dataloading: 0.0011 s/iter. Inference: 0.0907 s/iter. Eval: 0.0006 s/iter. Total: 0.0924 s/iter. ETA=0:00:03
[08/30 13:45:10 d2.evaluation.evaluator]: Total inference time: 0:12:46.125840 (0.092483 s / iter per device, on 1 devices)
[08/30 13:45:10 d2.evaluation.evaluator]: Total inference pure compute time: 0:12:31 (0.090753 s / iter per device, on 1 devices)
[08/30 13:45:10 d2.evaluation.pascal_voc_evaluation]: Evaluating daytime_clear_test using 2007 metric. Note that results do not use the official Matlab API.
[08/30 13:47:31 d2.evaluation.pascal_voc_evaluation]: classwise ap 54.90,46.23,66.08,45.19,47.45,44.63,54.33
[08/30 13:47:31 detectron2]: Evaluation results for daytime_clear_test in csv format:
[08/30 13:47:31 d2.evaluation.testing]: copypaste: Task: bbox
[08/30 13:47:31 d2.evaluation.testing]: copypaste: AP,AP50,AP75
[08/30 13:47:31 d2.evaluation.testing]: copypaste: 27.3545,51.2566,24.1571
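To see where the two evaluation runs diverge, the "classwise ap" lines from both logs can be compared class by class. This is a small sketch that assumes the AP values are printed in the same order as the class list shown in the logs, which the logs themselves do not state. parse_classwise_ap is a hypothetical helper.

```python
# Class list as printed in both logs; the AP ordering is assumed to match it.
CLASSES = ["bus", "bike", "car", "motor", "person", "rider", "truck"]


def parse_classwise_ap(log_line):
    """Extract the comma-separated AP values that follow 'classwise ap' in a log line."""
    values = log_line.split("classwise ap", 1)[1]
    return [float(v) for v in values.split(",")]


# Values copied from the two evaluation logs above.
run_a = parse_classwise_ap("classwise ap 53.27,42.05,57.16,39.37,39.86,40.24,52.46")
run_b = parse_classwise_ap("classwise ap 54.90,46.23,66.08,45.19,47.45,44.63,54.33")

# Per-class gap between the maintainer's run (b) and the reporter's run (a).
gaps = {name: round(b - a, 2) for name, a, b in zip(CLASSES, run_a, run_b)}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>7}: {gap:+.2f}")

mean_a = sum(run_a) / len(run_a)
mean_b = sum(run_b) / len(run_b)
print(f"mean AP50: {mean_a:.2f} vs {mean_b:.2f}")
```

Under that ordering assumption, the largest gaps are on car and person, while bus and truck are nearly unchanged. The means of the seven values agree with the AP50 figures in the logs up to rounding.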
Could you please verify that the requirements are properly set up? The expected versions are detectron2 0.6 and torch 1.10, both with CUDA 11.3.
The mAP changed after I rebuilt the environment, but it is still different from 51.3, and even lower.
I am really confused... Why do the results change?
Here is my environment:
sys.platform linux
Python 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0]
numpy 1.21.5
detectron2 0.6 @/code/detectron2/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.3
detectron2 arch flags 8.6
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.10.0 @/miniconda3/envs/detectron/lib/python3.9/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0,1 NVIDIA GeForce RTX 3090 (arch=8.6)
Driver version 470.103.01
CUDA_HOME /usr/local/cuda
Pillow 8.2.0
torchvision 0.11.0 @/miniconda3/envs/detectron/lib/python3.9/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.5.2
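The listing above can also be checked programmatically against the versions the maintainer asked for. This is a minimal sketch using only the standard library. The EXPECTED mapping encodes the detectron2 0.6 and torch 1.10 requirement from the thread, check_versions is a hypothetical helper, and the CUDA version is not covered here (the environment dump printed by `python -m detectron2.utils.collect_env` reports it).

```python
from importlib import metadata

# Version prefixes requested in the thread.
EXPECTED = {"detectron2": "0.6", "torch": "1.10"}


def check_versions(expected=EXPECTED):
    """Map each package to (installed version or None, whether it matches the prefix)."""
    result = {}
    for name, prefix in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Accept an exact match ("0.6") or a patch/local suffix ("1.10.0+cu113").
        ok = installed is not None and (installed == prefix or installed.startswith(prefix + "."))
        result[name] = (installed, ok)
    return result


if __name__ == "__main__":
    for package, (installed, ok) in check_versions().items():
        print(f"{package}: {installed} ({'ok' if ok else 'mismatch'})")
```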
OK... It seems that I found the reason. The daytime_clear dataset on my server is different from the official version. I replaced the old dataset with the latest version, and the result is now fine. The mAP is 52.3, even better than the reported result.
[08/31 18:44:54 d2.evaluation.pascal_voc_evaluation]: Evaluating daytime_clear_test using 2007 metric. Note that results do not use the official Matlab API.
[08/31 18:46:57 d2.evaluation.pascal_voc_evaluation]: classwise ap 54.71,47.70,67.45,45.87,48.96,46.73,54.82
[08/31 18:46:57 detectron2]: Evaluation results for daytime_clear_test in csv format:
[08/31 18:46:57 d2.evaluation.testing]: copypaste: Task: bbox
[08/31 18:46:57 d2.evaluation.testing]: copypaste: AP,AP50,AP75
[08/31 18:46:57 d2.evaluation.testing]: copypaste: 29.3687,52.3192,27.2935
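Since the cause turned out to be a dataset copy that differed from the official release, one way to catch this earlier is to fingerprint the dataset folders and compare the digests between machines. This is a rough sketch using only the standard library. fingerprint_dir is a hypothetical helper, and which folders to hash (for example Annotations and ImageSets) is left to the reader.

```python
import hashlib
from pathlib import Path


def fingerprint_dir(root, pattern="*"):
    """Return (file count, SHA-256 digest) over relative names and contents under `root`."""
    root = Path(root)
    files = [p for p in root.rglob(pattern) if p.is_file()]
    # Sort by relative POSIX path so the digest does not depend on traversal order.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    digest = hashlib.sha256()
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return len(files), digest.hexdigest()
```

Two copies of the dataset that yield different counts or digests for the same folder are not identical, which is worth ruling out before comparing mAP numbers.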
thanks, closing this issue now.
I also get 8289 images when testing. You said "The daytime_clear datasets in my server is different to the official version". What is the official version?