hustvl/WeakTr

COCO IoU

ProjectDisR opened this issue · 11 comments

Using the provided COCO checkpoint to generate CAMs on train_1250_id.txt, the mIoU is only ~7%. The IoU is nearly 0% starting from the class "street sign".

I am wondering why that is.

I have already used the provided script to prepare the data,

and used the following command:
python evaluation.py --list coco/train_1250_id.txt \
                     --data-path data \
                     --type npy \
                     --predict_dir WeakTr_results_coco/WeakTr/attn-patchrefine-npy-ms \
                     --out-dir WeakTr_results_coco/WeakTr/pseudo-mask-ms-crf \
                     --num_classes 91 \
                     --start 40 \
                     --t 42

Should num_classes be set to 91?

Is it because of this line, `c = self.CAT_LIST.index(cat)`, which perhaps should be `c = cat`?
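
For reference, here is a minimal sketch of the difference between the two indexing schemes (the CAT_LIST contents below are illustrative, not the repository's actual list):

# Illustrative sketch, not the repository's code. COCO's 91-slot label
# space leaves some IDs unused in COCO 2014 (e.g. 12 "street sign"), so
# some scripts keep the raw 0..90 IDs (c = cat) while others compress
# them to contiguous indices via a lookup list (c = CAT_LIST.index(cat)).
# Mixing the two schemes shifts every label after the first gap, which
# would zero out the IoU from "street sign" onward.
CAT_LIST = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13]  # skips unused ID 12

cat = 13  # raw COCO category ID for "stop sign"
print(CAT_LIST.index(cat))  # 12 -> contiguous index
print(cat)                  # 13 -> raw ID; off by one past the gap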

Also, the performance does not seem to reach the reported mIoU.

Thank you for your interest in our research. Regarding the COCO CAM results in our paper, you can reproduce them from the pseudo masks using the following command:

python evaluation.py --list coco/train_id.txt \
                     --data-path data \
                     --type png \
                     --predict_dir data/coco/voc_format/WeakTr_CAMlb_wCRF_COCO \
                     --num_classes 91
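
For context, scripts of this kind typically compute mIoU from a confusion matrix; below is a minimal sketch of that computation (the function and variable names are my own, not evaluation.py's actual code):

import numpy as np

def per_class_iou(pred_list, gt_list, num_classes=91, ignore=255):
    # Accumulate a confusion matrix over all images, then compute
    # IoU = TP / (TP + FP + FN) for every class.
    hist = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(pred_list, gt_list):
        valid = gt != ignore
        hist += np.bincount(
            num_classes * gt[valid].astype(np.int64) + pred[valid],
            minlength=num_classes ** 2,
        ).reshape(num_classes, num_classes)
    tp = np.diag(hist)
    union = hist.sum(axis=0) + hist.sum(axis=1) - tp
    with np.errstate(invalid="ignore"):
        iou = tp / union  # absent from both GT and pred -> 0/0 -> NaN
    return iou, np.nanmean(iou)  # nanmean excludes NaN classes from mIoU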

As for the difficulty in reproducing our results with the provided COCO CAM checkpoint: our latest version of WeakTr includes optimizations to the model architecture. We have already updated the VOC checkpoint for the newest architecture, and we anticipate updating the COCO CAM checkpoint for the latest code within this week.
Once again, we appreciate your interest and your feedback.

Greetings! We have uploaded the newest checkpoints to the cloud. You can download the COCO CAM checkpoints and try the following commands to get an mIoU of 42.6% on the train_id split:

# Generate CAM
python main.py --model deit_small_WeakTr_patch16_224 \
                --data-path data \
                --data-set COCOMS \
                --img-ms-list coco/train_id.txt \
                --scales 1.0 0.8 1.2 \
                --gen_attention_maps \
                --cam-npy-dir WeakTr_results_coco/WeakTr/attn-patchrefine-npy-ms \
                --output_dir WeakTr_results_coco/WeakTr \
                --reduction 8 \
                --pool-type max \
                --resume WeakTr_results_coco/WeakTr/checkpoint_best_mIoU.pth
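
For orientation, multi-scale CAM generation of this kind usually runs the model at each scale and averages the upsampled maps; the sketch below illustrates the idea (the model call and output shapes are placeholder assumptions, not WeakTr's actual interface):

import torch
import torch.nn.functional as F

def multiscale_cam(model, img, scales=(1.0, 0.8, 1.2)):
    # Run the model at each input scale, upsample every CAM back to the
    # original resolution, and average. `model` returning a (1, C, h, w)
    # map of per-class scores is a placeholder assumption.
    h, w = img.shape[-2:]
    cams = []
    with torch.no_grad():
        for s in scales:
            x = F.interpolate(img, scale_factor=s, mode="bilinear",
                              align_corners=False)
            cam = F.relu(model(x))  # keep positive activations only
            cams.append(F.interpolate(cam, size=(h, w), mode="bilinear",
                                      align_corners=False))
    cam = torch.stack(cams).mean(dim=0)
    # Normalize each class map to [0, 1] before saving/thresholding.
    peak = cam.flatten(2).max(-1)[0].clamp(min=1e-5)
    return cam / peak[..., None, None]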

# CRF post-processing
python evaluation.py --list coco/train_id.txt \
                     --data-path data \
                     --type npy \
                     --predict_dir WeakTr_results_coco/WeakTr/attn-patchrefine-npy-ms \
                     --out-dir WeakTr_results_coco/WeakTr/pseudo-mask-ms-crf \
                     --t 42 \
                     --num_classes 91 \
                     --out-crf
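
For reference, the CRF step is typically a dense CRF refinement over the CAM score maps; below is a minimal sketch using the pydensecrf API (the pairwise parameters are common defaults, not necessarily the repository's):

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(img, probs, n_iters=10):
    # img: (H, W, 3) uint8 RGB image; probs: (C, H, W) class probabilities.
    c, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, c)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Two pairwise terms: a smoothness kernel and an appearance kernel.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(img), compat=10)
    q = d.inference(n_iters)
    return np.argmax(np.array(q).reshape(c, h, w), axis=0)  # (H, W) labels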

When training on COCO with the following parameters, the mIoU evaluated on train_1250_id.txt was around 8%. Similar to the issue mentioned above, the IoU from the 'street sign' category onward is basically 0%. Can you tell me what structural changes were made to the model compared to before?

Training

python main.py --model deit_small_WeakTr_patch16_224 \
               --data-path data \
               --data-set COCO \
               --img-ms-list coco/train_1250_id.txt \
               --gt-dir voc_format/class_labels \
               --cam-npy-dir WeakTr_results_coco/WeakTr/attn-patchrefine-npy \
               --output_dir WeakTr_results_coco/WeakTr \
               --reduction 8 \
               --pool-type max \
               --lr 2e-4 \
               --weight-decay 0.03

@SecretplayeRava Thanks for your interest!

  1. In fact, we achieved a 23% mIoU on train_1250_id.txt after training for 4 epochs, as shown below. We also provide the complete training curve.
  2. The IoU for the street sign category being 0% or marked as NaN is reasonable, because there are no street sign labels in the ground truth (GT) of the COCO 2014 dataset (see the sketch after this list).
  3. We suggest updating the WeakTr code to the latest version and trying again.
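
A small numeric illustration of the NaN-vs-0% distinction (the values are made up):

import numpy as np

# With correct GT, "street sign" never appears in either GT or prediction:
# intersection = union = 0, so the IoU is NaN and excluded from the mean.
# With mismatched GT, predictions hit pixels the GT never marks, so the
# IoU is exactly 0.0 and is counted, dragging the mIoU down toward ~7%.
iou = np.array([0.78, 0.52, np.nan, 0.0])
print(np.nanmean(iou))  # 0.433...: the NaN is skipped, the 0.0 is not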

Feel free to communicate with us if there are more problems.

[screenshot: evaluation results on train_1250_id.txt after 4 epochs]

[screenshot: complete training curve]

Thank you for your reply!! I will try again. Another issue: the evaluation results for the WeakTr_CAMlb_wCRF_COCO labels downloaded from Step 1 COCO14 CAM_Label are as follows:

 background: 78.471%         person: 52.280%
    bicycle: 47.058%            car: 34.611%
 motorcycle: 62.461%       airplane: 60.815%
        bus: 60.625%          train: 54.598%
      truck: 44.861%           boat: 31.966%
traffic light: 30.339%  fire hydrant: 70.073%
street sign:  0.000%      stop sign:  0.000%
parking meter:  0.016%        bench:  0.042%
       bird:  0.008%            cat:  0.139%
        dog:  0.046%          horse:  0.007%
      sheep:  0.175%            cow:  0.000%
   elephant:  0.000%           bear:  0.002%
      zebra:  0.027%        giraffe:  0.000%
        hat:  0.000%       backpack:  1.238%
   umbrella:  0.000%           shoe:  0.000%
eye glasses:  0.000%        handbag:  0.000%
        tie:  0.000%       suitcase:  0.000%
    frisbee:  0.000%           skis:  0.000%
  snowboard:  0.000%    sports ball:  0.001%
       kite:  0.005%    baseball bat:  0.000%
baseball glove:  0.001%  skateboard:  0.000%
  surfboard:  0.000%    tennis racket:  0.000%
     bottle:  0.075%          plate:  0.000%
 wine glass:  0.024%            cup:  0.041%
       fork:  0.005%          knife:  0.059%
      spoon:  0.006%           bowl:  0.192%
     banana:  0.009%          apple:  0.000%
   sandwich:  0.025%         orange:  0.000%
   broccoli:  0.011%         carrot:  0.000%
    hot dog:  0.000%          pizza:  0.001%
      donut:  0.002%           cake:  1.067%
      chair:  0.003%          couch:  0.117%
potted plant:  0.004%           bed:  0.001%
     mirror:  0.000%    dining table:  0.001%
     window:  0.000%           desk:  0.000%
     toilet:  0.000%           door:  0.000%
         tv:  0.002%         laptop:  0.000%
      mouse:  0.507%         remote:  0.010%
   keyboard:  0.006%     cell phone:  0.021%
  microwave:  0.002%           oven:  0.000%
    toaster:  0.000%           sink:  0.000%
refrigerator:  0.000%       blender:    nan%
       book:  0.000%          clock:  0.000%
       vase:  0.000%       scissors:  0.000%
 teddy bear:  0.000%     hair drier:  0.000%
 toothbrush:  0.000%    
       mIoU:  7.023%

FP = 50.92535027974361, FN = 42.05178966478575
Prediction = 9.996101270785202, Recall = 10.579217918915957

The evaluation command used was:

python evaluation.py --list coco/train_id.txt \
                     --data-path data \
                     --type png \
                     --predict_dir data/coco/voc_format/WeakTr_CAMlb_wCRF_COCO \
                     --num_classes 91

Thanks for your question!

  1. We evaluated it again using the CAM labels and the ground truth provided in WeakTr_CAMlb_wCRF_COCO and class_labels, respectively.

Note: We apologize for the issue with the way to obtain class_labels for COCO 2014. In fact, the ground truth we used for the COCO 2014 dataset is sourced from PMM, and you can download it from the link we provided above (a quick way to sanity-check your label files is sketched after the results below).

  2. The command we used for evaluation is shown below:
python evaluation.py --list coco/train_id.txt \
                     --data-path data \
                     --type png \
                     --predict_dir data/coco/voc_format/WeakTr_CAMlb_wCRF_COCO \
                     --num_classes 91
  3. The results we obtained are shown below:
 background: 78.471%         person: 52.291%
    bicycle: 46.570%            car: 34.370%
 motorcycle: 61.962%       airplane: 60.793%
        bus: 59.976%          train: 54.657%
      truck: 44.976%           boat: 32.052%
traffic light: 30.331%  fire hydrant: 70.038%
street sign:    nan%      stop sign: 75.582%
parking meter: 62.229%        bench: 38.021%
       bird: 56.910%            cat: 72.963%
        dog: 72.499%          horse: 64.872%
      sheep: 64.672%            cow: 68.417%
   elephant: 75.607%           bear: 77.029%
      zebra: 74.445%        giraffe: 71.189%
        hat:    nan%       backpack: 21.465%
   umbrella: 63.845%           shoe:    nan%
eye glasses:    nan%        handbag: 12.160%
        tie: 23.986%       suitcase: 48.190%
    frisbee: 34.093%           skis: 11.729%
  snowboard: 24.124%    sports ball:  7.579%
       kite: 41.729%    baseball bat:  3.229%
baseball glove:  2.157%  skateboard: 11.165%
  surfboard: 39.573%    tennis racket:  9.564%
     bottle: 30.714%          plate:    nan%
 wine glass: 27.689%            cup: 29.457%
       fork: 10.022%          knife: 14.906%
      spoon:  8.765%           bowl: 33.181%
     banana: 67.220%          apple: 52.411%
   sandwich: 54.550%         orange: 67.917%
   broccoli: 58.257%         carrot: 51.784%
    hot dog: 59.786%          pizza: 68.727%
      donut: 60.234%           cake: 49.545%
      chair: 22.941%          couch: 41.213%
potted plant: 33.433%           bed: 53.363%
     mirror:    nan%    dining table: 24.115%
     window:    nan%           desk:    nan%
     toilet: 49.828%           door:    nan%
         tv: 49.309%         laptop: 41.132%
      mouse: 12.514%         remote: 27.475%
   keyboard: 39.447%     cell phone: 42.907%
  microwave: 38.491%           oven: 30.251%
    toaster: 17.178%           sink: 19.368%
refrigerator: 48.354%       blender:    nan%
       book: 36.281%          clock: 39.714%
       vase: 31.402%       scissors: 36.389%
 teddy bear: 63.098%     hair drier: 19.815%
 toothbrush: 28.875%
======================================================
       mIoU: 42.563%


FP = 35.34603837587358, FN = 22.09138763809436
Prediction = 55.79337973497383, Recall = 63.39359639675423
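
As a quick sanity check on which label space your ground-truth files use (my own suggestion, not part of the repository):

import numpy as np
from PIL import Image

# Inspect the label IDs present in one GT PNG. With the PMM-style COCO
# ground truth, IDs should fall in 0..90 (plus 255 for ignore); a shifted
# or compressed range suggests the wrong label files were prepared.
# The filename below is only illustrative.
gt = np.array(Image.open("data/coco/voc_format/class_labels/COCO_train2014_000000000009.png"))
ids, counts = np.unique(gt, return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))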

It works! Thanks!

I will close this issue. If there are more questions, you are welcome to raise issues :)