jackroos/VL-BERT

Why does training fail with only one COCO 2014 image, while training on all images works?

nyj-ocean opened this issue · 0 comments

Hello, and thank you very much for your excellent work!

Training on all of the COCO 2014 images works without errors. I use the following command:
./scripts/dist_run_single.sh 1 refcoco/train_end2end.py cfgs/refcoco/base_detected_regions_4x16G.yaml output
Training runs normally, with output like this:

Rank[ 0]Epoch[0] Batch [0] Speed: - samples/sec ETA: - d - h - m Train-RefAcc=0.000000, ClsAcc=0.800000, ClsPosAcc=1.000000, ClsPosFrac=0.400000, ClsLoss=0.805409,
Rank[ 0]Epoch[0] Batch [100] Speed: 4.54 samples/s ETA: 6 d 3 h 8 m Data: 0.004 Tran: 0.000 F: 0.064 B: 0.090 O: 0.042 M: 0.026 Train-RefAcc=0.138614, ClsAcc=0.732431, ClsPosAcc=0.242991, ClsPosFrac=0.111400, ClsLoss=0.644949

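However, when I train on just one image from COCO 2014, that is, when instances_train2014.json and instances_val2014.json each contain only a single image, training fails with the following error: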
File "/home/gc/4-images/AB/hanjia/new-bert/new/VL-BERT-master/refcoco/../refcoco/data/datasets/refer/refer.py", line 118, in createIndex
    refToAnn[ref_id] = Anns[ann_id]
KeyError: 1719310

How can I fix this?
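A possible cause, offered as a guess from the traceback rather than a confirmed diagnosis: the failing line looks up Anns[ann_id] for every referring expression, so if the referring-expression file still lists annotation ids that were removed from the trimmed instances files, the lookup raises KeyError. The sketch below illustrates one way to keep the two in sync by dropping refs whose annotation or image is gone. The function name filter_refs and the toy data are hypothetical, and the field names ("ref_id", "ann_id", "image_id", "annotations", "images") are assumed to follow the usual COCO and REFER layouts.

```python
def filter_refs(refs, instances):
    """Keep only refs whose annotation and image still exist in `instances`.

    `refs` is assumed to be a list of dicts with "ann_id" and "image_id" keys;
    `instances` is assumed to be a COCO-style dict with "images" and
    "annotations" lists. Both layouts are assumptions, not verified against
    the repository.
    """
    ann_ids = {ann["id"] for ann in instances["annotations"]}
    image_ids = {img["id"] for img in instances["images"]}
    return [
        ref for ref in refs
        if ref["ann_id"] in ann_ids and ref["image_id"] in image_ids
    ]


# Toy data: one image with one annotation survives the trimming.
instances = {
    "images": [{"id": 10}],
    "annotations": [{"id": 100, "image_id": 10}],
}

# The second ref points at an annotation that no longer exists,
# which is the situation that would trigger the KeyError.
refs = [
    {"ref_id": 0, "ann_id": 100, "image_id": 10},
    {"ref_id": 1, "ann_id": 1719310, "image_id": 99},
]

kept = filter_refs(refs, instances)
print([ref["ref_id"] for ref in kept])
```

If this guess is right, the same filtering would need to be applied to the refs file that refer.py loads, and the result saved back, before training on the reduced dataset.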