trainval set size is larger than combining train + val sets
For the semantic segmentation task, the training set size is 3256 and the val set size is 3329, but the trainval set size is 10103, which is larger than the two combined (3256 + 3329 = 6585).
The API seems to be messed up somehow.
import os
from detail import Detail  # import path assumed from the PythonAPI package

data_folder = os.path.join(data_folder, 'VOCdevkit/')
annFile = os.path.join(data_folder, 'trainval_merged.json')
imgDir = os.path.join(data_folder, 'VOC2010/JPEGImages')
phase = 'trainval'
detail = Detail(annFile, imgDir, phase)
print(len(detail.getImgs()))
Huh. Where are you getting the numbers 3256 and 3329 from? 10103 is the correct trainval size.
I just tried passing "training" as the phase and building the index seems to take quite a while...I think it's a bug. Taking a look.
phase = 'train' gives me 3256, phase = 'val' gives 3329.
Using 'trainval', there are some all-zero masks in the dataset, which suggests it is getting mixed up with the test set.
Just pushed an update. Fetch it with git pull, then run make && make install in the PythonAPI directory.
Here are the correct numbers:
len(details.getImgs(phase="trainval")) == 10100
len(details.getImgs(phase="train")) == 4996
len(details.getImgs(phase="val")) == 5104
len(details.getImgs(phase="test")) == 4188
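As a quick sanity check on the numbers above, the train and val sizes should add up to the trainval size. The sketch below only does that arithmetic on the counts reported in this thread; the helper name `splits_consistent` is made up for illustration and is not part of the Detail API.

```python
# Split sizes as reported in this thread after the fix.
reported_after_fix = {"train": 4996, "val": 5104, "trainval": 10100}

# Split sizes from the original report, before the fix.
reported_before_fix = {"train": 3256, "val": 3329, "trainval": 10103}


def splits_consistent(sizes):
    """Return True if the train and val sizes add up to the trainval size."""
    return sizes["train"] + sizes["val"] == sizes["trainval"]


print(splits_consistent(reported_after_fix))
print(splits_consistent(reported_before_fix))
```

The same check could be run against live `len(details.getImgs(phase=...))` values after updating.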
The default (no parameters) behavior of getImgs() is to assume that phase is the same as the phase specified in the Detail constructor.
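To illustrate that default, here is a minimal sketch of the fallback pattern. `PhaseIndex` and its `imgs_by_phase` dictionary are hypothetical stand-ins written for this example; they are not the actual Detail implementation, which may handle phases differently internally.

```python
class PhaseIndex:
    """Hypothetical stand-in showing the default-phase fallback; not the real Detail class."""

    def __init__(self, imgs_by_phase, phase):
        # imgs_by_phase maps a phase name to a list of image records (assumed layout).
        self.imgs_by_phase = imgs_by_phase
        self.phase = phase

    def getImgs(self, phase=None):
        # With no argument, fall back to the phase given to the constructor.
        if phase is None:
            phase = self.phase
        return self.imgs_by_phase[phase]


index = PhaseIndex({"train": ["a.jpg", "b.jpg"], "val": ["c.jpg"]}, phase="train")
print(len(index.getImgs()))             # uses the constructor's phase ("train")
print(len(index.getImgs(phase="val")))  # an explicit phase overrides the default
```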
Going to close the issue for now, but please let us know if any more problems arise!
Thanks Matt! It is working.
Just to let other teams know: please download the JSON file again using the command python3 ./download.py trainval_merged, because this file has been updated as well.