trainval set size is larger than combining train + val sets

Question

Closed this issue 7 years ago · 4 comments

zhanghang1989 commented 7 years ago

For the semantic segmentation task:
the training set size is 3256 and the val set size is 3329, yet the trainval set size is 10103, more than the 6585 they add up to.

The API seems to be messed up somehow.

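import os
from detail import Detail  # PythonAPI from the Detail repo

# data_folder must already point to the directory that contains VOCdevkit/
data_folder = os.path.join(data_folder, 'VOCdevkit/')
annFile = os.path.join(data_folder, 'trainval_merged.json')
imgDir = os.path.join(data_folder, 'VOC2010/JPEGImages')
phase = 'trainval'
detail = Detail(annFile, imgDir, phase)
print(len(detail.getImgs()))  # number of images in the 'trainval' phase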
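Since train and val are meant to partition trainval, their sizes should add up. A minimal sketch of that sanity check, using only the counts reported in this thread (split_sizes_consistent is a hypothetical helper, not part of the Detail API):

```python
def split_sizes_consistent(n_train: int, n_val: int, n_trainval: int) -> bool:
    """Return True if the train and val sizes add up to the trainval size.

    Assumes (hypothetically) that train and val are disjoint and together
    make up trainval.
    """
    return n_train + n_val == n_trainval


# Counts reported in the question: 3256 + 3329 = 6585, not 10103.
print(split_sizes_consistent(3256, 3329, 10103))
```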

Answer 1 · 2017-07-18T23:11:36.000Z

Huh. Where are you getting the numbers 3256 and 3329 from? 10103 is the correct trainval size.

I just tried passing "training" as the phase, and building the index seems to take quite a while. I think it's a bug; taking a look.

Answer 2 · 2017-07-18T23:32:24.000Z

phase = 'train' gives me 3256, phase = 'val' gives 3329.

Using 'trainval', some masks in the dataset are all zeros, which suggests that it is mixed up with the test set.
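An all-zero mask carries no labels, which is one sign that unannotated images (such as test images, as suspected above) were included. A minimal sketch for flagging them, assuming you already have each image's segmentation mask as a 2-D array or list of rows keyed by image id; how masks are obtained from the Detail API is not shown, and find_empty_masks is a hypothetical helper:

```python
from typing import Dict, Hashable, List, Sequence


def find_empty_masks(masks: Dict[Hashable, Sequence[Sequence[int]]]) -> List[Hashable]:
    """Return the ids of images whose mask contains only zeros.

    `masks` maps an image id to a 2-D mask (a NumPy array or a list of rows).
    """
    empty = []
    for img_id, mask in masks.items():
        # any(row) is True if the row has at least one non-zero pixel.
        if not any(any(row) for row in mask):
            empty.append(img_id)
    return empty


# Toy masks (hypothetical): "b" has no labeled pixels.
masks = {
    "a": [[0, 0], [0, 1]],
    "b": [[0, 0], [0, 0]],
}
print(find_empty_masks(masks))  # → ['b']
```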

Answer 3 · 2017-07-19T03:03:30.000Z

Just pushed an update: fetch it with git pull, then run make && make install in the PythonAPI directory.

Here are the correct numbers:

len(details.getImgs(phase="trainval")) == 10100
len(details.getImgs(phase="train"))    == 4996
len(details.getImgs(phase="val"))      == 5104
len(details.getImgs(phase="test"))     == 4188
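With these numbers, 4996 + 5104 = 10100, so train and val now add up to trainval. Matching sizes alone do not rule out overlap, though. A minimal sketch of a stricter check on image ids, assuming you have collected the ids of the images returned for each phase (how ids are read from the returned image records is left out; check_splits and the integer ids below are hypothetical):

```python
from typing import Iterable, List


def check_splits(train: Iterable[int], val: Iterable[int],
                 trainval: Iterable[int], test: Iterable[int]) -> List[str]:
    """Return the problems found in the split ids (an empty list means consistent)."""
    tr, va, tv, te = set(train), set(val), set(trainval), set(test)
    problems = []
    if tr & va:
        problems.append(f"train and val share {len(tr & va)} id(s)")
    if tr | va != tv:
        problems.append("train and val do not combine to trainval")
    if tv & te:
        problems.append(f"trainval contains {len(tv & te)} test id(s)")
    return problems


# Toy ids (hypothetical): trainval wrongly includes test id 9.
print(check_splits(train=[1, 2], val=[3, 4], trainval=[1, 2, 3, 4, 9], test=[9, 10]))
```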

By default (no parameters), getImgs() assumes the phase specified in the Detail constructor.
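This fallback is a common default-argument pattern. The toy class below is a hypothetical stand-in that only mimics the behavior described above; it is not the Detail implementation, and its image lists are made up for illustration:

```python
from typing import Dict, List, Optional


class ToyDetail:
    """Hypothetical stand-in for Detail, illustrating the default-phase behavior."""

    def __init__(self, images_by_phase: Dict[str, List[str]], phase: str) -> None:
        self.images_by_phase = images_by_phase
        self.phase = phase  # phase given to the constructor

    def getImgs(self, phase: Optional[str] = None) -> List[str]:
        # With no argument, fall back to the constructor's phase.
        return self.images_by_phase[phase if phase is not None else self.phase]


toy = ToyDetail({"train": ["a", "b"], "val": ["c"]}, phase="train")
print(len(toy.getImgs()))             # → 2
print(len(toy.getImgs(phase="val")))  # → 1
```

Passing phase explicitly, as in the counts above, overrides the constructor's phase for that one call.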

Going to close the issue for now, but please let us know if any more problems arise!

Answer 4 · 2017-07-19T05:29:21.000Z

Thanks Matt! It is working.

Just to let other teams know: the JSON file has also been updated, so please download it again with the command python3 ./download.py trainval_merged .