ccvl/detail-api

trainval set size is larger than combining train + val sets


For the semantic segmentation task:
the training set size is 3256 and the val set size is 3329, but the trainval set size is 10103.

Something in the API seems to be messed up.

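import os
from detail import Detail  # PythonAPI package from ccvl/detail-api

# data_folder must already point at the dataset root before this line
data_folder = os.path.join(data_folder, 'VOCdevkit/')
annFile = os.path.join(data_folder, 'trainval_merged.json')
imgDir = os.path.join(data_folder, 'VOC2010/JPEGImages')
phase = 'trainval'
detail = Detail(annFile, imgDir, phase)
print(len(detail.getImgs()))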

Huh. Where are you getting the numbers 3256 and 3329 from? 10103 is the correct trainval size.

I just tried passing "training" as the phase, and building the index seems to take quite a while. I think it's a bug; taking a look.

phase = 'train' gives me 3256, phase = 'val' gives 3329.

Using 'trainval', some masks in the dataset are all zeros, which suggests that images from the test set are mixed in.
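One way to spot this symptom is to scan the loaded masks for ones with no labeled pixels. The helper below is a hypothetical sketch, not part of detail-api: it assumes you have already collected each image's segmentation mask into a dict keyed by image id (a 2-D NumPy array or a plain list of rows both work), and it only reports which masks are entirely zero.

```python
from typing import Hashable, Iterable, List, Mapping


def find_empty_masks(masks: Mapping[Hashable, Iterable[Iterable[int]]]) -> List[Hashable]:
    """Return the ids whose mask contains no nonzero pixel.

    Hypothetical helper: `masks` maps an image id to a 2-D mask, which is
    iterated row by row, so NumPy arrays and lists of lists both work.
    """
    empty = []
    for img_id, mask in masks.items():
        # any(row) is True when the row has at least one nonzero pixel
        if not any(any(row) for row in mask):
            empty.append(img_id)
    return empty
```

For example, find_empty_masks({'a': [[0, 0], [0, 0]], 'b': [[0, 1], [0, 0]]}) returns ['a']. A non-empty result on trainval would be consistent with unlabeled (test) images having leaked into that split.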

Just pushed an update. Fetch it with git pull, then run make && make install in the PythonAPI directory.

Here are the correct numbers:

len(details.getImgs(phase="trainval")) == 10100
len(details.getImgs(phase="train"))    == 4996
len(details.getImgs(phase="val"))      == 5104
len(details.getImgs(phase="test"))     == 4188
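With these numbers, train and val add up exactly to trainval (4996 + 5104 = 10100). A quick way to confirm this after updating is to compare the sets of image ids in each phase. The helper below is a hypothetical sketch: it takes plain collections of image ids (how you pull an id out of each getImgs() record is left as an assumption), and checks that train and val do not overlap, that their union equals trainval, and that test is disjoint from trainval.

```python
from typing import Hashable, Iterable, List


def check_splits(train: Iterable[Hashable], val: Iterable[Hashable],
                 trainval: Iterable[Hashable], test: Iterable[Hashable]) -> List[str]:
    """Return human-readable problems; an empty list means the splits look consistent."""
    train, val, trainval, test = set(train), set(val), set(trainval), set(test)
    problems = []
    overlap = train & val
    if overlap:
        problems.append(f"{len(overlap)} images are in both train and val")
    union = train | val
    if union != trainval:
        problems.append(f"train + val has {len(union)} images, trainval has {len(trainval)}")
    leaked = trainval & test
    if leaked:
        problems.append(f"{len(leaked)} trainval images also appear in test")
    return problems
```

With the real API you would feed it the ids from getImgs(phase="train"), getImgs(phase="val"), and so on; the id-extraction step depends on the record format and is not shown here.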

The default behavior of getImgs() (called with no parameters) is to assume that the phase is the same as the phase specified in the Detail constructor.
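This is the common "None means use the constructor's value" default-argument pattern. The toy class below only illustrates that behavior and is not detail-api's actual implementation; the class name ToyDetail, its in-memory image list, and the 'phase' key on each record are all made up for the example.

```python
from typing import Dict, List, Optional


class ToyDetail:
    """Hypothetical stand-in mimicking the phase-defaulting behavior of getImgs()."""

    def __init__(self, images: List[Dict[str, str]], phase: str) -> None:
        self.images = images  # in this toy, each record carries a 'phase' key
        self.phase = phase    # phase chosen at construction time

    def getImgs(self, phase: Optional[str] = None) -> List[Dict[str, str]]:
        # With no argument, fall back to the phase given to the constructor.
        if phase is None:
            phase = self.phase
        if phase == "trainval":
            # In this toy, trainval is simply the union of train and val.
            return [im for im in self.images if im["phase"] in ("train", "val")]
        return [im for im in self.images if im["phase"] == phase]
```

Usage: with images tagged train, val, and test, ToyDetail(images, "trainval").getImgs() returns only the train and val records, while an explicit getImgs(phase="test") overrides the constructor's phase.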

Going to close the issue for now, but please let us know if any more problems arise!

Thanks Matt! It is working.

Just to let other teams know: please download the JSON file again with the command python3 ./download.py trainval_merged . because the file has also been updated.