NRCan/geo-deep-learning

Dataset split.


I noticed that the percentages in the split dataset do not match the expected proportions. It seems the split is made before the patches are filtered by min_annot_perc.

More details:
When I split by percentage, I expect the %val and %trn proportions to hold over the total number of patches that are kept. Instead, I get fewer tiles than expected in the validation set, the training set, or both. It seems the split is made from the total set of patches, and each split is then filtered by annotation percentage (min_annot_perc). Because tiles are deleted after the split, the number of tiles per split no longer matches the expected proportions.
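To illustrate the ordering problem, here is a minimal sketch. It is not the project's code. The names `split_then_filter`, `filter_then_split`, `val_percent` and the per-patch `annot_perc` field are hypothetical, and patches are assumed to be plain dicts. Only `min_annot_perc` comes from the issue. The first function mimics the behaviour described above. The second filters first, so the requested percentage applies to the patches that actually survive.

```python
import random


def split_then_filter(patches, val_percent, min_annot_perc, seed=0):
    # Hypothetical reproduction of the reported order: split everything first,
    # then drop patches below min_annot_perc inside each split separately.
    shuffled = patches[:]
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_percent / 100)
    val, trn = shuffled[:n_val], shuffled[n_val:]

    def keep(p):
        return p["annot_perc"] >= min_annot_perc

    return [p for p in trn if keep(p)], [p for p in val if keep(p)]


def filter_then_split(patches, val_percent, min_annot_perc, seed=0):
    # Expected order: filter by min_annot_perc first, then split what remains,
    # so val_percent is measured against the patches that are actually kept.
    kept = [p for p in patches if p["annot_perc"] >= min_annot_perc]
    random.Random(seed).shuffle(kept)
    n_val = round(len(kept) * val_percent / 100)
    return kept[n_val:], kept[:n_val]
```

For example, with 100 patches whose annotation percentage ranges from 0 to 99 and `min_annot_perc=50`, filtering first keeps 50 patches and puts exactly 15 of them (30%) in validation:

```python
patches = [{"annot_perc": i} for i in range(100)]
trn, val = filter_then_split(patches, val_percent=30, min_annot_perc=50)
print(len(trn), len(val))  # 35 15
```

With `split_then_filter`, the same 50 patches survive in total, but how they divide between the splits depends on where the low-annotation patches happened to land. The validation share can therefore drift away from 30%.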