Some test set chips will overlap with training set chips with multi-resolution - process_wv.py

Question

Closed this issue 6 years ago · 1 comment

Hi there, I have a concern about how process_wv.py creates the training and test sets.

In __main__ of process_wv.py, the outer loop iterates over the resolutions to use, and the random number that directs a particular chip to the training or test set is generated inside that loop. It seems to me that, as a result, some land areas will be included in both the training and test sets, although at different resolutions. The trained model would then have seen areas (at a different resolution) that also appear in the test set, whereas models submitted for evaluation are judged on unseen images.
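To illustrate the concern, here is a minimal sketch. It is not the code in process_wv.py; every function and variable name below is hypothetical, and chips are reduced to (image id, resolution) pairs. It contrasts a random train/test decision made per chip inside the resolution loop with a decision made once per source image.

```python
import random


def split_per_chip(image_ids, resolutions, test_fraction, seed=0):
    """Hypothetical: draw a random number for every (image, resolution) chip."""
    rng = random.Random(seed)
    train, test = set(), set()
    for res in resolutions:          # outer loop over resolutions
        for img in image_ids:
            target = test if rng.random() < test_fraction else train
            target.add((img, res))
    return train, test


def split_per_image(image_ids, resolutions, test_fraction, seed=0):
    """Hypothetical: choose the test images once, then reuse that choice
    for every resolution so an area never lands in both sets."""
    rng = random.Random(seed)
    ids = sorted(image_ids)
    rng.shuffle(ids)
    test_ids = set(ids[: int(len(ids) * test_fraction)])
    train, test = set(), set()
    for res in resolutions:
        for img in ids:
            target = test if img in test_ids else train
            target.add((img, res))
    return train, test


def overlapping_images(train, test):
    """Image ids that appear (at any resolution) in both sets."""
    return {img for img, _ in train} & {img for img, _ in test}


if __name__ == "__main__":
    images = [f"img_{i}" for i in range(100)]   # made-up image ids
    resolutions = [1.0, 0.5, 0.25]              # made-up scale factors
    per_chip = split_per_chip(images, resolutions, 0.2)
    per_image = split_per_image(images, resolutions, 0.2)
    print("per-chip overlap:", len(overlapping_images(*per_chip)))
    print("per-image overlap:", len(overlapping_images(*per_image)))
```

With the per-image split, the overlap is empty by construction, since each image id is assigned to exactly one set before the resolution loop runs.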

Please clarify if this is intended. Thanks!

Answer 1 · 2018-04-07T00:55:42.000Z

Hi, that's correct. Commenting out the 'shuffle_images_and_boxes_classes' function call on line 181 removes that randomness; the code will then direct chips based on the train/test split percentage.
The images used for evaluation are distinct from the images provided in the training set. The 'process_wv.py' script creates train/test TFRecords for your own model evaluation.

Edit: I actually made a typo in the code, so the current default behavior is to not use randomness. I will update it shortly.
Edit 2: Updated.