Set aside N tracks for "real-world" experiment
Closed this issue · 5 comments
While metrics from model training are insightful and helpful, they don't quite accurately portray how the model will perform in practice. We need to set aside N tracks (ground station, satellite combinations) for use in an experiment that simulates real-world use of the model.
Also plot these tracks and save the figures as part of this issue.
34 ground station, satellite combinations should be set aside to ensure the validation set is at least 20% of the data. However, with 3 satellites in the data (G07, G08, and G20), it may make sense to ensure that the validation set contains an equal number of tracks from each satellite. We could have 11 ground stations for each satellite in the validation set, resulting in 33 observations (the validation set would then be 19.64% of the original data). I think that's close enough to 20% to justify balancing the validation set.
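For reference, here is a quick sanity check of the numbers above. The total of 168 tracks is not stated anywhere in this issue; it is an assumption inferred from 33 / 0.1964, so treat it as a placeholder until checked against the real data.

```python
import math

# ASSUMPTION: total track count inferred from 33 / 0.1964; not stated in the issue.
TOTAL_TRACKS = 168
SATELLITES = ["G07", "G08", "G20"]
TARGET_FRACTION = 0.20

# Smallest number of tracks that makes the validation set at least 20%.
min_tracks = math.ceil(TARGET_FRACTION * TOTAL_TRACKS)

# Largest equal share per satellite that does not exceed that minimum.
per_satellite = min_tracks // len(SATELLITES)
balanced_tracks = per_satellite * len(SATELLITES)
balanced_fraction = balanced_tracks / TOTAL_TRACKS

print(f"unbalanced: {min_tracks} tracks ({min_tracks / TOTAL_TRACKS:.2%})")
print(f"balanced:   {balanced_tracks} tracks ({balanced_fraction:.2%})")
```

Under that assumption, the balanced option gives up one track (33 instead of 34) in exchange for equal representation of each satellite.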
I will write some code to randomly sample which ground station and satellite combinations we will keep, and then will manually set those aside in the data through some further reorganization of the directory structure. Once that's complete, I can return to #65 and update the readme accordingly.
Oh and I'll include the code in notebooks/data_validation.ipynb on the feature/validate_data branch.
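A minimal sketch of what the random sampling could look like, not the actual notebook code. The function name `sample_validation_tracks`, the `(ground_station, satellite)` tuple layout, the seed, and the demo station IDs are all hypothetical and only here for illustration.

```python
import random


def sample_validation_tracks(tracks, per_satellite=11, seed=0):
    """Pick `per_satellite` ground stations at random for each satellite.

    `tracks` is an iterable of (ground_station, satellite) tuples.
    A fixed seed keeps the selection reproducible between runs.
    """
    rng = random.Random(seed)

    # Group the available ground stations by satellite.
    by_satellite = {}
    for station, satellite in tracks:
        by_satellite.setdefault(satellite, set()).add(station)

    selected = []
    for satellite in sorted(by_satellite):
        # Sort first so the draw does not depend on set ordering.
        stations = sorted(by_satellite[satellite])
        if len(stations) < per_satellite:
            raise ValueError(
                f"{satellite} has only {len(stations)} ground stations"
            )
        for station in sorted(rng.sample(stations, per_satellite)):
            selected.append((station, satellite))
    return selected


# HYPOTHETICAL demo data: made-up station IDs, 56 per satellite.
demo_tracks = [
    (f"gs{i:02d}", satellite)
    for satellite in ("G07", "G08", "G20")
    for i in range(56)
]

validation_tracks = sample_validation_tracks(demo_tracks, per_satellite=11, seed=42)
print(len(validation_tracks), "tracks selected for validation")
```

The returned list could then be used to decide which track directories to move by hand during the reorganization.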