nansencenter/sea_ice_type_cnn_training

data_builded should not be called before inference

akorosov opened this issue · 0 comments

It is a significant drawback that the data_builded script has to be called before inference: the model then works only with data from the ASIP (note P instead of D) dataset. However, we will also use it for other Sentinel-1 and AMSR2 data, so the model should be applicable to any input containing such data.

One way to apply a model without a builder script is to make a new generator that accepts an in-memory object as input instead of a list of NPZ files. The Archive class already has the functionality to read everything into memory (and also to write NPZ files, which is relevant only for the training dataset). A new DataGenerator should now be developed that takes an Archive object as input.
Archive then stops being a well-defined class, because it mixes operations on an archive (multiple files) with operations on a single dataset. It should therefore be split in two (e.g. Archive and Dataset), and the new DataGenerator should take only a Dataset object.
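
A minimal sketch of what the split could look like. It is not the existing code: the `patches` attribute, the `InMemoryDataGenerator` name and the constructor arguments are all hypothetical, and plain Python lists stand in for the arrays that the real classes would hold.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Dataset:
    """Hypothetical in-memory container for a single scene.

    Maps a variable name to a sequence of patches (arrays in practice).
    """
    patches: Dict[str, Sequence]

    def __len__(self) -> int:
        # All variables are assumed to hold the same number of patches.
        return len(next(iter(self.patches.values()), []))


class InMemoryDataGenerator:
    """Hypothetical generator that batches a Dataset instead of NPZ files."""

    def __init__(self, dataset: Dataset, input_names: List[str], batch_size: int = 2):
        self.dataset = dataset
        self.input_names = input_names
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches; the last one may be smaller than batch_size.
        return math.ceil(len(self.dataset) / self.batch_size)

    def __getitem__(self, index: int) -> Dict[str, list]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        stop = start + self.batch_size
        # Slice every requested variable with the same patch range.
        return {
            name: list(self.dataset.patches[name][start:stop])
            for name in self.input_names
        }


# Usage: integers stand in for image patches.
dataset = Dataset({"sar": [1, 2, 3], "amsr2": [10, 20, 30]})
generator = InMemoryDataGenerator(dataset, ["sar", "amsr2"], batch_size=2)
batches = [batch for batch in generator]
```

With this shape, Archive would keep only the multi-file operations (listing files, writing NPZ), while inference code would need nothing but a Dataset and the generator.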

Later (in another issue), in order to adapt the generator to other input data, we will develop a class that reads Sentinel-1 and AMSR2 data from two different files, collocates them on the same grid, and creates an object with the same interface as the Dataset above, which can be used either for building another training dataset or for inference.
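
A rough illustration of the collocation step only, under strong assumptions: both inputs are already read into 2D grids covering the same area, and nearest-neighbour index scaling is good enough. Real collocation would use the geolocation of both products. The function names and the returned dictionary are hypothetical stand-ins for the future class.

```python
from typing import Dict, List

Grid = List[List[float]]


def resample_nearest(coarse: Grid, n_rows: int, n_cols: int) -> Grid:
    """Upsample a coarse grid to n_rows x n_cols by nearest-neighbour indexing."""
    src_rows, src_cols = len(coarse), len(coarse[0])
    return [
        [coarse[r * src_rows // n_rows][c * src_cols // n_cols] for c in range(n_cols)]
        for r in range(n_rows)
    ]


def collocate(sar: Grid, amsr2: Grid) -> Dict[str, Grid]:
    """Put the (coarser) AMSR2 grid onto the SAR grid and return both by name."""
    n_rows, n_cols = len(sar), len(sar[0])
    return {"sar": sar, "amsr2": resample_nearest(amsr2, n_rows, n_cols)}


# Usage: a 4x4 SAR grid and a 2x2 AMSR2 grid over the same area.
sar_grid = [[0.0] * 4 for _ in range(4)]
amsr2_grid = [[1.0, 2.0], [3.0, 4.0]]
collocated = collocate(sar_grid, amsr2_grid)
```

After this step both variables share one grid, so they can be cut into patches and exposed through the same interface as a dataset read from NPZ files.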