Use your code with a new data set
JohnTaylor2000 opened this issue · 2 comments
JohnTaylor2000 commented
Could you run through the steps needed to use a different data set with your code please?
- input data format/layout
- can you use NetCDF input files?
- parameters that set the data file and data format characteristics
- how do you handle normalisation or standardisation of input data
- any other issues that need to be addressed when using a new data set
gaozhihan commented
Thanks for your interest. The most convenient way to use your own data is to implement a `pl.LightningDataModule` and integrate it into your training script. I will take the training script on MovingMNIST as an example to illustrate how to do it.
- Implement your own `pl.LightningDataModule` (example link). You may refer to its official doc for more detailed instructions.
- Integrate your new `pl.LightningDataModule` into your training script (example link).
- Modify the data layout in the `forward()` method of the training `LightningModule` (example link).
I address your questions below, in order:
- Make sure the input tensor into the Earthformer has layout `"NTHWC"`.
- Sure, as long as you implement a PyTorch Dataset to load it.
- Please try to wrap all the details of loading the data into your `pl.LightningDataModule`.
- Typically we rescale the values to [0, 1] (example link).
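To make the layout and rescaling points concrete, here is a minimal sketch of a PyTorch `Dataset` that min-max rescales values to [0, 1] and returns samples in `THWC` order, so that batching yields `NTHWC`. The class name `SequenceDataset`, the assumed source layout `(N, T, C, H, W)`, and the synthetic array are assumptions for the example. In practice the array could be read from a NetCDF file with a library such as xarray or netCDF4, which is not shown here.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class SequenceDataset(Dataset):
    """Hypothetical dataset over an in-memory array with layout (N, T, C, H, W)."""

    def __init__(self, array: np.ndarray):
        self.array = array
        # Min-max statistics for rescaling. In practice, compute these on the
        # training split only and reuse them for validation and test data.
        self.vmin = float(array.min())
        self.vmax = float(array.max())

    def __len__(self):
        return self.array.shape[0]

    def __getitem__(self, idx):
        seq = torch.from_numpy(self.array[idx]).float()    # (T, C, H, W)
        seq = (seq - self.vmin) / (self.vmax - self.vmin)  # rescale to [0, 1]
        return seq.permute(0, 2, 3, 1).contiguous()        # (T, H, W, C)


# Synthetic stand-in for real data: 8 sequences of 10 frames, 1 channel, 16x16.
rng = np.random.default_rng(0)
raw = rng.uniform(250.0, 310.0, size=(8, 10, 1, 16, 16)).astype(np.float32)

ds = SequenceDataset(raw)
batch = next(iter(DataLoader(ds, batch_size=4)))
print(batch.shape)  # torch.Size([4, 10, 16, 16, 1]), i.e. NTHWC
```

Doing the permute inside the dataset is one option. The same reordering could instead be done in the `forward()` method of the training module, as described in the steps above.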
gaozhihan commented
Let me close the issue for now. Please feel free to reopen if you find any problems.