amazon-science/earth-forecasting-transformer

Use your code with a new data set

JohnTaylor2000 opened this issue · 2 comments

Could you run through the steps needed to use a different data set with your code please?

  • input data format/layout
  • can NetCDF input files be used
  • parameters that set the data file and data format characteristics
  • how do you handle normalisation or standardisation of input data
  • any other issues that need to be addressed when using a new data set

Thanks for your interest. The most convenient way to use your own data is to implement a pl.LightningDataModule and integrate it into your training script. I will use the MovingMNIST training script as an example to illustrate the steps.

  • Implement your own pl.LightningDataModule (example link). See the official PyTorch Lightning documentation for more detailed instructions.
  • Integrate your new pl.LightningDataModule into your training script (example link).
  • Modify the data layout in the forward() method of the training LightningModule (example link).

Answers to your questions, in order:

  • Input layout: make sure the tensor fed into Earthformer has the layout "NTHWC" (batch, time, height, width, channel).
  • NetCDF: yes, as long as you implement a PyTorch Dataset that loads the files.
  • Data file and format parameters: wrap all the details of loading the data in your pl.LightningDataModule.
  • Normalisation: we typically rescale the values to [0, 1] (example link).
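To make the layout and rescaling points concrete, here is a small NumPy sketch. The helper names `to_nthwc` and `minmax_rescale` are hypothetical and not part of the repository, and min-max scaling is only one assumed way of getting values into [0, 1]; the linked example may do it differently (for instance by dividing by a known maximum).

```python
import numpy as np


def to_nthwc(batch, layout):
    """Permute an array whose axes are named by `layout` (e.g. "NCTHW") into "NTHWC"."""
    if sorted(layout) != sorted("NTHWC"):
        raise ValueError(f"layout must be a permutation of 'NTHWC', got {layout!r}")
    # For each target axis, look up where it currently sits in the source layout.
    axes = [layout.index(ax) for ax in "NTHWC"]
    return np.transpose(batch, axes)


def minmax_rescale(x, lo=None, hi=None):
    """Linearly map values to [0, 1] using the given (or observed) min and max."""
    lo = x.min() if lo is None else lo
    hi = x.max() if hi is None else hi
    return (x - lo) / (hi - lo)


# Stand-in batch in "NCTHW" order: N=2, C=3, T=4, H=5, W=6.
raw = np.arange(2 * 3 * 4 * 5 * 6, dtype=np.float32).reshape(2, 3, 4, 5, 6)
batch = to_nthwc(raw, "NCTHW")
scaled = minmax_rescale(batch)
print(batch.shape, scaled.min(), scaled.max())
```

In practice `lo` and `hi` would usually be computed once on the training split and reused for validation and test data, so that all splits share the same scale.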

I'll close the issue for now. Please feel free to reopen it if you run into any problems.