Use your code with a new data set

Question

JohnTaylor2000 opened this issue 2 years ago · 2 comments

JohnTaylor2000 commented 2 years ago

Could you run through the steps needed to use a different data set with your code please?

1. What input data format/layout is expected?
2. Can you use NetCDF input files?
3. Which parameters set the data file and data format characteristics?
4. How do you handle normalisation or standardisation of the input data?
5. Are there any other issues that need to be addressed when using a new data set?

Answer 1 · 2023-03-29T09:47:02.000Z

Thanks for your interest. The most convenient way to use your own data is to implement a pl.LightningDataModule and integrate it into your training script. I will use the MovingMNIST training script as an example to illustrate the steps.

1. Implement your own pl.LightningDataModule (example link). You may refer to the official PyTorch Lightning documentation for more detailed instructions.
2. Integrate your new pl.LightningDataModule into your training script (example link).
3. Modify the data layout in the forward() method of the training LightningModule (example link).

Answers to your questions, in order:

1. Input layout: make sure the input tensor passed to Earthformer has the layout "NTHWC".
2. NetCDF: yes, as long as you implement a PyTorch Dataset that loads the files.
3. Data file and format parameters: please try to wrap all the details of loading the data into your pl.LightningDataModule.
4. Normalisation: we typically rescale the values to [0, 1] (example link).
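
To make the layout and rescaling points concrete, here is a small NumPy sketch. The helper names to_nthwc and rescale_01 are hypothetical and not part of the repository, and the choice of vmin and vmax is an assumption: a common approach is to compute them once from the training split and reuse them for validation and test data.

```python
import numpy as np


def to_nthwc(x: np.ndarray, layout: str) -> np.ndarray:
    """Permute an array whose axes are named by `layout` (e.g. "NCTHW") to "NTHWC"."""
    target = "NTHWC"
    if sorted(layout) != sorted(target):
        raise ValueError(f"layout must be a permutation of {target!r}, got {layout!r}")
    # For each target axis, look up where that axis sits in the source layout.
    return np.transpose(x, [layout.index(axis) for axis in target])


def rescale_01(x: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    """Min-max rescale so that vmin maps to 0 and vmax maps to 1."""
    return (x - vmin) / (vmax - vmin)


# Example: a batch stored as (N, C, T, H, W) is permuted to (N, T, H, W, C).
batch = np.zeros((2, 3, 5, 7, 11), dtype=np.float32)
print(to_nthwc(batch, "NCTHW").shape)  # (2, 5, 7, 11, 3)
```

Values outside the range [vmin, vmax] map outside [0, 1], so clip them first if the model should only ever see values in that range.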

Answer 2 · 2023-05-25T15:20:28.000Z

Let me close the issue for now. Please feel free to reopen if you find any problems.