NVlabs/FourCastNet

Help editing code to run train.py on smaller h5 files

99snowleopards opened this issue · 0 comments

Thank you for releasing this amazing repo!

I found the reason for the error: it has to do with the number of in_channels specified in the AFNO.yaml file.
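
Specifically, the line at https://github.com/NVlabs/FourCastNet/blob/master/utils/data_loader_multifiles.py#L207 fancy-indexes the channel axis with self.in_channels, so a channel list sized for the 21-channel Globus files goes out of range on the 3-channel NERSC files. Here is a small, self-contained sketch that reproduces the same IndexError on a dummy in-memory dataset (the dt / n_history / local_idx values and the dataset shape are just illustrative, not taken from the repo):

# reproduce the data loader's indexing pattern on a dummy 3-channel dataset
import h5py
import numpy as np

dt, n_history, local_idx = 1, 0, 10
in_channels = list(range(21))  # channel list sized for the 21-channel Globus files

with h5py.File("dummy.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("fields", data=np.zeros((20, 3, 8, 8), dtype="f4"))
    try:
        # same indexing pattern as data_loader_multifiles.py#L207
        ds[(local_idx - dt * n_history):(local_idx + 1):dt, in_channels]
    except IndexError as err:
        print(err)  # prints something like "Fancy indexing out of range for (0-2)"
    # with the channel list trimmed to channels that actually exist, the read works
    sample = ds[(local_idx - dt * n_history):(local_idx + 1):dt, [0, 1, 2]]
    print(sample.shape)  # (1, 3, 8, 8)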

I changed in_channels to [0, 1, 2] and no longer get the error.

I'm closing this issue, but it would be great if someone could give me some insight or add a short note on the changes needed to train the model on the smaller h5 files.
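
In the meantime, here is a minimal sketch of the sanity check I ended up using before training (not part of the repo; the afno_backbone section name, the out_channels key, the fields dataset key, and the file path are assumptions based on my local setup):

# check that the channel indices in config/AFNO.yaml exist in the training files
import h5py
import yaml

with open("config/AFNO.yaml") as f:
    cfg = yaml.safe_load(f)["afno_backbone"]  # adjust if your config section differs

with h5py.File("/path/to/train/2015.h5", "r") as f:
    n_channels = f["fields"].shape[1]  # layout assumed to be (samples, channels, lat, lon)

for key in ("in_channels", "out_channels"):
    bad = [c for c in cfg.get(key, []) if c >= n_channels]
    if bad:
        print(f"{key} has indices {bad}, but the file only has {n_channels} channels;")
        print(f"for the 3-channel NERSC files, set {key} to [0, 1, 2]")

I only changed in_channels myself; out_channels may need the same treatment if it also lists more than 3 channels, but I haven't verified that.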

Thank you again for releasing the repo. I look forward to understanding the code better :)

==================

Original issue

I'm able to run train.py with the large h5 files available on Globus.

When I try to run train.py with the smaller h5 files (regional or era5_subsample) made available on the NERSC portal, the following line throws an error:

https://github.com/NVlabs/FourCastNet/blob/master/utils/data_loader_multifiles.py#L207

Specifically, this expression:

self.files[year_idx][(local_idx-self.dt*self.n_history):(local_idx+1):self.dt, self.in_channels] throws the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/user/anaconda3/envs/climax/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 710, in __getitem__
    return self._fast_reader.read(args)
  File "h5py/_selector.pyx", line 351, in h5py._selector.Reader.read
  File "h5py/_selector.pyx", line 198, in h5py._selector.Selector.apply_args
IndexError: Fancy indexing out of range for (0-2)

The shape of self.files[year_idx] for the larger h5 files on Globus is:

HDF5 dataset: shape (1460, 21, 721, 1440)

The shape of self.files[year_idx] for the smaller h5 files on NERSC (regional or era5_subsample) is:

HDF5 dataset: shape (1460, 3, 360, 360)

I'm not very familiar with h5py yet. Could someone please help me edit the code at https://github.com/NVlabs/FourCastNet/blob/master/utils/data_loader_multifiles.py#L207 so I can run train.py on the smaller h5 files?

Is there some edit to AFNO.yaml, besides the file paths, that needs to be made to train the model on the smaller h5 files?

Thank you! @jdppthk @MortezaMardani