'filenames' order
Hi,
thanks for providing this code to read GESLA data!
I just wanted to add a minor comment on line 89 in gesla.py:
idx = [s.Index for s in self.meta.itertuples() if s.filename in filenames]
It seems that if 'filenames' is not sorted in exactly the same order as the entries in self.meta, the metadata will not be in the same order as the concatenated xr.Dataset. I assume the entries in meta are sorted alphabetically, so 'filenames' could be sorted the same way when the list is read in.
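A small sketch of the mismatch, using a toy DataFrame as a stand-in for self.meta (the filenames and the site_name column here are made up for illustration, they are not taken from the GESLA metadata):

```python
import pandas as pd

# Hypothetical stand-in for self.meta; the real metadata has more columns.
meta = pd.DataFrame(
    {
        "filename": ["a_station", "b_station", "c_station"],
        "site_name": ["A", "B", "C"],
    }
)

# The caller passes filenames in a different order than meta.
filenames = ["c_station", "a_station"]

# Same selection logic as the line quoted above.
idx = [s.Index for s in meta.itertuples() if s.filename in filenames]
selected = meta.loc[idx]

# The data are concatenated in the order of `filenames`,
# but the metadata rows come back in the order of `meta`.
data_order = list(filenames)
meta_order = selected["filename"].tolist()
print(data_order)  # ['c_station', 'a_station']
print(meta_order)  # ['a_station', 'c_station']
```

Once the metadata index is reset to 0, 1, ..., the first record would be labelled with the metadata of the wrong file.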
Thanks!
Hey, I encountered the same issue, and for me it was not a minor one. If the initial filenames are in an arbitrary order, the filename reference and the corresponding data get mixed up.
For me this led to a wrong analysis of the percentiles for the corresponding stations.
I adjusted the files_to_xarray method in the class as follows to solve the issue. Now I can pass filenames in arbitrary order when loading the data.
def files_to_xarray(self, filenames):
    """Read a list of GESLA filenames into an xarray.Dataset object. The
    dataset includes variables containing metadata for each record.

    Args:
        filenames (list): list of filename strings.

    Returns:
        xarray.Dataset: data, flags, and metadata for each record.
    """

    def sort_filenames(filenames):
        """Auxiliary function that sorts the filenames.

        This ensures that data can be loaded independent of the sorting
        given by the user input.

        Args:
            filenames (list): list of filename strings.

        Returns:
            list: list of sorted filenames.

        Author:
            Kai Bellinghausen
        """
        # Get the indices of filenames in the metadata dataframe
        indices = [
            self.meta[self.meta["filename"] == filename].index[0]
            for filename in filenames
        ]
        # Sort filenames based on the indices
        sorted_filenames = [
            filename for _, filename in sorted(zip(indices, filenames))
        ]
        return sorted_filenames

    # Reorder the input so that it follows the row order of self.meta
    filenames = sort_filenames(filenames)
    data = xr.concat(
        [
            self.file_to_pandas(f, return_meta=False).to_xarray()
            for f in filenames
        ],
        dim="station",
    )
    # These rows come back in meta order, which now matches the data order
    idx = [
        s.Index for s in self.meta.itertuples() if s.filename in filenames
    ]
    meta = self.meta.loc[idx]
    meta.index = range(meta.index.size)
    meta.index.name = "station"
    data = data.assign({c: meta[c] for c in meta.columns})
    return data
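
Note that the method above sorts the filenames into meta order, so the returned dataset follows the order of self.meta rather than the order that was passed in. If the caller's order should be kept instead, one alternative is to reorder the metadata to follow filenames. This is only a sketch under assumptions: select_meta_in_order is a hypothetical helper, not part of gesla.py, and it assumes the 'filename' column of meta has unique values.

```python
import pandas as pd


def select_meta_in_order(meta, filenames):
    """Return the rows of `meta` for `filenames`, in the order of `filenames`.

    Hypothetical helper, not part of gesla.py. Assumes meta["filename"]
    holds unique values.
    """
    # Index by filename so that .loc returns rows in the requested order.
    # .loc raises a KeyError if a filename is missing from meta.
    selected = meta.set_index("filename").loc[list(filenames)].reset_index()
    # reset_index gives a fresh 0..n-1 index, one entry per record.
    selected.index.name = "station"
    return selected


# Toy metadata for illustration only.
toy_meta = pd.DataFrame(
    {
        "filename": ["a_station", "b_station", "c_station"],
        "site_name": ["A", "B", "C"],
    }
)
result = select_meta_in_order(toy_meta, ["c_station", "a_station"])
print(result["filename"].tolist())  # ['c_station', 'a_station']
```

With this approach the sort_filenames step would not be needed, since the metadata rows follow whatever order the data were concatenated in.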