pmlmodelling/nctoolkit

DataSet.spatial_sum returns 0 with NaN values

Closed this issue · 5 comments

Hello,

I am currently working with historical rainfall data for an island. Some data are missing: for example, there are no values over the sea, only over the territory itself. I wanted to perform a spatial_sum, but because of the NaN values (sea data), the function returns zero for all timesteps.
Unfortunately, when I try DataSet.as_missing(0), it doesn't change anything, perhaps because my file has a lot of data.
Do you think it would be a good idea to handle NaN values inside all DataSet.spatial_X functions?

Can you add the code you used? @agnesfrancois

This sounds like ocean model output where land is coded as zero instead of as missing values. The spatial sum should work regardless, unless there is something strange in the file formatting. You’d need to fix the zeros if you want a spatial mean, but the spatial sum should be OK.
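To illustrate the point about zeros versus missing values, here is a minimal sketch in plain Python. It does not use nctoolkit or reproduce its internals; the grids and the helper names (`valid_cells`, `spatial_sum`, `spatial_mean`) are hypothetical and only show the arithmetic on one timestep.

```python
import math

# Hypothetical 2x3 grid for a single timestep. Cells outside the region
# of interest are stored either as 0.0 or as NaN (missing).
grid_zeros = [[0.0, 2.0, 3.0], [0.0, 0.0, 5.0]]
grid_nan = [[math.nan, 2.0, 3.0], [math.nan, math.nan, 5.0]]


def valid_cells(grid):
    """Flatten the grid and drop NaN (missing) cells."""
    return [v for row in grid for v in row if not math.isnan(v)]


def spatial_sum(grid):
    """Sum over all non-missing cells."""
    return sum(valid_cells(grid))


def spatial_mean(grid):
    """Mean over all non-missing cells; NaN if every cell is missing."""
    cells = valid_cells(grid)
    return sum(cells) / len(cells) if cells else math.nan


sum_zeros = spatial_sum(grid_zeros)
sum_nan = spatial_sum(grid_nan)
mean_zeros = spatial_mean(grid_zeros)
mean_nan = spatial_mean(grid_nan)

print(sum_zeros, sum_nan)    # the two sums agree
print(mean_zeros, mean_nan)  # the zero-coded mean is lower
```

Zeros add nothing to a sum, so both encodings give the same total. They do count as cells in the mean, which is why the zero-coded grid gives a lower average.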

Here is my code:

import nctoolkit as nc
data = nc.open_data('data.nc')
data.spatial_sum(by_area=False)
data.plot()

This gives me the following (with approximately the same result for spatial_mean):
image

I think this is because the sea values change slightly at each timestep: at the dots visible on the graph, there are no NaN values (the sea data are zeros). Otherwise, the data look like this:
image

Thanks. Is it possible to share the data? Just a few time steps would do. The above code should work, but potentially something is wrong in the file’s metadata.

Sorry, the data are not mine and I can't share them. I worked around the problem by keeping the spatial data, exporting it to CSV, and then working with it as dataframes in Python.
If the function is supposed to work and the problem comes from my data, then I'll try to find out where it comes from, but the issue can be closed. Thank you!
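For anyone taking the same route, here is a minimal sketch of a per-timestep sum that skips missing cells. It uses only the Python standard library in place of a dataframe library, and the CSV layout, column names (`time`, `lat`, `lon`, `rainfall`) and values are hypothetical, not taken from the original file.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical CSV export: one row per grid cell and timestep.
# An empty rainfall field stands for a missing (sea) cell.
csv_text = """time,lat,lon,rainfall
2000-01-01,0.0,0.0,1.5
2000-01-01,0.0,0.1,
2000-01-01,0.1,0.0,2.5
2000-01-02,0.0,0.0,
2000-01-02,0.0,0.1,
2000-01-02,0.1,0.0,4.0
"""


def sum_by_time(text):
    """Sum rainfall per timestep, ignoring missing cells."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["rainfall"].strip()
        value = float(raw) if raw else math.nan
        if math.isnan(value):
            continue  # skip missing cells instead of propagating NaN
        totals[row["time"]] += value
    return dict(totals)


totals = sum_by_time(csv_text)
print(totals)
```

Note that a timestep where every cell is missing would not appear in the result at all, which may or may not be what you want.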

I will close the issue.

For cases like this you might want to run checks on the dataset:

ds.check()

That might tell you if there is a formatting problem with the data.