Unable to fit the netCDF4 file data using pyXpcm
Priyanshu-Malik opened this issue · 4 comments
I don't know whether the netCDF4 file (sst2010.nc) is being read correctly by pyXpcm. The code

```python
m.fit(ds, features=features_in_ds, dim=features_zdim)
m
```

gives the following error:

```
xarray.DataArray vertical axis is not deep enough for this PCM axis [0.51 > 0.00]
```

Can you take a look and tell me what I am missing here?
Hi @Priyanshu-Malik
The vertical axis must be negative and oriented downward. So, in your case, you could do:

```python
ds['deptht'] = -np.abs(ds['deptht'])
```
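As a quick sanity check (a minimal sketch, just illustrative), you can confirm the axis is now all negative:

```python
# After the sign flip, every depth level should be <= 0
assert (ds['deptht'].values <= 0).all()
print(ds['deptht'].values[:3])  # shallowest levels, now negative
```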
Then, to avoid interpolation (this is a large array, so interpolating would take some time), you can use the dataset's own vertical axis as the PCM axis. Here I cut the axis to the first 500 m of the water column:

```python
z = ds['deptht'].values[0:40]  # first 40 levels, i.e. the upper ~500 m
pcm_features = {'temperature': z}
```
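In case it helps to see the full setup: the model `m` used below can be built from these features like so. This is a minimal sketch; `pcm` from `pyxpcm.models` and K=12 match what is used later in this thread, but you should pick your own K:

```python
from pyxpcm.models import pcm

# A Profile Classification Model with K=12 Gaussian classes,
# classifying the 'temperature' feature on the vertical axis z
m = pcm(K=12, features=pcm_features)
```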
And then you can run:

```python
features_in_ds = {'temperature': 'votemper'}
m.fit_predict(ds, features=features_in_ds, inplace=True)
```
On my laptop, this took about 37 minutes to run with a PCM of K=12 classes:

```python
ds['PCM_LABELS'].isel(time_counter=0).plot(x='x')
```
Worked like a charm, can't thank you enough, and thanks for pyxpcm. It took about half an hour to run on my laptop as well, and I was able to follow the rest of the tutorial with ease from that point onward; I got all the plots and graphs.

One thing, though: after running `m.fit_predict(...)`, the printed 'votemper' and 'PCM_LABELS' values all show as `nan` (attached below). This may be a property of the dataset itself rather than an error, but I'd really like to hear your take on it.
Great!

Those NaNs are just the corner sample of the large array that gets printed in the preview; they don't mean the whole array is empty.
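If you want to reassure yourself, counting the non-NaN labels is a cheap check (a minimal sketch using standard xarray calls):

```python
# Land/masked points keep NaN labels; ocean points should be filled
n_valid = int(ds['PCM_LABELS'].notnull().sum())
print(f"{n_valid} labelled points out of {ds['PCM_LABELS'].size}")
```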
With this, I think we can close this issue.
Hello sir, I have been working on the same dataset for a week now. I need to process five years of data, so I first downloaded three years' worth, which brings the total array size to about 5 GB. Now the

```python
m.fit_predict(ds, features=features_in_ds, inplace=True)
```

command takes forever to run, along with a warning that says "Slicing is producing a large chunk".
If possible, can you answer these two questions?

- Is there a workaround for running the model on data this large? If not, just answer the second question, which is more important for me.
- How do I determine the optimum number of classes (the K value)? The tutorial doesn't show any way to find K using the BIC elbow method; we just took an arbitrary value like K=12. Can you please tell me how to find a K value suitable for my dataset using pyXpcm? (A sketch of the kind of loop I mean is below.)
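For illustration, a BIC elbow loop of the kind referred to above might look like the following. This is a minimal sketch, not a confirmed pyXpcm recipe: it assumes a `bic()` method on the fitted model (mirroring scikit-learn's `GaussianMixture.bic`), and it fits on a single time slice (`ds.isel(time_counter=0)`) to keep each fit cheap on a large dataset:

```python
import matplotlib.pyplot as plt
from pyxpcm.models import pcm

# Fit on a subsample to keep each iteration cheap (assumption:
# one time slice is representative enough for model selection)
ds_sub = ds.isel(time_counter=0)

Ks = range(2, 16)
bics = []
for k in Ks:
    model = pcm(K=k, features=pcm_features)
    model.fit(ds_sub, features=features_in_ds)
    # Assumed API: BIC of the fitted mixture model on this data
    bics.append(model.bic(ds_sub, features=features_in_ds))

# The "elbow" where the BIC curve flattens (or its minimum)
# suggests a reasonable K for the dataset
plt.plot(list(Ks), bics, 'o-')
plt.xlabel('Number of classes K')
plt.ylabel('BIC')
plt.show()
```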