Unable to fit the netCDF4 file data using pyXpcm
Priyanshu-Malik opened this issue · 4 comments
I don't know whether the netCDF4 file (sst2010.nc) is being read correctly by pyXpcm. The code

```python
m.fit(ds, features=features_in_ds, dim=features_zdim)
m
```

gives the following error:

```
xarray.DataArray vertical axis is not deep enough for this PCM axis [0.51 > 0.00]
```

Can you take a look and tell me what I am missing here?
Hi @Priyanshu-Malik
The vertical axis must be negative and oriented downward. So, in your case, you could do:

```python
ds['deptht'] = -np.abs(ds['deptht'])
```
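As a quick sanity check (a minimal sketch, just illustrative), you can confirm the axis is now all negative:

```python
# After the sign flip, every depth level should be <= 0
assert (ds['deptht'].values <= 0).all()
print(ds['deptht'].values[:3])  # shallowest levels, now negative
```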
Then, to avoid interpolation (this is a large array, so interpolating would take some time), you can use the dataset's own vertical axis as the PCM axis. Here I cut the axis to the first 500 m of the water column:

```python
z = ds['deptht'].values[0:40]  # first 40 levels, i.e. the upper ~500 m
pcm_features = {'temperature': z}
```
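In case it helps to see the full setup: the model `m` used below can be built from these features like so. This is a minimal sketch; `pcm` from `pyxpcm.models` and K=12 match what is used later in this thread, but you should pick your own K:

```python
from pyxpcm.models import pcm

# A Profile Classification Model with K=12 Gaussian classes,
# classifying the 'temperature' feature on the vertical axis z
m = pcm(K=12, features=pcm_features)
```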
And then you can run:

```python
features_in_ds = {'temperature': 'votemper'}
m.fit_predict(ds, features=features_in_ds, inplace=True)
```
On my laptop, this took about 37 minutes to run with a PCM of K=12 classes:

```python
ds['PCM_LABELS'].isel(time_counter=0).plot(x='x')
```
Worked like a charm, can't thank you enough, and thanks for pyxpcm. It took about half an hour to run on my laptop as well, and I was able to follow the rest of the tutorial with ease from that point onward; I got all the plots and graphs.

One thing, though: after running `m.fit_predict(...)`, the printed 'votemper' and 'PCM_LABELS' values all show as `nan` (attached below). This may be a property of the dataset itself rather than an error, but I'd really like to hear your take on it.
Great!

Those NaNs are just the corner sample of the large array that gets printed in the preview; they don't mean the whole array is empty.
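If you want to reassure yourself, counting the non-NaN labels is a cheap check (a minimal sketch using standard xarray calls):

```python
# Land/masked points keep NaN labels; ocean points should be filled
n_valid = int(ds['PCM_LABELS'].notnull().sum())
print(f"{n_valid} labelled points out of {ds['PCM_LABELS'].size}")
```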
With this, I think we can close this issue.
Hello sir, I have been working on the same dataset for a week now. I need to process five years of data, so I first downloaded three years' worth, which brings the total array size to about 5 GB. Now the

```python
m.fit_predict(ds, features=features_in_ds, inplace=True)
```

command takes forever to run, along with a warning that says "Slicing is producing a large chunk".
If possible, can you answer these two questions?

- Is there a workaround for running the model on data this large? If not, just answer the second question, which is more important for me.
- How do I determine the optimum number of classes (the K value)? The tutorial doesn't show any way to find K using the BIC elbow method; we just took an arbitrary value like K=12. Can you please tell me how to find a K value suitable for my dataset using pyXpcm? (A sketch of the kind of loop I mean is below.)
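For illustration, a BIC elbow loop of the kind referred to above might look like the following. This is a minimal sketch, not a confirmed pyXpcm recipe: it assumes a `bic()` method on the fitted model (mirroring scikit-learn's `GaussianMixture.bic`), and it fits on a single time slice (`ds.isel(time_counter=0)`) to keep each fit cheap on a large dataset:

```python
import matplotlib.pyplot as plt
from pyxpcm.models import pcm

# Fit on a subsample to keep each iteration cheap (assumption:
# one time slice is representative enough for model selection)
ds_sub = ds.isel(time_counter=0)

Ks = range(2, 16)
bics = []
for k in Ks:
    model = pcm(K=k, features=pcm_features)
    model.fit(ds_sub, features=features_in_ds)
    # Assumed API: BIC of the fitted mixture model on this data
    bics.append(model.bic(ds_sub, features=features_in_ds))

# The "elbow" where the BIC curve flattens (or its minimum)
# suggests a reasonable K for the dataset
plt.plot(list(Ks), bics, 'o-')
plt.xlabel('Number of classes K')
plt.ylabel('BIC')
plt.show()
```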