obidam/pyxpcm

Add method for outputting PCA fields


sdat2 commented

It would be interesting to be able to see the PCA fields after preprocessing, to see what space the clusters are actually being fitted in.

If the PCA fields were attached to the dataset, it should be possible to add conditional logic that stops predict_proba() from running the preprocessing again on the same dataset.
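That conditional logic could look roughly like the sketch below. It is not pyxpcm code. A plain dict of NumPy arrays stands in for the `xarray.Dataset`. The `PCA_VALUES` key, the `TEMP` variable and the `preprocess`/`get_reduced` helpers are all hypothetical placeholders.

```python
import numpy as np

PCA_KEY = "PCA_VALUES"  # hypothetical name for the cached reduced data


def preprocess(ds, n_components=2):
    """Hypothetical stand-in for the costly preprocessing step:
    centre the feature matrix and project it onto its leading PCA axes."""
    X = ds["TEMP"] - ds["TEMP"].mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


def get_reduced(ds):
    """Return the reduced data, running preprocessing only when it is
    not already attached to the dataset."""
    if PCA_KEY not in ds:
        ds[PCA_KEY] = preprocess(ds)
    return ds[PCA_KEY]


rng = np.random.default_rng(0)
ds = {"TEMP": rng.normal(size=(100, 20))}  # 100 profiles x 20 depth levels
first = get_reduced(ds)   # preprocesses and caches the result
second = get_reduced(ds)  # reuses the cached array, no second run
print(first.shape, second is first)
```

A prediction method would then call something like `get_reduced` instead of preprocessing unconditionally. The trade-off is that the cache must be invalidated if the dataset's variables change.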

gmaze commented

Hi @sdat2
It is possible to see the PCA fields with m.plot.reducer().
Once the PCM has been fitted on a dataset, the PCA reducer is no longer fitted when a prediction is made.
Are you suggesting that we add the reduced data to the dataset?
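The "fit once, then only apply at prediction" behaviour follows the usual fit/transform pattern. The sketch below illustrates it with NumPy only. `SimplePCA` is a hypothetical minimal class, not pyxpcm's reducer or its API.

```python
import numpy as np


class SimplePCA:
    """Hypothetical minimal PCA reducer with separate fit and transform steps."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.mean_ = None
        self.components_ = None

    def fit(self, X):
        # Learn the mean and principal axes from the training data only.
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X):
        # Project data with the already-fitted mean and axes (no refitting).
        if self.components_ is None:
            raise RuntimeError("fit() must be called before transform()")
        return (X - self.mean_) @ self.components_.T


rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 10))
X_new = rng.normal(size=(50, 10))

reducer = SimplePCA(n_components=3).fit(X_train)
axes_before = reducer.components_.copy()
Z_new = reducer.transform(X_new)  # prediction-time call: axes stay unchanged
print(Z_new.shape, np.array_equal(axes_before, reducer.components_))
```

Keeping `fit` and `transform` separate is what lets new data be projected into the same PCA space the clusters were fitted in.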

sdat2 commented

Hi @gmaze

Ah cool, I hadn't spotted that feature. I guess adding it to the dataset isn't strictly necessary, and it would probably just complicate the codebase. Here is my implementation for outputting the PCA values:

```python
import numpy as np
import xarray as xr


def add_pca_to_xarray(self, ds, features=None,
                      dim=None, action='fit',
                      mask=None, inplace=False):
    """
    Preprocess the fields, fit the PCA reducer, and return the PCA
    coefficients as an :class:`xarray.DataArray`.

    :param ds: :class:`xarray.Dataset` to process
    :param features: dictionary of features, forwarded to preprocessing
    :param dim: name of the dimension along which the model is fitted (e.g. Z)
    :param action: string forwarded to the preprocessing function
    :param mask: mask over the dataset
    :param inplace: if True, add the DataArray to ``ds`` and return the dataset;
           otherwise return the DataArray on its own.
    """
    with self._context('fit', self._context_args):
        # Stack, scale and reduce the features: one row per sampled point.
        X, sampling_dims = self.preprocessing(ds, features=features, dim=dim,
                                              action=action, mask=mask)
        pca_values = X.values
        n_features = str(X.coords['n_features'].values)

    with self._context('add_pca.xarray', self._context_args):
        # Map each PCA component back onto the dataset's sampling dimensions.
        P = list()
        for k in range(np.shape(pca_values)[1]):
            component = pca_values[:, k]
            x = self.unravel(ds, sampling_dims, component)
            P.append(x)

        # Stack the components along a new 'pca' dimension.
        da = xr.concat(P, dim='pca').rename('PCA_VALUES')
        da.attrs['long_name'] = 'PCA Values'
        da.attrs['n_features'] = n_features

    # Add the PCA values to the dataset, or return them on their own:
    if inplace:
        return ds.pyxpcm.add(da)
    else:
        return da
```

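
The core of `add_pca_to_xarray` is the loop that maps each column of the (n_samples × n_components) PCA matrix back onto the sampling grid and stacks the results along a `pca` dimension. The sketch below shows that step in NumPy only. It assumes a 2-D grid with a boolean mask of sampled points. `unravel_components` is a hypothetical helper, not pyxpcm's `unravel`.

```python
import numpy as np


def unravel_components(pca_values, mask):
    """Scatter each PCA component back onto the grid.

    pca_values: (n_samples, n_components) array, one row per sampled grid
                point, in the row-major order of mask's True entries.
    mask: boolean (ny, nx) array marking the sampled points.
    Returns an (n_components, ny, nx) array, NaN where mask is False.
    """
    n_components = pca_values.shape[1]
    out = np.full((n_components,) + mask.shape, np.nan)
    for k in range(n_components):
        # out[k] is a view, so boolean assignment fills the True cells
        # of component k in row-major order.
        out[k][mask] = pca_values[:, k]
    return out


mask = np.array([[True, False, True],
                 [True, True, False]])
pca_values = np.arange(8, dtype=float).reshape(4, 2)  # 4 points, 2 components
fields = unravel_components(pca_values, mask)
print(fields.shape)  # (2, 2, 3)
```

Stacking components along a leading axis mirrors the `xr.concat(P, dim='pca')` call: each slice `fields[k]` is one PCA field that can be plotted on the original grid.
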
gmaze commented

Nice! Just open a PR and request a review; I'll check it out!

sdat2 commented

Thanks! I've just realised that I've been working on master, but I'll create a PR once I've sorted that out.

sdat2 commented

PR #28 created.