slowkow/harmonypy

Harmonypy results with anndata

Once the adjusted PCs are calculated...

import harmonypy as hm

data_mat = adata.obsm['X_pca']
meta_data = adata.obs
vars_use = ['batch']
ho = hm.run_harmony(data_mat, meta_data, vars_use)

...how do I integrate the results back into the anndata object (adata) to proceed with the workflow? I am attempting to use harmonypy in the scanpy workflow but am very new to this.

Thanks in advance!

Have you tried replacing the original PCs with the harmonized PCs?

adjusted_pcs = pd.DataFrame(ho.Z_corr)
adata.obsm['X_pca'] = adjusted_pcs

You might also be able to add a new entry in the obsm slot:

adata.obsm['X_pca_harmonized'] = adjusted_pcs

I did not test these snippets, so I don't know whether they will work.

Thanks for your quick response!

In both cases I get a length error:

adjusted_pcs = pd.DataFrame(ho.Z_corr)
adata.obsm['X_pca'] = adjusted_pcs
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-93-d54e748e808d> in <module>
      1 adjusted_pcs = pd.DataFrame(ho.Z_corr)
----> 2 adata.obsm['X_pca'] = adjusted_pcs

~/anaconda3/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in __setitem__(self, key, value)
    148 
    149     def __setitem__(self, key: str, value: V):
--> 150         value = self._validate_value(value, key)
    151         self._data[key] = value
    152 

~/anaconda3/lib/python3.7/site-packages/anndata/_core/aligned_mapping.py in _validate_value(self, val, key)
    206             hasattr(val, "index")
    207             and isinstance(val.index, cabc.Collection)
--> 208             and not (val.index == self.dim_names).all()
    209         ):
    210             # Could probably also re-order index if it’s contained

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in cmp_method(self, other)
    103         if isinstance(other, (np.ndarray, Index, ABCSeries, ExtensionArray)):
    104             if other.ndim > 0 and len(self) != len(other):
--> 105                 raise ValueError("Lengths must match to compare")
    106 
    107         if is_object_dtype(self) and isinstance(other, ABCCategorical):

ValueError: Lengths must match to compare
---------------------------------------------------------------------------

I also get an error, this time about the shape, if I pass the underlying array instead:

adata.obsm['X_pca'] = adjusted_pcs.values
ValueError: Value passed for key 'X_pca' is of incorrect shape. Values of obsm must match dimensions (0,) of parent. Value had shape (50, 29552) while it should have had (29552,).

Thanks for your time.

I think a simple transpose fixed the issue:

adjusted_pcs = pd.DataFrame(ho.Z_corr).T
adata.obsm['X_pca'] = adjusted_pcs.values
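
For anyone landing here later: judging by the shape error above, ho.Z_corr came back as (50, 29552), that is PCs by cells, while adata.obsm wants one row per cell, which is why the transpose helps. Below is a minimal sketch of a shape check that transposes only when needed, in case the orientation differs between harmonypy versions. The helper name cells_first and the key 'X_pca_harmony' are hypothetical choices for illustration, not part of harmonypy or anndata.

```python
def cells_first(z_corr, n_obs):
    """Return z_corr oriented as (n_obs, n_pcs), transposing only if needed.

    Hypothetical helper, not part of harmonypy or anndata. Works on any
    2-D array-like that exposes .shape and .T (for example a NumPy array).
    """
    n_rows, n_cols = z_corr.shape
    if n_rows == n_obs:
        # Already one row per cell. If n_obs happens to equal n_pcs, the
        # orientation cannot be inferred from the shape; this branch wins.
        return z_corr
    if n_cols == n_obs:
        # PCs x cells, as in the error message above: flip it.
        return z_corr.T
    raise ValueError(
        f"Neither dimension of {z_corr.shape} matches n_obs={n_obs}"
    )


# Usage in the workflow from this thread (assumes `adata` and `ho` exist):
# adata.obsm['X_pca_harmony'] = cells_first(ho.Z_corr, adata.n_obs)
# sc.pp.neighbors(adata, use_rep='X_pca_harmony')  # scanpy imported as sc
```

Storing the result under a new key rather than overwriting 'X_pca' keeps the uncorrected PCs around for comparison; scanpy's neighbors step can then be pointed at the corrected embedding through its use_rep argument.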

Thanks!

Thanks for sharing the results. Glad you got it working.