`mu.read` ValueError

Question

mdmanurung opened this issue 3 years ago · 2 comments

Dear author,

I encountered the following error when reading a MuData object that was converted from Seurat:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [6], in <module>
----> 1 mdata2 = mu.read("data/processed/mudata.h5mu/ADT")

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:409, in read(filename, **kwargs)
    406     return read_h5mu(filepath, **kwargs)
    407 elif m[3] == "":
    408     # .h5mu/<modality>
--> 409     return read_h5ad(filepath, m[2], **kwargs)
    410 elif m[2] == "mod":
    411     # .h5mu/mod/<modality>
    412     return read_h5ad(filepath, m[3], **kwargs)

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:372, in read_h5ad(filename, mod, backed)
    370 with h5py.File(filename, hdf5_mode) as f_root:
    371     f = f_root["mod"][mod]
--> 372     return _read_h5mu_mod(f, manager, backed)

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:320, in _read_h5mu_mod(g, manager, backed)
    318     elif k != "raw":
    319         d[k] = read_attribute(g[k])
--> 320 ad = AnnData(**d)
    321 if manager is not None:
    322     ad.file = AnnDataFileManager(ad, os.path.basename(g.name), manager)

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:308, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    306     self._init_as_view(X, oidx, vidx)
    307 else:
--> 308     self._init_as_actual(
    309         X=X,
    310         obs=obs,
    311         var=var,
    312         uns=uns,
    313         obsm=obsm,
    314         varm=varm,
    315         raw=raw,
    316         layers=layers,
    317         dtype=dtype,
    318         shape=shape,
    319         obsp=obsp,
    320         varp=varp,
    321         filename=filename,
    322         filemode=filemode,
    323     )

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:526, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    523 # Backwards compat for connectivities matrices in uns["neighbors"]
    524 _move_adj_mtx({"uns": self._uns, "obsp": self._obsp})
--> 526 self._check_dimensions()
    527 self._check_uniqueness()
    529 if self.filename:

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:1837, in AnnData._check_dimensions(self, key)
   1835     key = {key}
   1836 if "obs" in key and len(self._obs) != self._n_obs:
-> 1837     raise ValueError(
   1838         "Observations annot. `obs` must have number of rows of `X`"
   1839         f" ({self._n_obs}), but has {self._obs.shape[0]} rows."
   1840     )
   1841 if "var" in key and len(self._var) != self._n_vars:
   1842     raise ValueError(
   1843         "Variables annot. `var` must have number of columns of `X`"
   1844         f" ({self._n_vars}), but has {self._var.shape[0]} rows."
   1845     )

ValueError: Observations annot. `obs` must have number of rows of `X` (163), but has 62773 rows.

I then tried to load each modality one by one. I could load my RNA data, but not my ADT. My ADT data has 163 features in it. For both modalities, I have 62773 observations.

Considering that, I am a bit confused by the error. Why would obs of my ADT data expect 163 rows, which should be the number of features?

Thanks for taking the time.

Regards,
Mikhael

Answer 1 · 2022-02-24T11:02:52.000Z

> Why would obs of my ADT data expect 163 rows, which should be the number of features?

This comes from how R and NumPy store matrices in memory. R uses column-major order, whereas NumPy defaults to row-major order. The in-memory layout is preserved when writing to HDF5, so matrices end up transposed unless we transpose them ourselves before writing.
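To illustrate the layout point, here is a minimal pure-Python sketch. The helper names `to_column_major` and `from_row_major` are made up for this example and are not part of MuData, anndata, or NumPy. It flattens a small matrix the way a column-major writer would, then reads the same flat buffer back in row-major order with the dimensions reversed:

```python
def to_column_major(matrix):
    """Flatten a list-of-rows matrix the way a column-major writer lays it out."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def from_row_major(flat, n_rows, n_cols):
    """Rebuild a list-of-rows matrix from a flat buffer read in row-major order."""
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


original = [[0, 1, 2],
            [3, 4, 5]]  # 2 x 3

# Column-major layout of the 2 x 3 matrix.
buffer = to_column_major(original)

# A row-major reader sees the same buffer with the dimensions reversed (3 x 2).
reread = from_row_major(buffer, 3, 2)

# The result is the transpose of the original matrix.
transposed = [list(col) for col in zip(*original)]
print(reread == transposed)
```

The buffer itself never changes; only the interpretation of its dimensions does, which is why the matrix appears transposed on the reading side.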

Please try this patch:

index 9565f12..7d30222 100644
--- a/R/WriteH5MU.R
+++ b/R/WriteH5MU.R
@@ -83,7 +83,7 @@ WriteH5ADHelper <- function(object, assay, root, global = FALSE) {
               write_sparse_matrix(data_group, x_data, sparse_type)
             } else {
               # dense matrix
-              raw_group$create_dataset("X", x_data)
+              raw_group$create_dataset("X", t(x_data))
             }
             # .raw has to contain .var as well
             raw_var_group <- raw_group$create_group("var")

If this doesn't work, a sample of a Seurat object that has this problem would help.
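As a quick way to confirm the orientation before sharing a sample, the check that anndata performs can be approximated by hand. This is only a sketch: `describe_orientation` is a hypothetical helper, not a MuData or anndata function, and the numbers are the ones from the error message above.

```python
def describe_orientation(x_shape, n_obs, n_var):
    """Classify the shape of X relative to the lengths of obs and var."""
    if tuple(x_shape) == (n_obs, n_var):
        return "ok"          # observations x variables, as AnnData expects
    if tuple(x_shape) == (n_var, n_obs):
        return "transposed"  # variables x observations
    return "mismatch"        # neither orientation fits


# X reported with 163 rows, obs with 62773 rows, 163 features expected.
result = describe_orientation((163, 62773), n_obs=62773, n_var=163)
print(result)
```

For a dense matrix, the stored shape can presumably be read with h5py from the `mod/<modality>/X` dataset, the same group that `read_h5ad` opens in the traceback.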

Answer 2 · 2022-02-24T12:34:07.000Z

I've tested the patch, but I still see the same error. Strangely, the error disappears when I convert my ADT matrix to a sparse matrix using `Seurat::as.sparse`. Now I can load my MuData file.