`mu.read` ValueError
mdmanurung opened this issue · 2 comments
Dear author,
I encountered the following issue upon reading a mudata
object that was converted from Seurat
:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [6], in <module>
----> 1 mdata2 = mu.read("data/processed/mudata.h5mu/ADT")
File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:409, in read(filename, **kwargs)
406 return read_h5mu(filepath, **kwargs)
407 elif m[3] == "":
408 # .h5mu/<modality>
--> 409 return read_h5ad(filepath, m[2], **kwargs)
410 elif m[2] == "mod":
411 # .h5mu/mod/<modality>
412 return read_h5ad(filepath, m[3], **kwargs)
File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:372, in read_h5ad(filename, mod, backed)
370 with h5py.File(filename, hdf5_mode) as f_root:
371 f = f_root["mod"][mod]
--> 372 return _read_h5mu_mod(f, manager, backed)
File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:320, in _read_h5mu_mod(g, manager, backed)
318 elif k != "raw":
319 d[k] = read_attribute(g[k])
--> 320 ad = AnnData(**d)
321 if manager is not None:
322 ad.file = AnnDataFileManager(ad, os.path.basename(g.name), manager)
File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:308, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
306 self._init_as_view(X, oidx, vidx)
307 else:
--> 308 self._init_as_actual(
309 X=X,
310 obs=obs,
311 var=var,
312 uns=uns,
313 obsm=obsm,
314 varm=varm,
315 raw=raw,
316 layers=layers,
317 dtype=dtype,
318 shape=shape,
319 obsp=obsp,
320 varp=varp,
321 filename=filename,
322 filemode=filemode,
323 )
File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:526, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
523 # Backwards compat for connectivities matrices in uns["neighbors"]
524 _move_adj_mtx({"uns": self._uns, "obsp": self._obsp})
--> 526 self._check_dimensions()
527 self._check_uniqueness()
529 if self.filename:
File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:1837, in AnnData._check_dimensions(self, key)
1835 key = {key}
1836 if "obs" in key and len(self._obs) != self._n_obs:
-> 1837 raise ValueError(
1838 "Observations annot. `obs` must have number of rows of `X`"
1839 f" ({self._n_obs}), but has {self._obs.shape[0]} rows."
1840 )
1841 if "var" in key and len(self._var) != self._n_vars:
1842 raise ValueError(
1843 "Variables annot. `var` must have number of columns of `X`"
1844 f" ({self._n_vars}), but has {self._var.shape[0]} rows."
1845 )
ValueError: Observations annot. `obs` must have number of rows of `X` (163), but has 62773 rows.
I then tried to load each modality one by one. I could load my RNA data, but not my ADT. My ADT data has 163 features in it. For both modalities, I have 62773 observations.
Considering that, I am a bit confused by the error. Why would obs
of my ADT data expect 163 rows, which should be the number of features?
Thanks for taking the time.
Regards,
Mikhael
Why would obs of my ADT data expect 163 rows, which should be the number of features?
This is due to how R vs Numpy store matrices. R stores them in column-major order, whereas Numpy by default stores them in row-major order. The internal memory representation is preserved when writing to HDF5, so matrices end up transposed, unless we transpose them ourselves prior to writing.
Please try this patch:
index 9565f12..7d30222 100644
--- a/R/WriteH5MU.R
+++ b/R/WriteH5MU.R
@@ -83,7 +83,7 @@ WriteH5ADHelper <- function(object, assay, root, global = FALSE) {
write_sparse_matrix(data_group, x_data, sparse_type)
} else {
# dense matrix
- raw_group$create_dataset("X", x_data)
+ raw_group$create_dataset("X", t(x_data))
}
# .raw has to contain .var as well
raw_var_group <- raw_group$create_group("var")
If this doesn't work, a sample of a Seurat object that has this problem would help.
I've tested the patch but I still see the same error. Strangely, the error disappears upon forcing my ADT matrix to a sparse matrix using Seurat::as.sparse
. Now I can load my mudata
file.