PMBio/MuDataSeurat

I've tested the patch but I still see the same error. Strangely, the error disappears upon forcing my ADT matrix to a sparse matrix using `Seurat::as.sparse`. Now I can load my `mudata` file.

bio-la opened this issue · 5 comments

I've tested the patch but I still see the same error. Strangely, the error disappears upon forcing my ADT matrix to a sparse matrix using `Seurat::as.sparse`. Now I can load my `mudata` file.

Originally posted by @mdmanurung in #2 (comment)

I have the same issue with the bone marrow data (`bmcite`) used to illustrate the package functionality:

library(SeuratData)
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")

library(MuDataSeurat)
WriteH5MU(bm, "bmcite.h5mu")


test <- ReadH5MU("bmcite.h5mu")

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘CD14’, ‘CD19’, ‘CD27’, ‘CD28’, ‘CD34’, ‘CD38’, ‘CD4’, ‘CD69’ 

Also, reading the same object in Python fails (with or without using `Seurat::as.sparse` on the ADT):

import muon as mu

mu.read_h5mu("bmcite.h5mu")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/mudata/_core/io.py", line 380, in read_h5mu
    ad = _read_h5mu_mod(gmods[m], manager, backed not in (None, False))
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/mudata/_core/io.py", line 513, in _read_h5mu_mod
    ad = AnnData(**d)
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/anndata/_core/anndata.py", line 291, in __init__
    self._init_as_actual(
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/anndata/_core/anndata.py", line 521, in _init_as_actual
    self._check_dimensions()
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/anndata/_core/anndata.py", line 1843, in _check_dimensions
    raise ValueError(
ValueError: Observations annot. `obs` must have number of rows of `X` (25), but has 30672 rows.
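
The `ValueError` above reports that `X` has 25 rows while `obs` has 30672. One plausible reading (an assumption, not something confirmed in this thread) is that the ADT matrix was stored as features × cells rather than cells × features. Below is a minimal sketch of that check in plain Python; `diagnose_shape` is a hypothetical helper, not part of mudata or anndata, and the column count of 30672 is assumed for illustration since the traceback only reports the number of rows.

```python
def diagnose_shape(x_shape, n_obs, n_var):
    """Classify how a stored matrix shape relates to the expected (n_obs, n_var).

    Hypothetical helper for illustration; not part of mudata or anndata.
    """
    if x_shape == (n_obs, n_var):
        return "ok"
    if x_shape == (n_var, n_obs):
        # Rows and columns are swapped: features x cells instead of cells x features.
        return "transposed"
    return "mismatch"


# Row count (25) and obs count (30672) come from the traceback;
# the column count of X is an assumption.
print(diagnose_shape((25, 30672), n_obs=30672, n_var=25))
```

If the check reports a transposed matrix, the file was most likely written with the wrong orientation, and the fix belongs on the writing side rather than in the reader.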

gtca commented

Thanks @bio-la, we'll see if we can easily account for more common sparse matrix types here so I'll keep this open then for now.

Thanks for the example!

Sorry, I just updated the comment without reading your answer first! It's failing on the dataset you use to illustrate the functionality of the package. Thanks for looking into this!

gtca commented

I just quickly checked this as I was curious, just to keep you in the loop, @bio-la:

  1. `LoadData() %>% WriteH5MU("bmcite.h5mu")` → `mu.read("bmcite.h5mu")` works for the latest version I have, which might be a bit ahead of the current main here, so at least we have that fixed in the next version.

  2. For the `duplicate 'row.names' are not allowed` error, we'll have to account for files that have the same feature names in different modalities. Current mudata/muon workflows nudge users to reduce ambiguity by using unique feature names, but there's an R-specific bug here that we can fix by accounting for such duplicates.
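
As an illustration of point 2, here is a minimal sketch of how feature names shared between modalities could be detected and made unique by prefixing them with the modality name. This is not the fix implemented in MuDataSeurat; the helper names, the `:` separator, and the example feature lists are all hypothetical.

```python
from collections import Counter


def find_shared_features(modalities):
    """Return the sorted feature names that occur in more than one modality."""
    counts = Counter(
        name for names in modalities.values() for name in set(names)
    )
    return sorted(name for name, n in counts.items() if n > 1)


def prefix_shared_features(modalities, sep=":"):
    """Prefix only the ambiguous feature names with their modality name."""
    shared = set(find_shared_features(modalities))
    return {
        mod: [f"{mod}{sep}{name}" if name in shared else name for name in names]
        for mod, names in modalities.items()
    }


# Hypothetical feature lists: CD14 and CD19 appear in both modalities.
mods = {"rna": ["CD14", "CD19", "GAPDH"], "adt": ["CD14", "CD19", "CD3"]}
print(find_shared_features(mods))
print(prefix_shared_features(mods))
```

Prefixing only the shared names leaves the other feature names untouched, which keeps the renaming as small as possible while still giving a data frame unique row names.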

Thanks again for the reports!

Thanks for the update, @gtca!

LoadData() %>% WriteH5MU("bmcite.h5mu") → mu.read("bmcite.h5mu") works for the latest version I have

Do you mean mudata (Python) or MuDataSeurat?
Do you have a timeframe in mind for this version to be available?

Sorry to come across as a bit pushy. I am trying to finish something for a paper and I want to figure out the right workarounds to make mudata -> Seurat -> mudata work for the time being (of course, I will fix this at a later point to get the update that accounts for the newest anndata version).
Thank you very much!

gtca commented

Hey @bio-la, that's understandable that you'd want that working!

Do you think you can check if the latest main branch works for you and let me know? That should fix both points 1 and 2 from my previous message here.