PMBio/MuDataSeurat

can't read mudata created with muon (python)

bio-la opened this issue · 8 comments

Hello, thanks for working on interoperability between seurat and mudata!

I can't read a mudata that I created following your multimodal tutorial using ReadH5MU

test<-ReadH5MU("data_test.dir/pbmc_w3_teaseq.h5mu")
Error in dataset[[name]]$read() : attempt to apply non-function

I have no problems loading the object with muon in python

import muon as mu
mu.read_h5mu("data_test.dir/pbmc_w3_teaseq.h5mu")
MuData object with n_obs × n_vars = 5805 × 113187
  obs:  'sample', 'well', 'leiden_multiplex', 'leiden_mofa', 'leiden_wnn'
  var:  'highly_variable', 'gene_ids', 'feature_types', 'genome', 'interval'
  obsm: 'X_mofa', 'X_umap', 'X_wnn_umap'
  varm: 'LFs'
  obsp: 'mofa_connectivities', 'mofa_distances', 'wnn_connectivities', 'wnn_distances'
  3 modalities
    rna:        5805 x 16381
      obs:      'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden'
      var:      'gene_ids', 'feature_types', 'genome', 'interval', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
      uns:      'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'lognorm'
      obsp:     'connectivities', 'distances'
    atac:       5805 x 96760
      obs:      'n_fragments', 'n_duplicate', 'n_mito', 'n_unique', 'altius_count', 'altius_frac', 'gene_bodies_count', 'gene_bodies_frac', 'peaks_count', 'peaks_frac', 'tss_count', 'tss_frac', 'barcodes', 'cell_name', 'well_id', 'chip_id', 'batch_id', 'pbmc_sample_id', 'DoubletScore', 'DoubletEnrichment', 'TSSEnrichment', 'n_genes_by_counts', 'total_counts', 'n_counts', 'leiden'
      var:      'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
      uns:      'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'counts', 'lognorm'
      obsp:     'connectivities', 'distances'
    prot:       5805 x 46
      obs:      'total_counts'
      var:      'highly_variable'
      uns:      'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'counts'
      obsp:     'connectivities', 'distances'

I can explore the h5 but it breaks where the error says. it also seems to expect some attributes that I don't have in the mudata

h5 <- open_and_check_mudata("~/Documents/devel/data_test.dir/pbmc_w3_teaseq.h5mu")
metadata <- read_with_index(h5[["obs"]])
dataset = h5[['obs']]
dataset_attr <- tryCatch({
  h5attributes(dataset)
  }, error = function(e) {
  list("_index" = "_index")
  })
  indexcol <- "_index"
if ("_index" %in% names(dataset_attr)) {
  indexcol <- dataset_attr$`_index`
}
dataset_attr
columns <- names(dataset)
columns <- columns[columns != "__categories"]
columns

dataset[["sample"]]$read()

Error in dataset[[name]]$read() : attempt to apply non-function

values_attr <-h5attributes(dataset)
values_attr 
$`column-order`
[1] "sample" "well"  

$`_index`
[1] "_index"

$`encoding-type`
[1] "dataframe"

$`encoding-version`
[1] "0.2.0"

# so the following line will be NULL
# values_attr$categories

any suggestions?

thanks!

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bmcite.SeuratData_0.3.0 pbmc3k.SeuratData_3.1.4 SeuratData_0.2.2        hdf5r_1.3.5             MuDataSeurat_0.0.0.9000 magrittr_2.0.3          datapasta_3.1.0        
 [8] forcats_0.5.1           stringr_1.4.0           dplyr_1.0.8             purrr_0.3.4             readr_2.1.2             tidyr_1.2.0             tibble_3.1.6           
[15] ggplot2_3.3.5           tidyverse_1.3.1        

loaded via a namespace (and not attached):
  [1] readxl_1.4.0          backports_1.4.1       plyr_1.8.7            igraph_1.3.0          lazyeval_0.2.2        splines_4.1.2         listenv_0.8.0         scattermore_0.8      
  [9] digest_0.6.29         htmltools_0.5.2       fansi_1.0.3           tensor_1.5            cluster_2.1.3         ROCR_1.0-11           tzdb_0.3.0            remotes_2.4.2        
 [17] globals_0.14.0        modelr_0.1.8          matrixStats_0.62.0    spatstat.sparse_2.1-0 prettyunits_1.1.1     colorspace_2.0-3      rappdirs_0.3.3        rvest_1.0.2          
 [25] ggrepel_0.9.1         haven_2.4.3           callr_3.7.0           crayon_1.5.1          jsonlite_1.8.0        spatstat.data_2.1-4   survival_3.3-1        zoo_1.8-9            
 [33] glue_1.6.2            polyclip_1.10-0       gtable_0.3.0          leiden_0.3.9          clipr_0.8.0           pkgbuild_1.3.1        future.apply_1.8.1    abind_1.4-5          
 [41] scales_1.1.1          DBI_1.1.2             spatstat.random_2.2-0 miniUI_0.1.1.1        Rcpp_1.0.8.3          viridisLite_0.4.0     xtable_1.8-4          reticulate_1.24      
 [49] spatstat.core_2.4-2   bit_4.0.4             htmlwidgets_1.5.4     httr_1.4.2            anndata_0.7.5.3       RColorBrewer_1.1-3    ellipsis_0.3.2        Seurat_4.1.0         
 [57] ica_1.0-2             pkgconfig_2.0.3       uwot_0.1.11           dbplyr_2.1.1          deldir_1.0-6          utf8_1.2.2            tidyselect_1.1.2      rlang_1.0.2          
 [65] reshape2_1.4.4        later_1.3.0           munsell_0.5.0         cellranger_1.1.0      tools_4.1.2           cli_3.3.0             generics_0.1.2        broom_0.7.12         
 [73] ggridges_0.5.3        fastmap_1.1.0         goftest_1.2-3         processx_3.5.3        bit64_4.0.5           fs_1.5.2              fitdistrplus_1.1-8    RANN_2.6.1           
 [81] pbapply_1.5-0         future_1.24.0         nlme_3.1-157          mime_0.12             formatR_1.12          xml2_1.3.3            compiler_4.1.2        rstudioapi_0.13      
 [89] plotly_4.10.0         curl_4.3.2            png_0.1-7             spatstat.utils_2.3-0  reprex_2.0.1          stringi_1.7.6         ps_1.6.0              lattice_0.20-45      
 [97] Matrix_1.4-1          SeuratDisk_0.0.0.9019 vctrs_0.3.8           pillar_1.7.0          lifecycle_1.0.1       spatstat.geom_2.4-0   lmtest_0.9-40         RcppAnnoy_0.0.19     
[105] addinexamples_0.1.0   data.table_1.14.2     cowplot_1.1.1         irlba_2.3.5           httpuv_1.6.5          patchwork_1.1.1       R6_2.5.1              promises_1.2.0.1     
[113] KernSmooth_2.23-20    gridExtra_2.3         parallelly_1.31.0     codetools_0.2-18      MASS_7.3-56           assertthat_0.2.1      rprojroot_2.0.3       withr_2.5.0          
[121] SeuratObject_4.0.4    sctransform_0.3.3     mgcv_1.8-40           parallel_4.1.2        hms_1.1.1             grid_4.1.2            rpart_4.1.16          Rtsne_0.15           
[129] shiny_1.7.1           lubridate_1.8.0      
gtca commented

Hey @bio-la, thanks for the detailed report, I think the issue is just with anndata v0.8 breaking forward compatibility, i.e. AnnData files from versions >=0.8 cannot be read with packages that were targeting earlier versions of files.
This is of course something we're implementing next for MuDataMAE/MuDataSeurat packages but I don't have a solid timeline to share yet...

thanks for answering this! do you mean that mudataSeurat was built expecting objects created with anndata <0.8, and which version of muon/mudata ?
in the meantime do you think I should be able to create a version of the mudata by downgrading muon ? (I'm currently running latest muon )
thanks!
f

gtca commented

Sorry for being unclear. That should be a better way to phrase it: MuDataSeurat was built before AnnData v0.8 spec and hasn't been updated to handle new versions of AnnData files yet.
Basically, it is the same issue as using anndata <0.8 in Python to read new files (scverse/anndata#698, scverse/anndata#739).

I believe using mudata <0.2.0 (e.g. 0.1.2) together with anndata <0.8.0 (e.g. 0.7.8) should help with that!
muon relies on mudata for serialisation so its version shouldn't be of a problem here.

gtca commented

Hey @bio-la, I added new categorical values handling in the latest commits so you might be able to read files written by the latest anndata/mudata. This is not fully compliant with the rest of the new encoding spec but relying on rhdf5 to figure out all the other types seems to do the trick so far for the files I've tested it on.

I.e. this is a small fix to remedy the current issue but also the first step in the direction of this library becoming v0.8-compliant. 😃

tea <- ReadH5MU("muon-tutorials/data/teaseq/pbmc_w3_teaseq.h5mu")
tea
# An object of class Seurat 
# 113187 features across 5805 samples within 3 assays 
# Active assay: rna (16381 features, 2910 variable features)
#  2 other assays present: atac, prot
#  10 dimensional reductions calculated: MOFA, MOFA_UMAP, UMAP, WNN_UMAP, rnaPCA, rnaUMAP, atacPCA, atacUMAP, protPCA, protUMAP

You can also note that it respects mod-order attribute now and keeps the rna as the default assay.

Hi @gtca !

I was encountering the above issue as well. I installed the latest development version of MuDataSeurat and am now experiencing the following error:

library(MuDataSeurat)

## VIASH START
par <- list(
  input = "resources_test/pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu",
  output = "output.rds"
)
## VIASH END

obj <- ReadH5MU(par$input)
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': attempt to apply non-function
In addition: Warning messages:
1: In missing_on_read("/var", paste0("global variables metadata (",  :
  Missing on read: /var. Seurat does not support global variables metadata (genome, feature_types, gene_symbol).
2: In missing_on_read(paste0("some of mod/", modalityname, "/layers"),  :
  Missing on read: some of mod/rna/layers. Seurat does not support custom layers, unless labeled 'counts'.

Files which trigger this error:

Hi @gtca, I am still observing the issue @rcannood described. I was wondering if there were any updates regarding supporting objects created with anndata > 0.8? Thanks!

Thanks a lot, @ilia-kats 🙇