String categories written by MuDataSeurat are read in as bytes by anndata
ivirshup opened this issue · 0 comments
ivirshup commented
Using the same setup as in #5, with the fix that closed it applied:
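library(SeuratData)    # makes the installed pbmc3k dataset available to data()
library(MuDataSeurat)  # provides WriteH5AD
suppressWarnings(SeuratData::InstallData("pbmc3k", force.reinstall = FALSE))
suppressWarnings(data("pbmc3k"))
seuratObj <- suppressWarnings(pbmc3k)
WriteH5AD(seuratObj, "mudata_seurat.h5ad")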
import anndata as ad
a = ad.read_h5ad("./mudata_seurat.h5ad")
a.obs
                orig.ident  nCount_RNA  nFeature_RNA  seurat_annotations
AAACATACAACCAC   b'pbmc3k'      2419.0           779     b'Memory CD4 T'
AAACATTGAGCTAC   b'pbmc3k'      4903.0          1352                b'B'
AAACATTGATCAGC   b'pbmc3k'      3147.0          1129     b'Memory CD4 T'
AAACCGTGCTTCCG   b'pbmc3k'      2639.0           960       b'CD14+ Mono'
AAACCGTGTATGCG   b'pbmc3k'       980.0           521               b'NK'
...                    ...         ...           ...                 ...
TTTCGAACTCTCAT   b'pbmc3k'      3459.0          1153       b'CD14+ Mono'
TTTCTACTGAGGCA   b'pbmc3k'      3443.0          1224                b'B'
TTTCTACTTCCTCG   b'pbmc3k'      1684.0           622                b'B'
TTTGCATGAGAGGC   b'pbmc3k'      1022.0           452                b'B'
TTTGCATGCCTCAC   b'pbmc3k'      1984.0           723      b'Naive CD4 T'
The categorical columns (orig.ident and seurat_annotations) should be read in as strings, not bytes. While you're at it, I would also suggest writing the more recent dataframe and categorical format, where everything is more self-contained and annotated.
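Until the writer is fixed, a reader-side workaround might look like the sketch below. It assumes the affected columns arrive as pandas categoricals whose categories are `bytes` objects encoded as UTF-8; `decode_bytes_categories` is a hypothetical helper, not part of anndata or MuDataSeurat, and the small `obs` frame is a hand-built stand-in for `a.obs`.

```python
import pandas as pd


def decode_bytes_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose bytes-valued categories are decoded to str.

    Hypothetical helper: assumes the categories are UTF-8 encoded bytes.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not isinstance(col.dtype, pd.CategoricalDtype):
            continue  # leave numeric and other non-categorical columns alone
        cats = col.cat.categories
        if len(cats) > 0 and all(isinstance(c, bytes) for c in cats):
            decoded = [c.decode("utf-8") for c in cats]
            out[name] = col.cat.rename_categories(decoded)
    return out


# Hand-built stand-in for the first rows of a.obs shown above.
obs = pd.DataFrame(
    {
        "orig.ident": pd.Categorical([b"pbmc3k", b"pbmc3k", b"pbmc3k"]),
        "nCount_RNA": [2419.0, 4903.0, 3147.0],
        "seurat_annotations": pd.Categorical(
            [b"Memory CD4 T", b"B", b"Memory CD4 T"]
        ),
    },
    index=["AAACATACAACCAC", "AAACATTGAGCTAC", "AAACATTGATCAGC"],
)

fixed = decode_bytes_categories(obs)
print(fixed)
```

With the real file this would be applied as `a.obs = decode_bytes_categories(a.obs)`. It only hides the symptom on the Python side; the file itself would still store the categories as bytes.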