Accessing and plotting QC metrics after merging to AnnDataSet: more documentation needed
jeremymsimon opened this issue · 1 comments
Hi @kaizhang -
This is mainly a request for more documentation, as I didn't see this explicitly mentioned anywhere (though perhaps I missed it), and it is likely useful to others.
I've followed your various vignettes for processing and integrating multiple objects, eventually resulting in an AnnDataSet object. I discovered that the QC metrics calculated at the beginning of the pipeline (e.g. n_fragment, frac_dup, etc.) are indeed retained after merging AnnData objects into the AnnDataSet, like so:
>>> dataset.adatas
Stacked AnnData objects:
obs: 'n_fragment', 'frac_dup', 'frac_mito', 'tsse', 'doublet_probability', 'doublet_score'
obsm: 'fragment_paired'
>>> dataset.adatas.obs['n_fragment']
shape: (67_815,)
Series: 'n_fragment' [u64]
[
9487
19803
9662
11500
3783
…
11122
5495
5248
7645
3948
]
I can use this to plot a UMAP colored by these variables with:
snap.pl.umap(dataset, color=dataset.adatas.obs['n_fragment'], interactive=False)
snap.pl.umap(dataset, color=dataset.adatas.obs['tsse'], interactive=False)
# etc
However, it's worth noting that I can't directly specify color='n_fragment'; otherwise I get a RuntimeError: not found: n_fragment error, since these columns are not in dataset.obs itself.
Similarly, I would also like to draw a violin plot of these variables, grouped by cluster. It seems that scanpy.pl.violin doesn't accept an AnnDataSet as input, so I'm wondering whether this is possible without converting the object back to AnnData format. Or perhaps you have plans to implement a violin plot function of your own?
>>> sc.pl.violin(dataset, dataset.adatas.obs['tsse'], groupby='leiden_final')
AttributeError: 'builtins.AnnDataSet' object has no attribute '_sanitize'
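One possible workaround, sketched here under assumptions and not a SnapATAC2 feature, is to pull the metric and the cluster labels out as plain sequences and draw the violins with matplotlib directly. The helper names group_by_cluster and plot_violin are hypothetical, and the sketch assumes that the cluster labels live in dataset.obs['leiden_final'] and that both columns can be iterated in the same cell order.

```python
from collections import defaultdict


def group_by_cluster(values, labels):
    """Split per-cell metric values into one list per cluster label."""
    values, labels = list(values), list(labels)
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[str(label)].append(float(value))
    # Sort cluster names (lexicographically) for a stable violin order.
    return dict(sorted(groups.items()))


def plot_violin(groups, ylabel, path=None):
    """Draw one violin per cluster; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    names = list(groups)
    fig, ax = plt.subplots()
    ax.violinplot([groups[name] for name in names], showmedians=True)
    # violinplot places the violins at x = 1..n by default.
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_xlabel("cluster")
    ax.set_ylabel(ylabel)
    if path is not None:
        fig.savefig(path)
    return fig


# Toy stand-ins for the real columns. With the AnnDataSet from this issue
# the inputs would presumably be (assumption, not verified here):
#   values = dataset.adatas.obs['tsse']
#   labels = dataset.obs['leiden_final']
toy_tsse = [12.5, 8.0, 15.25, 9.5, 11.0, 10.0]
toy_clusters = ["0", "1", "0", "1", "2", "2"]
groups = group_by_cluster(toy_tsse, toy_clusters)
```

Calling plot_violin(groups, ylabel="tsse", path="tsse_violin.png") would then write the figure to disk. Because the grouping happens outside of scanpy, the AnnDataSet does not need to be converted back to AnnData format.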
Thanks!
Thanks for reporting. We'll add more documentation regarding this shortly.