kaizhang/SnapATAC2

Accessing and plotting QC metrics after merging to AnnDataSet: more documentation needed

jeremymsimon opened this issue · 1 comments

Hi @kaizhang -
This is mainly a request for more documentation. I didn't see this explicitly covered anywhere (though perhaps I missed it), and it would likely be useful to others.

I've followed your various vignettes for processing and integrating multiple objects, eventually resulting in an AnnDataSet object. I discovered that the QC metrics calculated at the beginning of the pipeline (e.g. n_fragment, frac_dup, etc.) are indeed retained after merging the AnnData objects into the AnnDataSet, as shown here:

>>> dataset.adatas
Stacked AnnData objects:
    obs: 'n_fragment', 'frac_dup', 'frac_mito', 'tsse', 'doublet_probability', 'doublet_score'
    obsm: 'fragment_paired'

>>> dataset.adatas.obs['n_fragment']
shape: (67_815,)
Series: 'n_fragment' [u64]
[
	9487
	19803
	9662
	11500
	3783
	…
	11122
	5495
	5248
	7645
	3948
]

I can use this to plot a UMAP colored by these variables with:

snap.pl.umap(dataset, color=dataset.adatas.obs['n_fragment'], interactive=False)
snap.pl.umap(dataset, color=dataset.adatas.obs['tsse'], interactive=False)
# etc

However, it's worth noting that I can't directly specify color='n_fragment': doing so raises RuntimeError: not found: n_fragment, since these metrics are not in dataset.obs itself.
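One way to hide this distinction is a small lookup helper that tries dataset.obs first and falls back to dataset.adatas.obs. This is only a sketch: `lookup_obs` is a hypothetical name, not part of SnapATAC2, and it assumes that a missing key raises either KeyError or RuntimeError. The stand-in object below only mimics the two attribute paths shown above so the example runs without any real data.

```python
from types import SimpleNamespace


def lookup_obs(dataset, key):
    """Return the per-cell column `key` (hypothetical helper).

    Tries dataset.obs first, then falls back to the stacked
    per-sample annotations in dataset.adatas.obs.
    Assumption: a missing key raises KeyError or RuntimeError.
    """
    try:
        return dataset.obs[key]
    except (KeyError, RuntimeError):
        return dataset.adatas.obs[key]


# Stand-in object that mimics only the attribute layout of an AnnDataSet.
fake = SimpleNamespace(
    obs={"leiden_final": ["0", "1"]},
    adatas=SimpleNamespace(obs={"n_fragment": [9487, 19803]}),
)

clusters = lookup_obs(fake, "leiden_final")  # found directly in obs
fragments = lookup_obs(fake, "n_fragment")   # found via adatas.obs

# With a real AnnDataSet the result could then be passed on, e.g.:
# snap.pl.umap(dataset, color=lookup_obs(dataset, "n_fragment"), interactive=False)
```

The fallback order means a column in dataset.obs takes precedence over a per-sample column of the same name in dataset.adatas.obs.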

Similarly, I would also like to draw violin plots of these variables, grouped by cluster. However, scanpy.pl.violin doesn't seem to accept an AnnDataSet as input, so I'm wondering whether this is possible without converting the object back to AnnData format. Or perhaps you plan to implement a violin plot function of your own?

>>> sc.pl.violin(dataset, dataset.adatas.obs['tsse'], groupby='leiden_final')
AttributeError: 'builtins.AnnDataSet' object has no attribute '_sanitize'
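As a possible stopgap, the values can be pulled out of the object and plotted with matplotlib directly, bypassing scanpy. The sketch below is not a SnapATAC2 or scanpy function: `group_by_label` and `violin_by_group` are hypothetical helpers, and it assumes that both the metric (e.g. dataset.adatas.obs['tsse']) and the cluster labels (e.g. dataset.obs['leiden_final']) can be iterated cell by cell in the same order.

```python
from collections import defaultdict


def group_by_label(values, labels):
    """Split per-cell values into lists keyed by cluster label (hypothetical helper)."""
    values = list(values)
    labels = list(labels)
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    grouped = defaultdict(list)
    for value, label in zip(values, labels):
        grouped[str(label)].append(float(value))
    # Sort by label string so the plotting order is deterministic.
    return dict(sorted(grouped.items()))


def violin_by_group(values, labels, ylabel="value"):
    """Draw one violin per cluster with matplotlib (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    grouped = group_by_label(values, labels)
    fig, ax = plt.subplots()
    ax.violinplot(list(grouped.values()), showmedians=True)
    # violinplot places the violins at x = 1..N by default.
    ax.set_xticks(range(1, len(grouped) + 1))
    ax.set_xticklabels(list(grouped.keys()))
    ax.set_xlabel("cluster")
    ax.set_ylabel(ylabel)
    return fig, ax


# Toy data standing in for a QC metric and cluster assignments.
toy_groups = group_by_label([1.0, 2.0, 3.0], ["a", "b", "a"])

# With a real AnnDataSet, under the assumptions above:
# violin_by_group(dataset.adatas.obs["tsse"], dataset.obs["leiden_final"], ylabel="tsse")
```

Labels are sorted as strings, so a cluster named "10" is drawn before "2"; a numeric sort key could be swapped in if the labels are integers.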

Thanks!

Thanks for reporting. We'll add more documentation regarding this shortly.