Some questions about data preprocessing.
Your work is very interesting, and I would like to use portal-sc to run some tests on our dataset. It's also great to see the work you've done in preprocess_memory_efficient.
However, I've noticed that the preprocessing order seems to differ from the standard Scanpy workflow. Is there a specific reason for this difference?
Hi there,
Thank you for your interest in our Portal method! In Portal, we select highly variable genes with flavor 'seurat_v3'. According to the Scanpy documentation, flavor 'seurat_v3' expects count data, while the other flavors expect logarithmized data (https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html). Portal therefore selects genes before log-transforming the data, whereas the standard Scanpy pipeline log-transforms the data first and then selects genes with a different flavor.
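To make the difference in ordering concrete, here is a rough sketch of the two workflows using plain Scanpy calls. This is not Portal's actual preprocessing code: the function names, `n_top_genes=2000`, `target_sum=1e4`, and the choice of 'seurat' as the "other" flavor are assumptions made for illustration only. Note that flavor 'seurat_v3' additionally requires the scikit-misc package.

```python
def select_then_log(adata, n_top_genes=2000):
    """Hypothetical helper: select genes on raw counts, then normalize and log."""
    import scanpy as sc  # imported here so the sketch can be read without scanpy installed

    # 'seurat_v3' expects raw counts, so gene selection comes first.
    sc.pp.highly_variable_genes(
        adata, flavor="seurat_v3", n_top_genes=n_top_genes, subset=True
    )
    # Library-size normalization and log1p happen after selection.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    return adata


def log_then_select(adata, n_top_genes=2000):
    """Hypothetical helper: normalize and log first, then select genes."""
    import scanpy as sc

    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    # 'seurat' (and the other non-v3 flavors) expect logarithmized data.
    sc.pp.highly_variable_genes(
        adata, flavor="seurat", n_top_genes=n_top_genes, subset=True
    )
    return adata
```

Both helpers modify `adata` in place and return it; the only difference is whether `highly_variable_genes` sees raw counts or log-transformed values.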
Best,
Jia