YangLabHKUST/Portal

Some questions about data preprocessing.


Your work is very interesting, and I would like to use portal-sc to run some tests on our dataset. It's great to see the work you've done in preprocess_memory_efficient, but I've noticed that its preprocessing order seems to differ from the standard Scanpy workflow. Is there a specific reason for this difference?

Hi there,

Thank you for your interest in our Portal method! In Portal, we select highly variable genes with flavor 'seurat_v3'. Scanpy expects count data when flavor 'seurat_v3' is used, and logarithmized data when any other flavor is used (https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html). Therefore, Portal selects genes before log-transforming the data, whereas the standard Scanpy pipeline log-transforms the data first and then selects genes with a different flavor.

Best,
Jia