czbiohub-sf/tabula-muris-senis

Do I need to normalize the processed files (to use with scanpy) downloaded from Figshare?

genecell opened this issue · 4 comments

Hi,

I am wondering whether I need to normalize the processed files (.h5ad) downloaded from Figshare. I did not find any explanation of this, so I compared the downstream results with and without normalization. I think the downloaded files need to be normalized, but I am not sure. Thank you!

By normalization, I mean:
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

Best regards,
MD

Hi @genecell, the .h5ad files available from figshare are already normalized. Let me know if I can help you with anything else!
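
For readers with the same doubt, here is a minimal sketch of one way to check whether a matrix still holds raw counts before normalizing it a second time. It uses only NumPy and toy data; the helper names `normalize_total_log1p` and `looks_like_raw_counts` are hypothetical, and the first one only mimics what the two scanpy calls in the question are understood to do (scale each cell to `target_sum`, then apply `log1p`). The integer test is a heuristic, not an official check from this repository.

```python
import numpy as np

def normalize_total_log1p(counts, target_sum=1e4):
    """NumPy mimic of normalize_total followed by log1p (cells are rows)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing empty cells by zero
    return np.log1p(counts / totals * target_sum)

def looks_like_raw_counts(X, sample_rows=1000):
    """Heuristic: raw counts are non-negative integers."""
    block = X[:sample_rows]
    if hasattr(block, "toarray"):  # scipy sparse matrices expose toarray()
        block = block.toarray()
    block = np.asarray(block, dtype=float)
    return bool(np.all(block >= 0) and np.allclose(block, np.round(block)))

# Toy data standing in for adata.X (assumption: cells x genes layout)
rng = np.random.default_rng(0)
toy_counts = rng.poisson(2.0, size=(50, 20))
toy_normalized = normalize_total_log1p(toy_counts)

print(looks_like_raw_counts(toy_counts))      # integer counts
print(looks_like_raw_counts(toy_normalized))  # log-normalized values
```

On a real object the same helper could be applied to `adata.X`; if it reports non-integer values, the matrix has most likely been transformed already and normalizing it again would apply the transformation twice.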

Got it! @aopisco Thank you for your rapid reply!

I still have some questions:

The files I downloaded from figshare were:
tabula-muris-senis-facs-processed-official-annotations.h5ad
tabula-muris-senis-droplet-processed-official-annotations.h5ad

I extracted the tissues of interest from these two .h5ad files.

  1. I checked Extended Data Fig. 4b (analysis workflow) in the Nature paper of Tabula Muris Senis. It includes a sc.pp.scale(tiss, max_value=10) step. Was this applied to the above two .h5ad files during analysis?

  2. I identified highly variable genes and visualized their distributions via sc.pl.highly_variable_genes(adata). When I normalized the processed data from figshare again, the shape of the plots seemed more reasonable, but I am not sure.

Without normalization again:

image

With normalization again:

image

  3. If I want to use the raw data, can I use tabula-muris-senis-droplet-official-raw-obj.h5ad and tabula-muris-senis-facs-official-raw-obj.h5ad, available from figshare?

  4. In sc.tl.pca(), would you recommend setting use_highly_variable=True and zero_center=True?

Thank you!
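
As background for the sc.pp.scale(tiss, max_value=10) question, the following is a rough NumPy sketch of what that step is understood to do: standardize each gene to zero mean and unit variance, then truncate values above max_value. The function name `scale_and_clip` is hypothetical, the use of the unbiased standard deviation (ddof=1) is an assumption, and whether the lower tail is clipped as well may depend on the scanpy version, so this sketch clips only the upper side.

```python
import numpy as np

def scale_and_clip(X, max_value=10.0):
    """Per-gene z-score followed by truncation at max_value (cells are rows)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # assumption: unbiased estimator
    std[std == 0] = 1.0          # leave constant genes at zero instead of NaN
    Z = (X - mean) / std
    return np.minimum(Z, max_value)  # clip only the upper tail

# Toy matrix with one extreme value in the first gene
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5))
toy[0, 0] = 1000.0

scaled = scale_and_clip(toy)
print(scaled.max())  # no value exceeds max_value after clipping
```

Because scaled values are centered around zero and can be negative, a matrix that has gone through this step is not a suitable input for normalize_total or log1p.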

Hi @genecell,

  1. I checked Extended Data Fig. 4b (analysis workflow) in the Nature paper of Tabula Muris Senis. It includes a sc.pp.scale(tiss, max_value=10) step. Was this applied to the above two .h5ad files during analysis?
    Yes, that was always the case.
  3. If I want to use the raw data, can I use tabula-muris-senis-droplet-official-raw-obj.h5ad and tabula-muris-senis-facs-official-raw-obj.h5ad, available from figshare?
    Unless something is missing, adata.raw.X contains the raw data.
  4. In sc.tl.pca(), would you recommend setting use_highly_variable=True and zero_center=True?
    I usually set use_highly_variable=True and zero_center=False.
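
To illustrate the two PCA options discussed above, here is a small NumPy sketch rather than the scanpy implementation itself. Restricting the decomposition to highly variable genes corresponds to use_highly_variable=True; zero_center=True corresponds to standard PCA on mean-centered data, while zero_center=False skips the centering, which amounts to a truncated SVD and lets sparse input stay sparse. The helper `reduce_hvg` and the top-variance gene selection are hypothetical stand-ins, not scanpy's own procedure.

```python
import numpy as np

def reduce_hvg(X, hvg_mask, n_comps=2, zero_center=True):
    """Project cells onto the top components using only the masked genes."""
    X_hvg = np.asarray(X, dtype=float)[:, hvg_mask]
    if zero_center:
        X_hvg = X_hvg - X_hvg.mean(axis=0)  # standard PCA centers each gene
    U, S, _ = np.linalg.svd(X_hvg, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]     # cell coordinates (scores)

# Toy log-normalized matrix (cells x genes)
rng = np.random.default_rng(0)
toy_log = np.log1p(rng.poisson(1.0, size=(100, 30)).astype(float))

# Hypothetical HVG selection: keep the 10 genes with the largest variance
hvg_mask = np.zeros(toy_log.shape[1], dtype=bool)
hvg_mask[np.argsort(toy_log.var(axis=0))[-10:]] = True

pca_scores = reduce_hvg(toy_log, hvg_mask, zero_center=True)
svd_scores = reduce_hvg(toy_log, hvg_mask, zero_center=False)
print(pca_scores.shape, svd_scores.shape)
```

Without centering, the first component of non-negative data mostly tracks overall expression level, so the two settings can give visibly different embeddings; which one is preferable depends on the dataset and on whether the matrix was already scaled.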

Hi @aopisco, thank you! With your help, I am now working with the raw data without problems. Thanks a lot!