What's the difference between lincs_full.h5ad and lincs.h5ad?
huawen-poppy opened this issue · 3 comments
Hello. Thank you for the nice work.
Following the lincs.ipynb notebook, I generated the lincs_pp.h5ad file. But when I followed the lincs_SMILES.ipynb notebook, it used lincs_full_pp.h5ad as input. What are the differences between lincs.h5ad and lincs_full.h5ad, and when should the full file be used?
Hi @huawen-poppy,
The preprocessing pipeline should be able to deal with both lincs_pp.ipynb and lincs_full_pp.ipynb. I must have renamed them for clarity at some point. When I check:
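from chemCPA.paths import DATA_DIR

# Paths to the smaller and the full LINCS dataset (file names after renaming)
adata_path = DATA_DIR / "lincs_small_.h5ad"
adata_path_full = DATA_DIR / "lincs_complete.h5ad"
# Fail early if either file is missing
assert adata_path.exists()
assert adata_path_full.exists()
# %%
import scanpy as sc

# Load both datasets into memory as AnnData objects
adata_small = sc.read(adata_path)
adata_full = sc.read(adata_path_full)
# %%
# Print the shape and the obs/var/uns keys of each object
print(adata_small)
print(adata_full)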
I get the following result:
AnnData object with n_obs × n_vars = 199620 × 978
obs: 'cell_id', 'det_plate', 'det_well', 'lincs_phase', 'pert_dose', 'pert_dose_unit', 'pert_id', 'pert_iname', 'pert_mfc_id', 'pert_time', 'pert_time_unit', 'pert_type', 'rna_plate', 'rna_well', 'batch', 'condition', 'cell_type', 'dose_val', 'cov_drug_dose_name', 'control', 'split'
var: 'pr_gene_title', 'pr_is_lm', 'pr_is_bing'
uns: 'rank_genes_groups_cov'
AnnData object with n_obs × n_vars = 840677 × 977
obs: 'cell_id', 'det_plate', 'det_well', 'lincs_phase', 'pert_dose', 'pert_dose_unit', 'pert_id', 'pert_iname', 'pert_mfc_id', 'pert_time', 'pert_time_unit', 'pert_type', 'rna_plate', 'rna_well', 'condition', 'cell_type', 'dose_val', 'cov_drug_dose_name', 'control', 'split', 'canonical_smiles', 'split1', 'random_split', 'split_ood_drugs'
var: 'pr_gene_title', 'pr_is_lm', 'pr_is_bing', 'gene_id', 'in_sciplex'
uns: 'cydata_pull', 'rank_genes_groups_cov'
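So, apart from a few differences in the annotation columns (the full file adds canonical_smiles and extra split columns in obs, plus gene_id and in_sciplex in var, but has no batch column), it is mainly a matter of dataset size: 840,677 vs. 199,620 observations. The difference in the number of genes (977 vs. 978) comes from the fact that I was not able to match one of the 978 genes with the sci-Plex 3 data.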
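The column differences visible in the printed output above can also be checked programmatically. A minimal sketch: `diff_names` is a hypothetical helper (not part of chemCPA) that takes a plain set difference of two collections of names; the obs column lists are copied by hand from the output above.

```python
from typing import Iterable, List, Tuple


def diff_names(small: Iterable[str], full: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (only_in_small, only_in_full), each sorted.

    Accepts any iterable of strings, e.g. a pandas Index such as
    adata.obs.columns, adata.var.columns or adata.var_names.
    """
    small_set, full_set = set(small), set(full)
    return sorted(small_set - full_set), sorted(full_set - small_set)


# obs columns copied from the printed AnnData summaries
obs_small = ["cell_id", "det_plate", "det_well", "lincs_phase", "pert_dose",
             "pert_dose_unit", "pert_id", "pert_iname", "pert_mfc_id", "pert_time",
             "pert_time_unit", "pert_type", "rna_plate", "rna_well", "batch",
             "condition", "cell_type", "dose_val", "cov_drug_dose_name",
             "control", "split"]
obs_full = ["cell_id", "det_plate", "det_well", "lincs_phase", "pert_dose",
            "pert_dose_unit", "pert_id", "pert_iname", "pert_mfc_id", "pert_time",
            "pert_time_unit", "pert_type", "rna_plate", "rna_well", "condition",
            "cell_type", "dose_val", "cov_drug_dose_name", "control", "split",
            "canonical_smiles", "split1", "random_split", "split_ood_drugs"]

only_small, only_full = diff_names(obs_small, obs_full)
print(only_small)  # ['batch']
print(only_full)   # ['canonical_smiles', 'random_split', 'split1', 'split_ood_drugs']
```

With both files loaded as in the snippet above, calling `diff_names(adata_small.var_names, adata_full.var_names)` should list the gene that could not be matched to sci-Plex 3 in its first element, assuming both files index genes with the same identifiers in `var_names`.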
I hope that clarifies this. Let me know if you encounter further issues!
Hello @MxMstrmn, thank you for your kind explanation! It's clear to me now!